In class we’ve been discussing Kalman filters, and how they can be used in a GPS-INS (Inertial Navigation System) scheme to produce a more accurate location estimate by combining INS data (as the dynamic model) with GPS data as correction measurements.
The Kalman filter is rooted in Bayesian inference; in fact, it is just a special case of Bayesian inference (assuming that measurements and states are Gaussian, and that the dynamic and measurement models are linear).
I decided to understand Bayesian inference before I sought to understand Kalman filters. It’s been a painful couple of days of YouTube and textbook reading, but I’m starting to get a grasp.
This is a fundamental stats concept, so I won’t spend too much time rehashing it. Essentially, Bayes’ theorem stems from the fact that we can express the joint probability of two events in two different ways, and those two expressions must be equal. So: $P(A \text{ and } B) = P(B \mid A)\,P(A) = P(A \mid B)\,P(B) = P(B \text{ and } A)$.
This should make sense. Suppose event A is that I am eating cereal and event B is that I am using a spoon to eat.
The probability that I am using a spoon and eating cereal is equal to the probability that I am eating cereal and using a spoon.
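As a quick sanity check, here is a small sketch with completely made-up probabilities for the cereal/spoon example, computed from an assumed joint distribution, showing that both factorizations recover the same joint probability:

```python
# Hypothetical joint probabilities for the cereal/spoon example (made up for illustration).
p_joint = {
    ("cereal", "spoon"): 0.18,
    ("cereal", "no_spoon"): 0.02,
    ("no_cereal", "spoon"): 0.12,
    ("no_cereal", "no_spoon"): 0.68,
}

# Marginals: P(A) = P(eating cereal), P(B) = P(using a spoon).
p_A = p_joint[("cereal", "spoon")] + p_joint[("cereal", "no_spoon")]  # 0.20
p_B = p_joint[("cereal", "spoon")] + p_joint[("no_cereal", "spoon")]  # 0.30

# Conditionals computed from the joint.
p_B_given_A = p_joint[("cereal", "spoon")] / p_A  # P(B|A) = 0.9
p_A_given_B = p_joint[("cereal", "spoon")] / p_B  # P(A|B) = 0.6

# Both factorizations give the same joint probability, P(A and B) = 0.18.
print(p_B_given_A * p_A, p_A_given_B * p_B)
```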
We can manipulate this equation to find:
$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} $$
This is Bayes’ rule. Why is it useful?
Well, it is incredibly useful for updating our beliefs about the probability of an event (A in this case) given some evidence B.
Let’s play with the numbers for a second. Ask: given that A is true, what is the probability that we see our evidence B? If P(B|A) is high, then the posterior P(A|B) goes up. This should make sense, because in the newly defined probabilistic “space” carved out by the evidence, a large part still includes event A. However, if everything else stays the same and P(B) increases, then P(A|B) decreases, which should also make sense: the probabilistic space defined by our evidence grows without the probability of the evidence within A growing, meaning the space defined by the evidence is growing in the “not A” region.
If all of that doesn’t make sense, go watch this video by 3Blue1Brown, which explains it better than I can:
Bayes theorem, the geometry of changing beliefs
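To make that “play with the numbers” intuition concrete, here is a minimal sketch using the same made-up probabilities as before (all values are hypothetical, only for illustration). It shows that raising P(B|A) raises the posterior, while raising P(B) lowers it:

```python
def posterior(p_B_given_A, p_A, p_B):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_B_given_A * p_A / p_B

p_A = 0.2  # prior belief in A (hypothetical)

# If the evidence is more likely under A (P(B|A) goes up), the posterior goes up.
print(posterior(0.5, p_A, 0.3))  # ~0.33
print(posterior(0.9, p_A, 0.3))  # 0.60

# If the evidence is more common overall (P(B) goes up) with P(B|A) fixed,
# the posterior goes down -- the evidence points less specifically at A.
print(posterior(0.9, p_A, 0.6))  # 0.30
```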
Things are going to get considerably more complicated now. Bayesian inference can be used to estimate the parameters of linear functions (similar to the least squares approach). I will approach the explanation from that standpoint, specifically for batch linear regression.