There's no such thing as a brontosaurus
Now we have the mean and the variance, so we know the distribution, right? It's just going to be a Gaussian as in eqn 1.55, with the mean and variance we just estimated.
But wait, we don't really know the variance in that equation, only an estimate of it. And the variance itself has a variance (strap in brain, infinite recursion looming). For example, suppose your estimate of the variance was too small and the true variance was bigger; then the distribution would be flattened out. So you want to do an average over a bunch of Gaussians with different variances.

We're not going to do the math to solve this statistical quagmire. It's not that hard, just involving one integration, but getting to that stage might put you to sleep. So instead of having a tail that dies off quickly, the smearing over different variances gives you a tail that dies off more slowly.

The good thing is that this problem was solved a long time ago. The name of this distribution is "Student's t-distribution". Strange? When I was a student, I thought it was called this because it was a mickey-mouse test that real men didn't use, only students in chem labs. Nope, it's stranger than that. The guy that figured this out, William Sealy Gosset, was a statistician and a chemist working for Guinness beer (talk about perks of the job) in Dublin. The only downside is that for some reason they didn't want him to publish under his own name; who knows why. So Gosset had to work in the closet. (Har Har). He published under the nom de plume "Student" instead.
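If you'd rather see the fat tails than take the smearing argument on faith, here is a minimal simulation sketch. It is not from the original notes: the function names, the cutoff of 2, and the choice of a unit Gaussian with true mean 0 are all assumptions made for illustration. Each fake experiment draws a handful of Gaussian data points, divides the error in the sample mean by the estimated error on the mean, and checks how often that ratio lands beyond 2. For a true Gaussian that would happen about 4.6% of the time.

```python
import math
import random

def t_statistic(sample, true_mean):
    """(sample mean - true mean) divided by the *estimated* error on the mean."""
    n = len(sample)
    mean = sum(sample) / n
    # Unbiased variance estimate (N - 1 in the denominator).
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - true_mean) / math.sqrt(var / n)

def tail_fraction(n_points, n_trials=20000, cutoff=2.0, seed=1):
    """Fraction of simulated experiments with |t| > cutoff (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Assumption: data drawn from a Gaussian with mean 0 and variance 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / n_trials

# P(|z| > 2) for an honest Gaussian, for comparison.
gaussian_tail = math.erfc(2.0 / math.sqrt(2.0))

for n in (3, 5, 30):
    print(n, tail_fraction(n), gaussian_tail)
```

With only a few data points the simulated tail fraction should come out noticeably larger than the Gaussian value, and it should creep back toward it as the number of points grows.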
The distribution depends on the number of data points you've got, because the more data, the closer to a Gaussian it'll be. So if you have a lot of data points, you're essentially back to a Gaussian. It also depends on the variable itself, conventionally called t. So all that's left is to integrate this distribution to find the area under the curve. Fortunately we have computers to do such things for us (although in principle you could make quite a bit of headway analytically).
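To make "integrate this distribution" concrete, here is a rough sketch using only Python's standard library. The notes don't give the formula at this point, so the density below is the standard textbook form of Student's t-distribution, written in terms of the degrees of freedom `dof` (an assumption on my part: for N data points used to estimate a mean, that is N - 1). The helper names and the Simpson's rule integrator are hypothetical choices for illustration, not the author's method.

```python
import math

def student_t_pdf(t, dof):
    """Standard Student's t density with `dof` degrees of freedom."""
    # Work with logs of the gamma functions so large dof doesn't overflow.
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))

def area_between(lo, hi, dof, n_steps=2000):
    """Area under the t density from lo to hi by Simpson's rule (n_steps even)."""
    h = (hi - lo) / n_steps
    total = student_t_pdf(lo, dof) + student_t_pdf(hi, dof)
    for i in range(1, n_steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd points, 2 on even
        total += weight * student_t_pdf(lo + i * h, dof)
    return total * h / 3

# Area within |t| < 2 for a few values of dof, then the Gaussian limit.
for dof in (1, 4, 30, 1000):
    print(dof, area_between(-2.0, 2.0, dof))
print("Gaussian", math.erf(2.0 / math.sqrt(2.0)))
```

The area within a fixed range grows toward the Gaussian value as `dof` increases, which is the "more data, closer to a Gaussian" statement in numbers. In practice you would probably reach for a statistics library rather than rolling your own integrator.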
As just mentioned, the distribution is a function of two things: the variable t and the number of data points. So let's summarize what we now need to do and then work through an example.
Josh Deutsch 2009-03-05