So we see that the distribution for the sum of $N$ independent random binary variables
becomes a Gaussian for large $N$. Was this some weird mathematical curiosity
that gets filed away in your head, right next to what you had last
year for breakfast? No, it's a lot more general than that. If you take Bob's
blades of grass, no matter what funky distribution each one takes, the sum of
their lengths will also be well approximated by a Gaussian distribution for large $N$.
Under a fairly broad set of conditions, the sum of a large number of independent
random variables will look like a Gaussian distribution. Since the mean
is related to the sum by a scaling factor, $1/N$, one can say the same thing about it: the
distribution of the average of a large number of independent random variables
is well approximated by a Gaussian distribution. The only thing that you need to
do to get the complete probability distribution is give the Gaussian the correct mean and variance.
This is important because the mean is something you calculate all the time
with experimental data. So without knowing much about your data, you know that
this mean will follow a Gaussian distribution. That has a lot of important
consequences for statistics.
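Here's a rough way to see this for yourself, sketched in Python with only the standard library. The choice of an exponential distribution for each variable, the function names, and the sample sizes are all my own assumptions for illustration. The check is crude: for a Gaussian, about 68% of values fall within one standard deviation of the mean, while for a single exponential variable the figure is about 86%.

```python
import random
import statistics


def sample_means(n_vars, n_trials, seed=0):
    """Return n_trials means, each taken over n_vars exponential draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_vars))
        for _ in range(n_trials)
    ]


def fraction_within_one_sigma(values):
    """Fraction of values lying within one standard deviation of their mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mu) <= sigma)
    return inside / len(values)


# One variable at a time: a very non-Gaussian (exponential) distribution.
single = sample_means(1, 10000)
# Means of 100 variables: should look much more Gaussian.
averaged = sample_means(100, 10000)

print("single variable:", fraction_within_one_sigma(single))
print("mean of 100:    ", fraction_within_one_sigma(averaged))
```

The first number should come out well above the Gaussian value, and the second should land close to it, even though nothing about the individual variables is Gaussian.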
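If you know the variance of each variable, $\sigma^2$, you can then also obtain the variance
of the sum of $N$ of these variables, as we did for the binomial distribution, where it's
just $N\sigma^2$. Because the mean just divides the sum by $N$, and dividing a variable by $N$
divides its variance by $N^2$, you arrive at the important conclusion
$$\sigma_{\text{mean}}^2 = \frac{\sigma^2}{N} \tag{1.57}$$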
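Spelled out step by step, the argument goes roughly like this. The symbols here are my own labels, not necessarily the ones used earlier: $x_1, \dots, x_N$ are the independent variables, each with variance $\sigma^2$, $S$ is their sum, and $\bar{x} = S/N$ is their mean.

```latex
\begin{align*}
\operatorname{Var}(S) &= \sum_{i=1}^{N} \operatorname{Var}(x_i) = N\sigma^2
  && \text{(variances of independent variables add)} \\
\operatorname{Var}(\bar{x}) &= \operatorname{Var}\!\left(\frac{S}{N}\right)
  = \frac{1}{N^2}\operatorname{Var}(S)
  && \text{(scaling by $1/N$ scales the variance by $1/N^2$)} \\
 &= \frac{N\sigma^2}{N^2} = \frac{\sigma^2}{N}
\end{align*}
```

So the standard deviation of the mean is $\sigma/\sqrt{N}$: it shrinks as you average more variables, but only as the square root of $N$.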
This means if you toss an unbiased coin 100 times, and you measure the mean proportion
of heads, you'll get $1/2$ with some deviation. To get that deviation, you calculate
the error of the mean. The variance for one flip, as we saw before, was $1/4$. So
the variance in the mean for 100 flips is $1/400$. To get the standard deviation,
you take the square root, which gives $1/20 = 0.05$. So if you do the experiment once
you might get a mean a bit above $0.5$, and the next time a bit below it. Every time you
do the experiment you get a different result, but it only differs from $1/2$ by about a standard
deviation.
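If you'd rather check the coin-toss numbers by brute force, here is a minimal simulation sketch in Python. The function name, the seed, and the number of repeated experiments (5000) are arbitrary choices of mine. It repeats the 100-flip experiment many times and measures how much the mean proportion of heads scatters.

```python
import random
import statistics


def mean_heads(n_flips, rng):
    """Proportion of heads in n_flips tosses of an unbiased coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips


rng = random.Random(1)
# Repeat the 100-flip experiment 5000 times and record each mean.
means = [mean_heads(100, rng) for _ in range(5000)]

center = statistics.fmean(means)   # should be close to 1/2
spread = statistics.pstdev(means)  # should be close to sqrt(1/400) = 0.05

print("average of the means:", center)
print("standard deviation of the means:", spread)
```

The spread of the simulated means should sit near 0.05, the square root of the variance of the mean for 100 flips.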
This is a simplified discussion of the central limit theorem.
I should also state that even though the Gaussian distribution does
approximate the correct distribution for a mean (or a sum) well, it is
still only an approximation, and there are circumstances where the
difference between the two can be crucial. But we're fortunate
that it's normally easy to figure out when it's not possible to use this
theorem.
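One place where the difference shows up is far out in the tails. As a hedged illustration (this particular comparison is my own choice, not something worked out earlier), the sketch below compares the exact binomial probability of getting at least some number of heads in 100 fair flips with the Gaussian estimate. The function names are hypothetical, and the Gaussian estimate uses the usual half-unit continuity correction.

```python
import math


def exact_tail(n, k):
    """P(at least k heads in n fair flips), from the binomial distribution."""
    return sum(math.comb(n, j) for j in range(k, n + 1)) / 2**n


def gaussian_tail(n, k):
    """Gaussian estimate of the same probability (mean n/2, variance n/4)."""
    mean = n / 2
    sigma = math.sqrt(n) / 2
    z = (k - 0.5 - mean) / sigma  # half-unit continuity correction
    return 0.5 * math.erfc(z / math.sqrt(2))


for k in (55, 95):
    print(k, exact_tail(100, k), gaussian_tail(100, k))
```

Near the center (at least 55 heads) the two numbers agree closely. Far out in the tail (at least 95 heads) both are tiny, but the Gaussian estimate is too large by a factor of more than a hundred, which matters if rare events are what you care about.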
josh
2010-10-20