If you flip a coin 100 times and it always comes out heads, you start
to feel that maybe the coin is biased, that is, it's more likely to come
out heads than tails, on average. If 51 times it's heads, but 49 times it's
tails, you'd think that it's probably unbiased. But how do you quantify this
intuition?
If it was heads 100 times in a row you'd say: "come on, I kept on
flipping the darned coin and it was always heads. What are the odds of
that? $(1/2)^{100}$. So it's got to be a biased coin".
But if it was 51 heads versus 49 tails you'd say: "it's probably unbiased,
this is the kind of thing you'd see by chance. If you repeat it again
you might see 46 heads and 54 tails."
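To get a feel for "the kind of thing you'd see by chance", here is a minimal sketch that repeats the 100-flip experiment a few times with a simulated fair coin. The function name `count_heads`, the seed, and the choice of 10 repeats are just assumptions for illustration, nothing more.

```python
import random


def count_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


# Fixed seed so the run is repeatable; the value 0 is an arbitrary choice.
rng = random.Random(0)

# Repeat the 100-flip experiment 10 times and look at the spread of head counts.
counts = [count_heads(100, rng) for _ in range(10)]
print(counts)
```

The counts should scatter around 50 rather than land on it exactly, which is the chance variation the 51 versus 49 example is talking about.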
With this in mind, we'll try to quantify what you do.
- 1. Form Hypotheses
- Yes, that's right: use a fancy word like that and everyone will listen
to you. When you've got their attention, say "Hypothesis, which is from
the Greek hupotithenai, look at me ain't I clever". After you've
learned to pronounce it correctly you've actually got to come up with
hypotheses, like: "this is an unbiased coin". Another hypothesis is:
"this is a biased coin". wow.
Another example of this kind of hypothesis could be: "the heights of
a population of men and women are the same". An alternative hypothesis is:
"the heights of a population of men and women are different".
When a hypothesis says, "you're not going to see any difference" like
above, it's often referred to as "the null hypothesis", written $H_0$. Here's another
example of an $H_0$: "Echinacea does zip for colds".
When a hypothesis is an alternative to this, like above, it's
often called "the alternative hypothesis". Fancy terminology for something
seemingly pretty simple.
- 2. Form a statistic
- You've also got to decide on
a statistic to measure, like the fraction of times you get heads, or maybe
something more complicated than that, like the difference in the
means of the heights of men and women.
- 3. Do the experiment
- How can you determine if the hypothesis is correct? You've got to do a real
experiment. We're not Einstein here, figuring out if the coin is biased
just by solving the equation for the universe. We need to get real data
drawn from a sample of our populations.
- 4. Likelihood of result
- Now call the statistic you've calculated
$S$. You calculate the probability you'd
get $S$ under the assumption of $H_0$, the null hypothesis. In the case of
coins, getting no heads in 100 flips would be $S = 0$. Under the null hypothesis, that your
coin is not biased, we know the probability of this happening is $(1/2)^{100}$,
which even an ant would say is a very small number.
In the height difference example, we measure the difference in heights between
men and women, $\Delta h$. Assuming the null hypothesis, what's the probability
that you'd get a difference $\Delta h$? That can be quite a tough question to
answer, involving some quite complicated math.
In the case of real statistical measures, when I say "complicated" I'm
not kidding. Some of these formulas are pretty involved. We'll go through
how to estimate this probability a little later.
- 5. The verdict
- So you get some number $p$, maybe as tiny as
$(1/2)^{100}$, or maybe something not small at all. This is the
likelihood that you'd get your result assuming the null hypothesis $H_0$,
for example assuming the coin is unbiased. If it's as tiny as $(1/2)^{100}$ you can
be pretty sure that $H_0$ is not true. If $p$ is not small, then you can't
reliably reject the null hypothesis. You just throw up your hands,
or if you want to sound more erudite, you say there's no statistically
significant difference between the means of the two populations. Where do
you set the threshold? In scientific work, you set it reasonably low,
but not too low, typically $p = 0.05$.
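For the coin, the whole recipe is simple enough to work out exactly. The sketch below assumes a one-sided question ("is the coin biased towards heads?"), so it adds up the binomial probabilities of getting at least as many heads as you saw, under the null hypothesis of a fair coin. The name `prob_at_least` and the 0.05 cutoff are assumptions made for illustration.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of at least k heads in n flips of a coin with heads probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Conventional cutoff, assumed here; it should be chosen before seeing the data.
THRESHOLD = 0.05

# The two cases from the start: 100 heads out of 100, and 51 heads out of 100.
for heads in (100, 51):
    p_value = prob_at_least(heads, 100)
    verdict = "reject H0" if p_value < THRESHOLD else "cannot reject H0"
    print(f"{heads} heads out of 100: p = {p_value:.3g} -> {verdict}")
```

For 100 heads the sum has a single term, $(1/2)^{100}$, so the fair-coin hypothesis gets thrown out. For 51 heads the probability is not far below one half, so you throw up your hands.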
Should you reverse items 2 and 3, that is, choose the statistic only once
you've seen your data? That's not really a good idea because it messes
up your analysis of statistical significance. You could then peruse some
thick book on statistical tests and likely find some test, say "Applecore's 5
sided, triple blind, non-parametric, variation of means test" (I just
made that up), that'll give you the desired answer. What do I mean
by desired? Desired by the drug company or the funding agency? That's
why you want to design the experiment and the test before seeing the
actual data. Is what I said always followed in practice? Of course not.
Statisticians call this "the problem of multiplicity".
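A rough way to see why shopping around for a test is dangerous: suppose the null hypothesis is true and you try several tests, each with a 0.05 chance of a false alarm. The sketch below assumes the tests are independent, which tests picked from a book and run on the same data generally are not, so treat the numbers as a guide only. The function name `chance_of_false_alarm` is made up for this illustration.

```python
def chance_of_false_alarm(n_tests, threshold=0.05):
    """Chance that at least one of n_tests independent tests falls below the
    threshold when the null hypothesis is actually true."""
    return 1 - (1 - threshold) ** n_tests


# How quickly the chance of a spurious "significant" result grows.
for n_tests in (1, 5, 20, 100):
    print(f"{n_tests:3d} tests: {chance_of_false_alarm(n_tests):.2f}")
```

Under these assumptions, with 20 tests to choose from you are more likely than not to find one that gives a "significant" answer, even though there is nothing there.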
Josh Deutsch
2009-03-05