[Equations (1.58) to (1.62)]
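However, this covariance is "unnormalized". You might want it to be 1 if the variables
were completely correlated, for example, $x = y$. In that case, we have
$\langle x^2 \rangle - \langle x \rangle^2$, which is the variance
of $x$, $\sigma_x^2$. Often we want a "normalized" definition; dividing
by the appropriate factor, we get the "correlation coefficient":

$$R = \frac{\langle xy \rangle - \langle x \rangle \langle y \rangle}{\sigma_x \sigma_y}$$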
So with two completely uncorrelated variables, you get a correlation of 0. If they're perfectly correlated, you get 1. If they're perfectly anti-correlated, you get -1.
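The three limiting cases can be checked numerically. The following is a minimal sketch in plain Python, assuming the correlation coefficient is the covariance divided by the product of the two standard deviations; the helper names `mean` and `correlation` and the sample lists are made up for illustration.

```python
from math import sqrt


def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def correlation(xs, ys):
    """Correlation coefficient: covariance divided by sigma_x * sigma_y.

    The covariance is computed in its centered form <(x - <x>)(y - <y>)>,
    which equals <xy> - <x><y>. The result is undefined (division by
    zero) if either list has zero variance.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    var_x = mean([(x - mx) ** 2 for x in xs])
    var_y = mean([(y - my) ** 2 for y in ys])
    return cov / sqrt(var_x * var_y)


# Illustrative data only (not from any measurement).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_same = correlation(x, x)                  # y = x: perfectly correlated
r_anti = correlation(x, [-v for v in x])    # y = -x: perfectly anti-correlated

# A symmetric x paired with y = x**2 has zero covariance.
sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_zero = correlation(sym, [v * v for v in sym])

print(r_same, r_anti, r_zero)
```

Dividing by the two standard deviations is what pins the perfectly correlated case to 1 regardless of the units of either variable.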
Let's take an example. Take $x$ to be the weight of Swedish men, and
$y$ to be their height. You'd expect there to be a correlation between
the two variables, because there aren't going to be many 7-foot-tall men
weighing 110 lbs. But there are probably some 5-foot-2-inch men weighing
that. This can be represented graphically. You take 20 Swedish men and
plot weight versus height for each one (this isn't real data).
In this case there is a correlation between the two, but it's far
from perfect. You expect your $R$ for this to be somewhat less than
1. But let's now take $x$ to be the time since a lawn has been mowed, and
$y$ to be the height of the grass.
In this case the two are very highly correlated, and you expect a correlation coefficient close to 1.
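The contrast between the two examples can be sketched with invented numbers. In the snippet below, every value is hypothetical: 20 "men" whose weight follows height with a large hand-picked scatter, and 20 lawn measurements where grass height follows time almost exactly. The units (cm, kg, days) and the `SCATTER` list are assumptions chosen only to make the point, and `correlation` is an illustrative helper.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient from sums of centered products."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hand-picked scatter (sums to zero), standing in for person-to-person
# variation. These are not real data.
SCATTER = [10, -8, 3, -12, 7, -2, 12, -9, 1, -6,
           8, -11, 4, 9, -5, -3, 11, -7, 2, -4]

# Heights in cm and weights in kg: a loose upward trend plus large scatter.
heights = [160 + 2 * i for i in range(20)]
weights = [60 + 1.5 * i + SCATTER[i] for i in range(20)]

# Days since mowing and grass height: a nearly exact linear relation
# with a tiny alternating wiggle.
days = list(range(20))
grass = [3.0 + 0.5 * d + (0.1 if d % 2 == 0 else -0.1) for d in days]

r_men = correlation(heights, weights)
r_lawn = correlation(days, grass)

print(f"height vs weight: R = {r_men:.3f}")
print(f"mowing time vs grass height: R = {r_lawn:.4f}")
```

With these made-up inputs, the height and weight pair gives a value clearly below 1, while the lawn pair comes out very close to 1, matching the qualitative expectation in the text.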
You can see a lot of examples of different correlation coefficients here.
Note that the choice of variable names $x$ and $y$ in the above was arbitrary. You could have called them $s$ and $t$, or elephant and daisy, instead.