Suppose you roll two dice. Their outcomes are independent. You can also say that the outcomes are "uncorrelated". At the other extreme, when you roll two dice, the event of the first die landing on 6 and the event of the total being 12 are very highly correlated. How do you quantify correlation?
Suppose you have two variables, say $X$ and $Y$, and they represent the outcomes of throwing the first and second die respectively. Then you know $E[XY] = E[X]\,E[Y]$, because of independence. In this case you'd also say their correlation was zero. That suggests that a good measure for correlation is the covariance:
$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]$$
However this is "unnormalized". You might want it to be 1 if the variables were completely correlated, for example, $Y = X$. In the case $Y = X$, we have $\mathrm{Cov}(X, X) = E[X^2] - E[X]^2$, which is the variance of $X$, $\mathrm{Var}(X)$. Often we want a "normalized" definition; dividing by the appropriate factor, we get the "correlation coefficient":
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$$
where $\sigma_X = \sqrt{\mathrm{Var}(X)}$ and $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$.
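The definitions above can be checked numerically. Here's a minimal sketch using NumPy that simulates many rolls of two dice and computes the correlation coefficient directly from the formula; the function name `corr` and the sample size are my own choices, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 100,000 rolls of two independent dice.
x = rng.integers(1, 7, size=100_000)
y = rng.integers(1, 7, size=100_000)

def corr(a, b):
    """Correlation coefficient: Cov(a, b) / (sigma_a * sigma_b)."""
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / (a.std() * b.std())

print(corr(x, y))      # two independent dice: close to 0
print(corr(x, x + y))  # first die vs. the total: clearly positive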
So with two completely uncorrelated variables, you get a correlation of 0. If they're perfectly correlated, you get 1. If they're perfectly anti-correlated, you get $-1$.
Let's take an example. Take $X$ to be the weight of Swedish men, and $Y$ to be their height. You'd expect there to be a correlation between the two variables, because there aren't going to be many 7-foot-tall men weighing 110 lbs. But there are probably some 5-foot-2-inch men weighing that. This can be represented graphically. You take 20 Swedish men and plot weight versus height for each one (this isn't real data).
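A scatter plot like that can be faked in a few lines. The numbers below are entirely invented (just as the text's plot isn't real data): heights drawn uniformly, weights following a rough linear trend plus plenty of noise, so the correlation comes out positive but well below 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: 20 heights (cm) and weights (kg) with a linear
# trend plus a lot of scatter.
height = rng.uniform(160, 195, size=20)
weight = 0.9 * height - 90 + rng.normal(0, 8, size=20)

r = np.corrcoef(height, weight)[0, 1]
print(r)  # positive, but noticeably less than 1
```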
In this case there is a correlation between the two, but it's far from perfect. You'd expect the correlation coefficient here to be somewhat less than 1. But let's now take $X$ to be the time since a lawn has been mowed, and $Y$ to be the height of the grass.
In this case the two are very highly correlated, and you expect a correlation coefficient close to 1.
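For contrast, here's the lawn case sketched the same way, again with made-up numbers: grass height growing almost exactly linearly with days since mowing, with only a little measurement noise, so the coefficient lands very close to 1:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented data: days since mowing vs. grass height (cm), an almost
# perfectly linear relationship with tiny measurement noise.
days = np.arange(1, 21)
grass = 0.5 * days + rng.normal(0, 0.1, size=20)

r = np.corrcoef(days, grass)[0, 1]
print(r)  # very close to 1
```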
You can see a lot of examples of different correlation coefficients here.
Note that the choice of variable names $X$ and $Y$ in the above was arbitrary; you could have called them $s$ and $t$, or elephant and daisy, instead.