Last reviewed 01/2018

Correlation is a measure of how strongly two variables are associated with one another. If, for a given value of a variable x, the value of a variable y can be predicted from an approximately straight-line relationship, then x and y may be said to be correlated. Correlation alone does not show that one variable causes the change in the other.

For normally distributed data, Pearson's correlation coefficient, r, is used. It takes a value between -1 and 1. Zero implies no linear correlation; a negative value implies that y tends to decrease as x increases. The closer r is to 1 (or -1), the stronger the correlation.
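
As an illustration of how r behaves, the following is a minimal sketch that calculates Pearson's coefficient from its standard definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data values are hypothetical, chosen only to show the three cases described above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical illustrative data, not taken from any study
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # y rises exactly in line with x
falling = [10, 8, 6, 4, 2]  # y falls exactly in line with x
curved = [4, 1, 0, 1, 4]    # y depends on x, but not in a straight line

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)
```

With these values, `r_rising` is 1, `r_falling` is -1 and `r_curved` is 0. The last case shows that r = 0 rules out a straight-line relationship only; y may still depend on x in some other way.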

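Note that for a given line equation, y = mx + c, where m is the gradient and c is the y-axis intercept, the correlation coefficient is not the same as the gradient. The coefficient r describes how closely the points lie to a straight line, whereas m describes how steeply that line rises or falls.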
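
The difference between r and the gradient can be seen in a small sketch. It assumes the line is fitted by ordinary least squares, for which the gradient is the sum of cross-products divided by the sum of squared deviations in x. The function name `gradient_and_r` and the data are hypothetical.

```python
from math import sqrt


def gradient_and_r(xs, ys):
    """Return (m, r): the least-squares gradient and Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    m = sxy / sxx              # gradient of the fitted line
    r = sxy / sqrt(sxx * syy)  # strength of the linear association
    return m, r


# Hypothetical data: two perfectly straight lines with different slopes
x = [1, 2, 3, 4, 5]
steep = [10, 20, 30, 40, 50]         # y = 10x
shallow = [0.5, 1.0, 1.5, 2.0, 2.5]  # y = 0.5x

m_steep, r_steep = gradient_and_r(x, steep)
m_shallow, r_shallow = gradient_and_r(x, shallow)
```

Both data sets give r = 1 because every point lies exactly on a line, yet the gradients are 10 and 0.5 respectively. Changing the units of y changes m but leaves r unchanged.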

For data that are not normally distributed, Spearman's or Kendall's rank correlation coefficients are used instead.
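
Both of these measures work on the ordering of the values rather than the values themselves. The sketch below is a minimal illustration, assuming Spearman's coefficient is computed as Pearson's r applied to ranks (ties given their average rank) and Kendall's coefficient is computed in its simplest "tau-a" form, which makes no correction for ties. All function names and data are hypothetical; in practice a statistics package such as SciPy (`scipy.stats.spearmanr`, `scipy.stats.kendalltau`) would normally be used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's coefficient: Pearson's r calculated on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs / total pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
r = pearson_r(x, y)
```

Here `rho` and `tau` are both 1 because y always increases with x, whereas Pearson's `r` is below 1 because the relationship is not a straight line.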