When do you use a correlation test?

Very often in biology you want to know whether two things are associated — whether they change together. Do taller plants have wider leaves? Does a woodlouse’s activity change with temperature? Does shell height increase as you go further up a rocky shore? A correlation test answers this.

Use a correlation test when…

  • you have pairs of measurements — two values recorded for each individual or site (e.g. for each limpet: its distance up the shore and its shell height);
  • you want to know if the two vary together (as one goes up, does the other go up or down?);
  • you are not comparing two separate groups (that is a t-test) and not counting categories (that is chi-squared).

At A-level the correlation test you use is Spearman’s rank correlation. (Some textbooks describe Pearson’s correlation, but the boards ask for Spearman’s rank. It works on the rank order of the data rather than the raw values, so it does not need the points to follow a straight line, and it is easier to calculate by hand.)

The two variables: which is which?

One variable is usually the independent variable (the one that could affect the other, plotted on the x-axis) and the other the dependent variable (the one that might be affected, on the y-axis). For limpets: distance up the shore is independent (x); shell height is dependent (y). For correlation, swapping the two variables does not change the result, because the test only uses the size of the gap between the two ranks in each pair. Plotting them the conventional way round simply makes the graph easier to read.

First, always draw a scatter graph

Before any calculation, plot the pairs of data as a scatter graph — each pair becomes one point (its x-value across, its y-value up). This lets you see at a glance whether the two variables are related, and how. There are three patterns to recognise:

[Figure: three scatter graphs, each plotting Variable y against Variable x on axes from 0 to 10. Panels: Positive correlation, Negative correlation, No correlation.]
Positive correlation: as x increases, y increases (points rise from bottom-left to top-right). Negative correlation: as x increases, y decreases (points fall). No correlation: the points are scattered with no clear pattern.
  • Positive correlation — the two variables increase together (e.g. leaf length and leaf width).
  • Negative correlation — as one increases, the other decreases (e.g. a fox’s oxygen consumption falls as air temperature rises).
  • No correlation — no relationship; the points are scattered all over.

The scatter graph gives you the picture. The correlation coefficient then turns that picture into a single number, so you can say how strong the correlation is and whether it is significant.

The correlation coefficient: one number from −1 to +1

Spearman’s rank correlation gives a single number, called the correlation coefficient, with the symbol rs. It is always somewhere between −1 and +1, and its value tells you two things at once — the direction and the strength of the correlation:

Reading rs

  • rs = +1 — a perfect positive correlation (every point on a rising line).
  • rs close to +1 (e.g. +0.9) — a strong positive correlation.
  • rs = 0: no correlation.
  • rs close to −1 (e.g. −0.9) — a strong negative correlation.
  • rs = −1 — a perfect negative correlation (every point on a falling line).

So the sign (+ or −) tells you the direction, and how far the number is from 0 tells you the strength. A value of −0.8 is just as strong a correlation as +0.8 — it is simply negative instead of positive.
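
As a small illustration of reading rs, the Python sketch below splits a coefficient into its direction (from the sign) and its strength (its distance from 0). The function name direction_and_strength is made up for this example and is not part of any library.

```python
def direction_and_strength(rs):
    """Split a correlation coefficient into its direction and its strength."""
    if not -1 <= rs <= 1:
        raise ValueError("rs must lie between -1 and +1")
    if rs > 0:
        direction = "positive"
    elif rs < 0:
        direction = "negative"
    else:
        direction = "none"
    strength = abs(rs)  # distance from 0, ignoring the sign
    return direction, strength


print(direction_and_strength(0.8))   # ('positive', 0.8)
print(direction_and_strength(-0.8))  # ('negative', 0.8)
```

Both calls report the same strength, 0.8; only the direction differs.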

The formula (given to you in the exam)

Spearman’s rank works on the rank order of the data, not the raw values. You rank each variable, find the difference in ranks for each pair, and put those differences into the formula. The formula is always given to you in the exam, so you just need to know what each symbol means:

rₛ = 1 − (6 × Σd²) ÷ [ n(n² − 1) ]

Reading the symbols:

  • rs: the Spearman’s rank correlation coefficient (the answer, between −1 and +1).
  • d: for one pair, the difference between its two ranks (rank of x − rank of y).
  • d²: that difference squared (so negative differences do not cancel out positive ones).
  • Σd²: the sum of all the squared differences (add up the d² column).
  • n: the number of pairs of data.

In plain steps: (1) rank each variable separately (1 = smallest); (2) for each pair, subtract the ranks to get d; (3) square each d; (4) add up the d² column to get Σd²; (5) put Σd² and n into the formula.
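
The five steps can also be sketched in Python. This is a minimal sketch that assumes no two values in a column are equal; the function names simple_ranks and spearman_rs are invented for this illustration.

```python
def simple_ranks(values):
    """Step 1: rank values from 1 (smallest) upward. Assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman's rank coefficient from the exam formula (no ties assumed)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs of data")
    rank_x = simple_ranks(xs)
    rank_y = simple_ranks(ys)
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]    # step 2: rank differences
    d_squared = [value ** 2 for value in d]            # step 3: square each d
    sum_d_squared = sum(d_squared)                     # step 4: add up the d² column
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))  # step 5: the formula


print(spearman_rs([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0 (perfect positive)
print(spearman_rs([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0 (perfect negative)
```

The two calls use tiny made-up lists chosen only to show the extreme values of +1 and −1.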

Ties: if two values in the same column are equal, give each of them the average of the ranks they would have taken (e.g. two values tying for 3rd and 4th both get rank 3.5).
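
One way to handle ties when ranking is sketched below in Python. The function name average_ranks and the sample numbers are made up for this illustration.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first rank position this value occupies
        count = ordered.count(v)       # how many values tie at this value
        last = first + count - 1       # last rank position the tie occupies
        ranks.append((first + last) / 2)
    return ranks


# Two values of 15 tie for 3rd and 4th place, so both get rank 3.5.
print(average_ranks([12, 15, 15, 9, 20]))  # [2.0, 3.5, 3.5, 1.0, 5.0]
```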

Worked example: limpets on the shore

A full worked example with real-style fieldwork data.

Does limpet shell height increase up the shore?

Spearman’s rank · n = 10

At 10 points up a rocky shore, a student recorded the distance up the shore (m) and the height of a limpet shell (mm) at that point. Is there a significant correlation between distance up the shore and shell height?

Step 1 — State the null hypothesis

Null hypothesis (H₀): there is no correlation between distance up the shore and limpet shell height. (Any correlation we see is assumed to be due to chance until the test shows otherwise.)

Step 2 — The data, ranked

Rank each column separately (1 = smallest). Then find the difference in ranks (d) and square it (d²).

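Distance x (m) | Shell height y (mm) | Rank of x | Rank of y | d = rankX − rankY | d²
0.5 | 8 | 1 | 1 | 0 | 0
1.2 | 10 | 2 | 3 | −1 | 1
2.0 | 9 | 3 | 2 | 1 | 1
3.1 | 12 | 4 | 4 | 0 | 0
4.0 | 14 | 5 | 6 | −1 | 1
5.2 | 13 | 6 | 5 | 1 | 1
6.0 | 16 | 7 | 8 | −1 | 1
7.1 | 15 | 8 | 7 | 1 | 1
8.3 | 18 | 9 | 9 | 0 | 0
9.0 | 19 | 10 | 10 | 0 | 0
Σd² = 6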

Plotting the pairs as a scatter graph first shows the pattern clearly — the points rise from bottom-left to top-right, so we expect a strong positive correlation:

[Figure: Shell height vs distance up the shore. x-axis: Distance up the shore (m), 0 to 10; y-axis: Limpet shell height (mm), 6 to 20.]
Each point is one sampling position: its distance up the shore (x) against the limpet shell height there (y). The dashed line is the line of best fit — it rises, confirming a positive correlation.

Step 3 — Put the numbers into the formula

n = 10, so n² − 1 = 100 − 1 = 99, and n(n²−1) = 10 × 99 = 990. Σd² = 6.

rₛ = 1 − (6 × Σd²) ÷ [ n(n² − 1) ] = 1 − (6 × 6) ÷ 990 = 1 − 36 ÷ 990 = 1 − 0.036 = 0.96
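
If you want to check the arithmetic, the short Python sketch below redoes the calculation from the raw limpet data in the table. The helper name ranks is invented here, and the sketch relies on there being no tied values in either column, which is true for this data set.

```python
# Limpet data from the worked example table.
distance_m = [0.5, 1.2, 2.0, 3.1, 4.0, 5.2, 6.0, 7.1, 8.3, 9.0]
shell_height_mm = [8, 10, 9, 12, 14, 13, 16, 15, 18, 19]


def ranks(values):
    """Rank values from 1 (smallest) upward; valid here because nothing ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_x = ranks(distance_m)
rank_y = ranks(shell_height_mm)
sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

n = len(distance_m)
rs = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print(sum_d_squared)  # 6
print(round(rs, 2))   # 0.96
```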

Step 4 — Compare with the critical value

From the Spearman’s rank critical-values table, for n = 10 at p = 0.05 the critical value is 0.648. Compare the size of your rs (its distance from 0, ignoring the sign) with this value:

rₛ = 0.96 is GREATER than critical value 0.648
Conclusion: because rs (0.96) is greater than the critical value (0.648), the correlation is significant — we reject the null hypothesis. There is a strong, significant positive correlation between distance up the shore and limpet shell height: limpets higher up the shore have taller shells.
Compare rs with the critical value using its distance from 0 (ignore the ± sign): a value of −0.96 would be just as significant. A significant correlation also does not prove that being higher up the shore causes taller shells; see “Correlation is not causation” below.
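
The comparison rule can be written as a one-line check. This is a sketch only: the function name is_significant is made up, and the critical value still has to be looked up in the table for your own n (0.648 is the value quoted above for n = 10 at p = 0.05).

```python
def is_significant(rs, critical_value):
    """True when rs is further from 0 than the critical value (sign ignored)."""
    return abs(rs) > critical_value


CRITICAL_N10_P05 = 0.648  # from the text: n = 10, p = 0.05

print(is_significant(0.96, CRITICAL_N10_P05))   # True
print(is_significant(-0.96, CRITICAL_N10_P05))  # True
print(is_significant(0.50, CRITICAL_N10_P05))   # False
```

The third call uses 0.50 purely as an example of a coefficient that falls short of the critical value, so the null hypothesis would not be rejected.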

Medical example: is resting heart rate linked to blood pressure?

Spearman’s rank · n = 10 patients

A nurse recorded, for 10 adult patients at a clinic, each patient’s resting heart rate (beats per minute) and their systolic blood pressure (mmHg). Is there a significant correlation between the two?

Step 1 — State the null hypothesis

Null hypothesis (H₀): there is no correlation between resting heart rate and systolic blood pressure.

Step 2 — The data, ranked

Rank each column separately (1 = lowest), find the difference in ranks (d) and square it.

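Heart rate x (bpm) | Blood pressure y (mmHg) | Rank of x | Rank of y | d = rankX − rankY | d²
58 | 112 | 1 | 1 | 0 | 0
61 | 118 | 2 | 3 | −1 | 1
64 | 116 | 3 | 2 | 1 | 1
66 | 121 | 4 | 4 | 0 | 0
70 | 124 | 5 | 6 | −1 | 1
72 | 122 | 6 | 5 | 1 | 1
75 | 129 | 7 | 7 | 0 | 0
78 | 131 | 8 | 8 | 0 | 0
81 | 134 | 9 | 9 | 0 | 0
85 | 140 | 10 | 10 | 0 | 0
Σd² = 4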

Plotted as a scatter graph, the points rise together — higher heart rates tend to go with higher blood pressures:

[Figure: Blood pressure vs resting heart rate. x-axis: Resting heart rate (beats per minute), 55 to 90; y-axis: Systolic blood pressure (mmHg), 105 to 145.]
Each point is one patient: resting heart rate (x) against systolic blood pressure (y). The rising line of best fit indicates a positive correlation.

Step 3 — Put the numbers into the formula

n = 10, so n(n² − 1) = 10 × 99 = 990. Σd² = 4.

rₛ = 1 − (6 × Σd²) ÷ [ n(n² − 1) ] = 1 − (6 × 4) ÷ 990 = 1 − 24 ÷ 990 = 1 − 0.024 = 0.98
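
When the ranks are already in a table, you can start from the d column instead of the raw data. The Python sketch below does that for the heart rate example; the variable names are chosen just for this illustration.

```python
# The d column (rank of x minus rank of y) copied from the table above.
d_values = [0, -1, 1, 0, -1, 1, 0, 0, 0, 0]

n = len(d_values)                              # 10 pairs of data
sum_d_squared = sum(d ** 2 for d in d_values)  # add up the d² column
rs = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print(sum_d_squared)  # 4
print(round(rs, 3))   # 0.976
print(round(rs, 2))   # 0.98
```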

Step 4 — Compare with the critical value

For n = 10 at p = 0.05 the critical value is 0.648.

rₛ = 0.98 is GREATER than critical value 0.648
Conclusion: because rs (0.98) is greater than the critical value (0.648), the correlation is significant — we reject the null hypothesis. There is a strong, significant positive correlation between resting heart rate and systolic blood pressure in these patients.
Crucial medical point: this correlation does not prove that a high heart rate causes high blood pressure. Both can be raised by a third factor — for example body mass, stress, or caffeine intake. To show cause, you would need a controlled trial.

Correlation is not causation

This is the single most important warning about correlation, and a very common exam point. A significant correlation shows two variables are associated — it does NOT prove that one causes the other.

Why not?

  • The relationship might be caused by a third factor affecting both. For example, people’s hand size and foot size are strongly correlated — but big hands do not cause big feet; both are controlled by genes for overall body size.
  • It could be a coincidence, especially with a small sample.
  • Even if there is a real cause, the correlation alone does not tell you which way round it works.

So when you write a conclusion: say there is a significant correlation / association between the variables — do not claim one causes the other unless you have other evidence. To investigate a cause, you would design a controlled experiment.

A classic mark: “the data show a significant positive correlation between X and Y, but this does not prove that X causes Y — there may be a third factor, or it may be coincidence.”

What each exam board expects

All the main A-level Biology specifications name a correlation coefficient; Spearman’s rank is the one to know.

Board | What is required
AQA (7402) | Select and use a correlation coefficient; AQA recommends Spearman’s rank. Interpret the probability value and reject/accept the null hypothesis at 0.05.
OCR A / B | Spearman’s rank correlation named explicitly; rank the data, calculate rs, compare with the critical value.
Edexcel A / B | Use a correlation coefficient (Spearman’s rank) to test for an association; understand significance at 5% and that correlation is not causation.
WJEC / Eduqas | Spearman’s rank correlation coefficient; rank data, use the formula, interpret against tabulated critical values.