The t-test: comparing two means
The Student t-test answers one question: are the means of two sets of measurements really different, or is the difference just natural variation? This page builds it up from scratch, with a full worked example using real mussel shell lengths from a rocky shore.
When do you use a t-test?
You use a t-test when you have two groups of measured (continuous) data and you want to know whether the difference between their means is real or just natural variation. If you have read the descriptive statistics page, this is the exact question raised there: is the gap between two means bigger than variation alone would produce? The t-test is the tool that answers it.
Use a t-test when…
- you are comparing two means (exactly two groups — no more);
- the data are continuous (measured, e.g. length, mass, height) — not counts or categories;
- the data are roughly normally distributed (the bell shape).
If instead you are comparing counts in categories (e.g. how many organisms in each zone), that is a job for the chi-squared test, not the t-test. If you are looking for a relationship between two variables, that is correlation.
The investigation: mussels on a rocky shore
Here is the real fieldwork example we will work through. On a rocky shore, mussels live attached to the rocks in two zones: the high-tide zone near the top of the shore, and the low-tide zone further down. A student wondered whether mussels grow to a different length in the two zones (perhaps because the lower zone is covered by seawater for longer each day, giving more feeding time).
She collected a sample of 15 mussels from each zone and measured the length of every shell in millimetres. The two zones give her two samples:
- Sample A — 15 mussels from the high-tide zone.
- Sample B — 15 mussels from the low-tide zone.
Each mussel gives one length measurement; that single value is what we call x.
The null hypothesis
Every statistical test starts by assuming there is no real effect — this starting assumption is the null hypothesis. Here it is: “There is no difference between the mean shell length of mussels in sample A (high zone) and sample B (low zone).” The t-test will tell us whether the evidence is strong enough to reject this assumption.
The idea behind the t-test
The two samples will almost certainly have different means. But remember from the descriptive statistics page that two samples differ a little even when nothing real is going on, simply because individuals vary. So a difference in means by itself proves nothing. The t-test weighs the difference against the variation, and it does this by comparing two things:
- The size of the difference between the two means — the bigger this is, the more likely it is real. (This is the top of the formula.)
- How much the measurements vary, and how many you took — the more the individual mussels vary (and the fewer you measured), the more the means could differ by variation alone. (This is the bottom of the formula.)
Why you must calculate the standard deviation, not just the means
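This is the key point, and it is why the t-test needs the standard deviation (the measure of spread). Picture two situations in which the two means are the same distance apart. In the first, each sample has a small spread, so the two bell curves barely overlap. In the second, each sample has a large spread, so the two curves overlap heavily. The gap between the means is identical, but the conclusion is completely different, purely because of the spread.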
The lesson
The difference between the means is not enough on its own. The same difference can be significant (curves separate) or not significant (curves overlap) — it depends entirely on the spread. That is why you must calculate the standard deviation of each sample and feed it into the t-test: it is the standard deviation that tells you whether the curves separate or overlap.
The t value in one sentence
The t value is simply the difference between the two means, measured in units of the variation. A big t means the means are far apart compared with the variation — the difference is probably real. A small t means the difference is small compared with the variation — it could easily be natural variation.
So the whole test comes down to: calculate t, then check whether it is big enough to take seriously (by comparing it with a critical value from a table). That is exactly the steps below.
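To make the effect of spread concrete, here is a minimal Python sketch. The function name `t_from_summary` and every number in it are made up for illustration (they are not the mussel data), and it assumes the unpaired formula that the next section introduces: the difference between the means divided by √( s₁²/n₁ + s₂²/n₂ ).

```python
import math

def t_from_summary(mean1, mean2, sd1, sd2, n1, n2):
    """Unpaired t from summary values: difference in means over a spread term."""
    difference = abs(mean1 - mean2)                       # top of the formula
    spread_term = math.sqrt(sd1**2 / n1 + sd2**2 / n2)    # bottom of the formula
    return difference / spread_term

# Hypothetical values: the means are 5 mm apart in both cases,
# with 15 readings per sample. Only the standard deviation changes.
t_small_spread = t_from_summary(30.0, 35.0, sd1=1.0, sd2=1.0, n1=15, n2=15)
t_large_spread = t_from_summary(30.0, 35.0, sd1=10.0, sd2=10.0, n1=15, n2=15)

print(f"small spread: t = {t_small_spread:.2f}")
print(f"large spread: t = {t_large_spread:.2f}")
```

The gap between the means is the same in both calls, yet the first t is ten times the second. With 15 readings in each sample the critical value at p = 0.05 is 2.048, so only the small-spread case would count as significant.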
The formula (given to you in the exam)
You are always given this formula, so you never need to memorise it. What you need is to know what each symbol means and be able to put your numbers in. This is the unpaired t-test, the version used when the two samples are separate, independent groups, like our two shore zones (mussels in zone A are different animals from those in zone B):
t = | x̄₁ − x̄₂ | ÷ √( s₁²/n₁ + s₂²/n₂ )
Reading the symbols:
- t: the t value, the number you are calculating.
- x̄₁: the mean of sample 1 (here, the high-zone mussels).
- x̄₂: the mean of sample 2 (here, the low-zone mussels).
- | … |: the difference between the two means, taken as a positive number (it does not matter which way round you subtract).
- s₁²: the standard deviation of sample 1, squared.
- s₂²: the standard deviation of sample 2, squared.
- n₁: the number of readings in sample 1 (here 15).
- n₂: the number of readings in sample 2 (here 15).
- √: square root of everything inside; do it last.
In plain words: the difference between the two means (top), divided by a number built from how much the readings vary and how many you measured (bottom).
Where do s₁² and s₂² come from? (a separate step you do first)
The two “s²” terms in the t-formula are not new or complicated quantities. Each one is just the standard deviation of that sample, multiplied by itself (squared). So before you use the t-formula, you carry out the two steps below for each sample separately.
Step A — calculate the standard deviation (s) of the sample. The standard deviation is a measure of how spread out the readings are. Its formula is:
s = √[ Σ(x−x̄)² ÷ (n−1) ]
Every symbol in that formula, spelled out:
- s: the standard deviation, the number this formula gives you.
- √: square root. Work out everything inside first, then take the square root at the very end.
- Σ: the Greek letter “sigma”, meaning “add up all of” the terms that follow.
- x: one individual reading (here, one mussel’s shell length).
- x̄: the mean (average) of that sample, said “x-bar”. You work this out first.
- (x−x̄): how far one reading is from the mean (subtract the mean from the reading).
- (x−x̄)²: that distance squared (multiplied by itself), so that readings below the mean don’t cancel out readings above it.
- n: the number of readings in the sample; n−1 is one less than that.
In words, the steps to work out s for one sample are: (1) find the mean of the sample; (2) for each reading, subtract the mean and square the answer; (3) add all those squared values together; (4) divide by (n−1), i.e. one less than the number of readings; (5) take the square root. (This is exactly the standard deviation explained in full on the descriptive statistics page.)
Step B — square the standard deviation to get s². For example, if a sample’s standard deviation works out as s = 3, then s² = 3 × 3 = 9. This squared value is what you put into the t-test formula above.
You carry out Step A and Step B twice: once for sample 1 (which gives you s₁²) and once for sample 2 (which gives you s₂²). Then you put both numbers into the t-formula. That is the only reason s² appears — the t-formula uses the squared standard deviation of each sample.
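Steps A and B can be sketched in Python as follows. This is only an illustration: the function name `sample_sd` and the eight readings are invented for the example, and the result is checked against `statistics.stdev` from the standard library, which also divides by n − 1.

```python
import math
import statistics

def sample_sd(readings):
    """Standard deviation with the (n - 1) divisor, following steps (1) to (5)."""
    n = len(readings)
    mean = sum(readings) / n                              # (1) find the mean
    squared_devs = [(x - mean) ** 2 for x in readings]    # (2) subtract the mean, square
    total = sum(squared_devs)                             # (3) add the squared values
    variance = total / (n - 1)                            # (4) divide by n - 1
    return math.sqrt(variance)                            # (5) take the square root

# Made-up readings for illustration only (not from the mussel data).
readings = [2, 4, 4, 4, 5, 5, 7, 9]

s = sample_sd(readings)    # Step A: the standard deviation
s_squared = s ** 2         # Step B: square it for the t formula

print(f"s = {s:.3f}, s squared = {s_squared:.3f}")
print(f"statistics.stdev gives {statistics.stdev(readings):.3f}")
```

In practice you would run this once per sample, giving s₁² from the first sample and s₂² from the second.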
Worked example: the mussels, start to finish
The real data, worked through exactly as you would in an exam or write-up.
Do high-zone and low-zone mussels differ in shell length?
15 mussels were measured from the high-tide zone (Sample A) and 15 from the low-tide zone (Sample B). Shell lengths in mm are below. Is there a significant difference between the mean shell lengths of the two zones?
Step 1 — State the null hypothesis
Before any numbers, write the null hypothesis — the assumption that there is no real effect. Everything that follows is a test of whether we can reject it.
Null hypothesis (H₀): there is no difference between the mean shell length of mussels in sample A (high zone) and sample B (low zone).
(Any difference we see is assumed to be due to natural variation until the test shows otherwise. The hypothesis says “no difference” — the word “significant” belongs to your conclusion, not the hypothesis. You will also see “no significant difference” accepted in some mark schemes.)
Step 2 — The raw data (shell length, mm)
| Mussel | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A (high) | 33 | 35 | 34 | 32 | 34 | 37 | 29 | 30 | 30 | 37 | 31 | 36 | 32 | 29 | 32 |
| B (low) | 22 | 23 | 23 | 24 | 23 | 24 | 25 | 22 | 23 | 24 | 24 | 24 | 23 | 25 | 23 |
Step 3 — Find the two means
- Sample A: the 15 values add up to 491, so x̄₁ = 491 ÷ 15 = 32.7 mm
- Sample B: the 15 values add up to 352, so x̄₂ = 352 ÷ 15 = 23.5 mm
Straight away the high-zone mussels look longer — but we must check the difference is bigger than variation would give.
Step 4 — Find each standard deviation, then square it
Using s = √[Σ(x−x̄)² ÷ (n−1)] for each sample (calculator or spreadsheet does the arithmetic), then square each one because the t formula needs s²:
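- Sample A: standard deviation s₁ = 2.71 mm, so s₁² = 7.35
- Sample B: standard deviation s₂ = 0.92 mm, so s₂² = 0.84
(Each s² is found by squaring the unrounded standard deviation, which is why it can differ in the last digit from squaring the rounded value shown.)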
Note the low-zone mussels (B) are much more consistent (small standard deviation) than the high-zone ones.
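If you would rather let software do the arithmetic in Steps 3 and 4, a short Python sketch using the standard `statistics` module is enough. The variable names here are illustrative choices; the readings are the two rows of the table in Step 2.

```python
import statistics

# Shell lengths in mm, copied from the raw data table.
high_zone = [33, 35, 34, 32, 34, 37, 29, 30, 30, 37, 31, 36, 32, 29, 32]  # Sample A
low_zone = [22, 23, 23, 24, 23, 24, 25, 22, 23, 24, 24, 24, 23, 25, 23]   # Sample B

summary = {}
for label, sample in (("A (high)", high_zone), ("B (low)", low_zone)):
    summary[label] = {
        "n": len(sample),
        "total": sum(sample),
        "mean": statistics.mean(sample),
        "s": statistics.stdev(sample),        # standard deviation, n - 1 divisor
        "s2": statistics.variance(sample),    # s squared, also n - 1 divisor
    }

for label, values in summary.items():
    print(
        f"{label}: total = {values['total']}, mean = {values['mean']:.1f} mm, "
        f"s = {values['s']:.2f} mm, s squared = {values['s2']:.2f}"
    )
```

Because the code squares the unrounded standard deviation, its s² values may differ in the last digit from squaring a rounded s by hand.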
Step 5 — Put the numbers into the formula
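t = | 32.7 − 23.5 | ÷ √( 7.35/15 + 0.84/15 )
t = 9.2 ÷ √( 0.490 + 0.056 ) = 9.2 ÷ √0.546 = 9.2 ÷ 0.739
t = 12.5 (to 3 significant figures, carrying the unrounded means through the calculation)
Step 6 — Degrees of freedom
df = (n₁ − 1) + (n₂ − 1) = (15 − 1) + (15 − 1) = 28
Step 7 — Compare with the critical value
From the t-table, the critical value at p = 0.05 and 28 degrees of freedom is 2.048. Our value:
t = 12.5 is much greater than the critical value of 2.048, so the difference is significant and we reject the null hypothesis: the mean shell lengths in the two zones are significantly different.
Medical example: does a new drug lower blood pressure?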
In a trial of a new blood-pressure drug, 12 patients were given the drug and a separate 12 patients were given a placebo (a dummy pill with no active ingredient). After four weeks each patient’s systolic blood pressure (mmHg) was measured. The two groups are different people, so this is an unpaired t-test. Is the drug group’s mean blood pressure significantly lower than the placebo group’s?
Why a placebo group? (the control)
The placebo group is the control. Blood pressure can fall just because a patient believes they are being treated (the placebo effect) or simply over time. Comparing the drug group against a placebo group means any difference we find is due to the drug itself, not to being treated in general. Everything else — the four-week period, the measuring method, the type of patient — is kept the same for both groups (the controlled variables).
Step 1 — State the null hypothesis
Null hypothesis (H₀): there is no difference between the mean systolic blood pressure of the drug group and the placebo group.
Step 2 — The raw data (systolic blood pressure, mmHg)
| Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Drug | 131 | 128 | 135 | 129 | 133 | 127 | 130 | 132 | 126 | 134 | 129 | 131 |
| Placebo | 142 | 139 | 145 | 141 | 138 | 144 | 140 | 143 | 137 | 146 | 141 | 139 |
Step 3 — Find the two means
- Drug: the 12 values add up to 1565, so x̄₁ = 1565 ÷ 12 = 130.42 mmHg
- Placebo: the 12 values add up to 1695, so x̄₂ = 1695 ÷ 12 = 141.25 mmHg
The drug group’s mean is about 10.8 mmHg lower — but we must check that gap is bigger than variation would give.
Step 4 — Find each standard deviation, then square it
- Drug: standard deviation s₁ = 2.78 mmHg, so s₁² = 2.78² = 7.72
- Placebo: standard deviation s₂ = 2.83 mmHg, so s₂² = 2.83² = 8.02
Step 5 — Put the numbers into the formula
t = | 130.42 − 141.25 | ÷ √( 7.72/12 + 8.02/12 )
t = 10.83 ÷ √( 0.643 + 0.668 ) = 10.83 ÷ √1.311 = 10.83 ÷ 1.145
t = 9.46
Step 6 — Degrees of freedom
df = (n₁ − 1) + (n₂ − 1) = (12 − 1) + (12 − 1) = 22
Step 7 — Compare with the critical value
From the t-table, the critical value at p = 0.05 and 22 degrees of freedom is 2.074.
t = 9.46 is much greater than the critical value of 2.074, so the difference is significant and we reject the null hypothesis: the drug group’s mean systolic blood pressure is significantly lower than the placebo group’s.
Degrees of freedom & reading the table
Degrees of freedom is a number that tells you which row of the t-table to use. For an unpaired t-test it is simply:
degrees of freedom = (n₁ − 1) + (n₂ − 1) = n₁ + n₂ − 2
That is one less than the number in each sample, added together. With 15 mussels in each sample: (15−1)+(15−1) = 28.
You then look along that row to the p = 0.05 column to find the critical value. The rule is the same every time:
The decision rule
- calculated t > critical value → the difference is significant → reject the null hypothesis (there is a real difference).
- calculated t < critical value → the difference is not significant → you cannot reject the null hypothesis (the difference could be natural variation).
| Degrees of freedom | Critical value (p = 0.05) |
|---|---|
| 10 | 2.228 |
| 20 | 2.086 |
| 22 (drug trial example) | 2.074 |
| 25 | 2.060 |
| 28 (our test) | 2.048 |
| 30 | 2.042 |
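The whole unpaired procedure, from raw readings to the decision, can be sketched in a few lines of Python. Treat this as an illustration rather than a definitive implementation: the names `unpaired_t`, `decide` and `CRITICAL_P05` are invented here, and the lookup only knows the six rows of the table above.

```python
import math
import statistics

# Critical values at p = 0.05, copied from the table above.
CRITICAL_P05 = {10: 2.228, 20: 2.086, 22: 2.074, 25: 2.060, 28: 2.048, 30: 2.042}

def unpaired_t(sample1, sample2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    difference = abs(statistics.mean(sample1) - statistics.mean(sample2))
    bottom = math.sqrt(
        statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2
    )
    return difference / bottom, (n1 - 1) + (n2 - 1)

def decide(t, df):
    """Apply the decision rule using the table row for df."""
    critical = CRITICAL_P05[df]  # KeyError if that row is not in this small table
    return "reject H0" if t > critical else "cannot reject H0"

mussels_a = [33, 35, 34, 32, 34, 37, 29, 30, 30, 37, 31, 36, 32, 29, 32]
mussels_b = [22, 23, 23, 24, 23, 24, 25, 22, 23, 24, 24, 24, 23, 25, 23]
drug = [131, 128, 135, 129, 133, 127, 130, 132, 126, 134, 129, 131]
placebo = [142, 139, 145, 141, 138, 144, 140, 143, 137, 146, 141, 139]

for name, s1, s2 in (("mussels", mussels_a, mussels_b), ("drug trial", drug, placebo)):
    t, df = unpaired_t(s1, s2)
    print(f"{name}: t = {t:.2f}, df = {df}, critical = {CRITICAL_P05[df]}, {decide(t, df)}")
```

Because the code never rounds the means or standard deviations part-way through, its t values can differ slightly in the later digits from a hand calculation that uses rounded intermediate values. The decision is the same either way.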
Paired vs unpaired t-tests
There are two versions of the t-test, and choosing the right one matters (OCR in particular expects you to know both).
Unpaired t-test
Use when the two samples are independent — different individuals in each group, with no link between a particular member of one group and a particular member of the other. Our mussel example is unpaired: the high-zone mussels and low-zone mussels are completely separate animals.
Paired t-test
Use when each measurement in one group is naturally linked to one in the other — most often two measurements from the same individual. For example, each person’s blood glucose before and after a treatment. You analyse the difference within each pair, which can reveal a real effect that pooling would hide.
Important: the two tests use different formulae
This is a common trap. The paired t-test is not the unpaired formula with the data paired up — it is a different formula. Here they are side by side.
Unpaired (two separate groups, as in the mussel example): it uses both means and both standard deviations.
t = | x̄₁ − x̄₂ | ÷ √( s₁²/n₁ + s₂²/n₂ )
Paired (linked pairs, e.g. before/after the same individual): you first work out the difference for each pair, then the test uses only those differences.
t = d̄ ÷ ( s_d ÷ √n ), which is the same as t = d̄ × √n ÷ s_d
Reading every symbol in the paired formula:
- t: the t value, the number you are calculating.
- d: the difference for one pair, i.e. the second reading minus the first (e.g. after − before) for that individual.
- d̄: the mean of all the differences (said “d-bar”). Add up every pair’s difference and divide by the number of pairs.
- s_d: the standard deviation of the differences. It shows how spread out the individual differences are, and is worked out with the standard-deviation formula (from the descriptive statistics page) but using the differences as the data.
- √n: the square root of n, where n is the number of pairs.
The method for a paired test, step by step:
- Find the difference (d) for each pair — subtract one reading from the other (keep the direction consistent, e.g. always after − before).
- Find the mean of those differences (d̄).
- Find the standard deviation of those differences (s_d) using the standard-deviation formula, treating each difference as one value.
- Put d̄, s_d and n into the formula above to get t.
- Compare t with the critical value at p = 0.05, using degrees of freedom = number of pairs − 1.
How to decide which test
Ask: “Is each reading in sample 1 tied to one specific reading in sample 2?” If yes (same individual, before/after, matched pairs) → paired. If no (two separate groups) → unpaired. You must have a genuine reason to pair — never pair data up after collecting it just to change the result.
A second example: a paired t-test
A different context, and the paired version — to show when pairing is the right choice.
Does caffeine change resting heart rate?
Ten students had their resting heart rate (beats per minute) measured before and 30 minutes after drinking a caffeinated drink. Because each “before” reading is linked to the same student’s “after” reading, this is paired data. Does caffeine significantly change heart rate?
Step 1 — The data, and the difference within each pair
The trick with a paired test: work with the difference (d) for each individual, then do a t-test on those differences.
| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Before | 68 | 72 | 65 | 70 | 74 | 66 | 69 | 71 | 67 | 73 |
| After | 72 | 75 | 70 | 72 | 78 | 69 | 74 | 73 | 71 | 76 |
| Difference d | +4 | +3 | +5 | +2 | +4 | +3 | +5 | +2 | +4 | +3 |
Step 2 — The idea
If caffeine did nothing, the differences would scatter around zero (some up, some down). Here every difference is positive and they cluster around +3 to +4 — a strong hint that heart rate really rose. The paired t-test checks whether the mean difference is significantly different from zero.
- Mean difference d̄ = +3.5 bpm
- The differences are consistent (small standard deviation), and none is negative.
Step 3 — The result
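Putting the mean difference (d̄ = 3.5 bpm) and the standard deviation of the differences (about 1.08 bpm) into the paired-t formula with n = 10 pairs gives t ≈ 10.2. That is well above the critical value at p = 0.05 for 9 degrees of freedom (df = number of pairs − 1 = 10 − 1 = 9; critical value 2.262), so we reject the null hypothesis: resting heart rate was significantly higher after the caffeinated drink.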
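The paired calculation for the caffeine data can be checked with a short Python sketch. The variable names are illustrative, and the last part is an extra comparison added here rather than part of the method: it runs the same readings through the unpaired formula, only to show how much information is lost when the pairing is ignored.

```python
import math
import statistics

# Resting heart rate in beats per minute, copied from the table in Step 1.
before = [68, 72, 65, 70, 74, 66, 69, 71, 67, 73]
after = [72, 75, 70, 72, 78, 69, 74, 73, 71, 76]

# Paired test: work only with the difference for each student (after - before).
differences = [a - b for a, b in zip(after, before)]
n = len(differences)                      # number of pairs
d_bar = statistics.mean(differences)      # mean difference
s_d = statistics.stdev(differences)       # standard deviation of the differences
t_paired = d_bar / (s_d / math.sqrt(n))
df_paired = n - 1

print(f"differences = {differences}")
print(f"mean difference = {d_bar}, s_d = {s_d:.2f}")
print(f"paired t = {t_paired:.2f} with {df_paired} degrees of freedom")

# For contrast only: the same readings treated (wrongly) as two separate groups.
t_unpaired = abs(statistics.mean(after) - statistics.mean(before)) / math.sqrt(
    statistics.variance(after) / n + statistics.variance(before) / n
)
print(f"unpaired t on the same data = {t_unpaired:.2f}")
```

The paired t is several times larger than the unpaired one because the differences within each student vary far less than heart rates vary between students. That is the practical payoff of pairing when the data are genuinely linked.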
What each exam board expects
All the main A-level Biology specifications include the t-test as a named statistical test.
| Board | What is required |
|---|---|
| AQA (7402) | Select and use the Student t-test when comparing the mean values of two sets of data; interpret the probability value against 0.05 and understand degrees of freedom. |
| OCR A / B | t-test named explicitly, both paired and unpaired; choose the correct test and interpret the result against the critical value. |
| Edexcel A / B | Use the t-test to compare two means; understand significance at the 5% level and reject/accept the null hypothesis. |
| WJEC / Eduqas | t-test to compare two means, using standard deviation/variance and sample size; interpret against tabulated critical values. |
