When do you use a t-test?

You use a t-test when you have two groups of measured (continuous) data and you want to know whether the difference between their means is real or just natural variation. If you have read the descriptive statistics page, this is the exact question raised there: is the gap between two means bigger than variation alone would produce? The t-test is the tool that answers it.

Use a t-test when…

  • you are comparing two means (exactly two groups — no more);
  • the data are continuous (measured, e.g. length, mass, height) — not counts or categories;
  • the data are roughly normally distributed (the bell shape).

If instead you are comparing counts in categories (e.g. how many organisms in each zone), that is a job for the chi-squared test, not the t-test. If you are looking for a relationship between two variables, that is correlation.

The investigation: mussels on a rocky shore

Here is the real fieldwork example we will work through. On a rocky shore, mussels live attached to the rocks in two zones: the high-tide zone near the top of the shore, and the low-tide zone further down. A student wondered whether mussels grow to a different length in the two zones (perhaps because the lower zone is covered by seawater for longer each day, giving more feeding time).

[Figure: two real mussel shells, each with an arrow showing the shell length being measured. Sample A, from the high-tide zone, is the larger; Sample B, from the low-tide zone, is the smaller.]
Even by eye the two look different sizes; the t-test tells us whether that difference is significant.

She collected a sample of 15 mussels from each zone and measured the length of every shell in millimetres. The two zones give her two samples:

  • Sample A — 15 mussels from the high-tide zone.
  • Sample B — 15 mussels from the low-tide zone.
[Figure: cross-section of the shore. Labels: high-tide mark; low-tide mark; splash zone (rarely covered by sea); high-tide zone (covered at high tide), Sample A; low-tide zone (covered most of the time), Sample B.]
Sample A comes from the high-tide zone (upper shore); sample B from the low-tide zone (lower shore). The question: do their mean shell lengths differ?

Each mussel gives one length measurement — that single value is what we call x:

[Figure: one mussel shell with its length (x, measured in mm) marked between the hinge and the broad end. Each mussel gives one reading.]
The shell length of one mussel, measured end to end in mm, is one value of x. A whole sample is 15 of these.

The null hypothesis

Every statistical test starts by assuming there is no real effect — this starting assumption is the null hypothesis. Here it is: “There is no difference between the mean shell length of mussels in sample A (high zone) and sample B (low zone).” The t-test will tell us whether the evidence is strong enough to reject this assumption.

The idea behind the t-test

The two samples will almost certainly have different means, but remember from the descriptive statistics page that two samples differ a little even when nothing real is going on, simply because individuals vary. So a difference in means by itself proves nothing. The t-test weighs the difference against the variation, and it does this by comparing two things:

  • The size of the difference between the two means — the bigger this is, the more likely it is real. (This is the top of the formula.)
  • How much the measurements vary, and how many you took — the more the individual mussels vary (and the fewer you measured), the more the means could differ by variation alone. (This is the bottom of the formula.)

Why you must calculate the standard deviation, not just the means

This is the key point, and it is why the t-test needs the standard deviation (the measure of spread). Look at the two situations below. In both, the two means are the same distance apart — but the conclusion is completely different, purely because of the spread:

[Figure: two panels, each showing curves for samples A and B with the same difference in means. Left panel: low SD, curves separate, likely significant. Right panel: high SD, curves overlap, may not be significant.]
Each curve shows the spread of one sample around its mean. Left: both samples have a small standard deviation, so the curves barely overlap; the two groups really are different, so the difference is likely significant. Right: the same gap between means, but sample B now has a large standard deviation, so the curves overlap heavily; the values blur together and the difference may not be significant.

The lesson

The difference between the means is not enough on its own. The same difference can be significant (curves separate) or not significant (curves overlap); for a given sample size, that depends on the spread. That is why you must calculate the standard deviation of each sample and feed it into the t-test: it is the standard deviation that tells you whether the curves separate or overlap.

The t value in one sentence

The t value is simply the difference between the two means, measured in units of the variation. A big t means the means are far apart compared with the variation — the difference is probably real. A small t means the difference is small compared with the variation — it could easily be natural variation.

So the whole test comes down to: calculate t, then check whether it is big enough to take seriously (by comparing it with a critical value from a table). That is exactly the steps below.

The formula (given to you in the exam)

You are always given this formula; you never need to memorise it. What you need is to know what each symbol means and to be able to put your numbers in. This is the unpaired t-test, the version used when the two samples are separate, independent groups, like our two shore zones (mussels in zone A are different animals from those in zone B):

t = | x̄₁ − x̄₂ | ÷ √( s₁²/n₁ + s₂²/n₂ )

Reading the symbols:

  • t: the t value, the number you are calculating.
  • x̄₁: the mean of sample 1 (here, the high-zone mussels).
  • x̄₂: the mean of sample 2 (here, the low-zone mussels).
  • | … |: the difference between the two means, taken as a positive number (it does not matter which way round you subtract).
  • s₁²: the standard deviation of sample 1, squared.
  • s₂²: the standard deviation of sample 2, squared.
  • n₁: the number of readings in sample 1 (here 15).
  • n₂: the number of readings in sample 2 (here 15).
  • √( … ): square root of everything inside; do it last.

In plain words: the difference between the two means (top), divided by a number built from how much the readings vary and how many you measured (bottom).
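As a rough illustration, the formula can be sketched as a small Python function. The function name `unpaired_t` and the numbers passed to it are made up for this sketch (they are not the mussel data); they simply show that the same gap between means gives a very different t when the spread changes.

```python
import math

def unpaired_t(mean1, mean2, sd1, sd2, n1, n2):
    """Unpaired t value: difference in means over a term built from spread and sample size."""
    top = abs(mean1 - mean2)                           # size of the difference
    bottom = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # variation and number of readings
    return top / bottom

# Hypothetical figures: the same 5-unit gap between means, 15 readings per sample.
t_low_spread = unpaired_t(30, 25, 2, 2, 15, 15)     # small standard deviations
t_high_spread = unpaired_t(30, 25, 10, 10, 15, 15)  # large standard deviations

print(round(t_low_spread, 2), round(t_high_spread, 2))
```

With these invented inputs the low-spread case gives a t of roughly 6.8 and the high-spread case roughly 1.4, even though the difference between the means is identical.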

Where do s₁² and s₂² come from? (a separate step you do first)

The two “s²” terms in the t-formula are not new or complicated quantities. Each one is just the standard deviation of that sample, multiplied by itself (squared). So before you use the t-formula, you carry out the two steps below for each sample separately.

Step A: calculate the standard deviation (s) of the sample. The standard deviation is a measure of how spread out the readings are. Its formula is:

s = √[ Σ(x − x̄)² ÷ (n − 1) ]

Every symbol in that formula, spelled out:

  • s: the standard deviation, the number this formula gives you.
  • √: square root. Work out everything inside first, then take the square root at the very end.
  • Σ: the Greek letter “sigma”, meaning “add up all of” the terms that follow.
  • x: one individual reading (here, one mussel’s shell length).
  • x̄: the mean (average) of that sample, said “x-bar”. You work this out first.
  • (x − x̄): how far one reading is from the mean (subtract the mean from the reading).
  • (x − x̄)²: that distance squared (multiplied by itself), so that readings below the mean don’t cancel out readings above it.
  • n: the number of readings in the sample; n − 1 is one less than that.

In words, the steps to work out s for one sample are: (1) find the mean of the sample; (2) for each reading, subtract the mean and square the answer; (3) add all those squared values together; (4) divide by (n−1), i.e. one less than the number of readings; (5) take the square root. (This is exactly the standard deviation explained in full on the descriptive statistics page.)

Step B — square the standard deviation to get s². For example, if a sample’s standard deviation works out as s = 3, then s² = 3 × 3 = 9. This squared value is what you put into the t-test formula above.

You carry out Step A and Step B twice: once for sample 1 (which gives you s₁²) and once for sample 2 (which gives you s₂²). Then you put both numbers into the t-formula. That is the only reason s² appears — the t-formula uses the squared standard deviation of each sample.
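Steps A and B can be sketched in Python as follows. This is a minimal sketch: the helper name `sample_sd` and the eight readings are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math

def sample_sd(readings):
    """Standard deviation of one sample, following the five steps in the text."""
    n = len(readings)
    mean = sum(readings) / n                               # (1) find the mean
    squared_gaps = [(x - mean) ** 2 for x in readings]     # (2) subtract the mean, square
    total = sum(squared_gaps)                              # (3) add the squared values
    variance = total / (n - 1)                             # (4) divide by n - 1
    return math.sqrt(variance)                             # (5) take the square root

readings = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical readings, not from the fieldwork
s = sample_sd(readings)              # Step A: the standard deviation
s_squared = s ** 2                   # Step B: square it for the t formula

print(round(s, 2), round(s_squared, 2))
```

For a real investigation you would run this once per sample, giving s₁² and s₂².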

Worked example: the mussels, start to finish

The real data, worked through exactly as you would in an exam or write-up.

Do high-zone and low-zone mussels differ in shell length?

Unpaired t-test · real fieldwork data

15 mussels were measured from the high-tide zone (Sample A) and 15 from the low-tide zone (Sample B). Shell lengths in mm are below. Is there a significant difference between the mean shell lengths of the two zones?

Step 1 — State the null hypothesis

Before any numbers, write the null hypothesis — the assumption that there is no real effect. Everything that follows is a test of whether we can reject it.

Null hypothesis (H₀): there is no difference between the mean shell length of mussels in sample A (high zone) and sample B (low zone).
(Any difference we see is assumed to be due to natural variation until the test shows otherwise. The hypothesis says “no difference” — the word “significant” belongs to your conclusion, not the hypothesis. You will also see “no significant difference” accepted in some mark schemes.)

Step 2 — The raw data (shell length, mm)

Mussel     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
A (high)  33  35  34  32  34  37  29  30  30  37  31  36  32  29  32
B (low)   22  23  23  24  23  24  25  22  23  24  24  24  23  25  23

Step 3 — Find the two means

  • Sample A: add the 15 values (= 491) ÷ 15 = x̄₁ = 32.7 mm
  • Sample B: add the 15 values (= 352) ÷ 15 = x̄₂ = 23.5 mm

Straight away the high-zone mussels look longer — but we must check the difference is bigger than variation would give.

Step 4 — Find each standard deviation, then square it

Using s = √[Σ(x−x̄)² ÷ (n−1)] for each sample (calculator or spreadsheet does the arithmetic), then square each one because the t formula needs s²:

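  • Sample A: standard deviation s₁ = 2.71 mm, so s₁² = 7.35
  • Sample B: standard deviation s₂ = 0.92 mm, so s₂² = 0.84

(Square the unrounded standard deviation from your calculator; squaring the rounded figures gives very slightly different values.)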

Note the low-zone mussels (B) are much more consistent (small standard deviation) than the high-zone ones.

Step 5 — Put the numbers into the formula

t = | 32.7 − 23.5 | ÷ √( 7.35/15 + 0.84/15 )
t = 9.2 ÷ √( 0.490 + 0.056 ) = 9.2 ÷ √0.546 = 9.2 ÷ 0.739
t ≈ 12.5 (the rounded figures shown give 12.4; carrying the unrounded means through gives 12.5)

Step 6 — Degrees of freedom

df = (n₁ − 1) + (n₂ − 1) = (15 − 1) + (15 − 1) = 28

Step 7 — Compare with the critical value

From the t-table, the critical value at p = 0.05 and 28 degrees of freedom is 2.048. Our value:

t = 12.5 is much GREATER than critical value 2.048
Conclusion: because the calculated t (12.5) is far greater than the critical value (2.048), the difference between the means is significant — we reject the null hypothesis. There is a real difference between the mean shell lengths of mussels in the high-tide and low-tide zones (high-zone mussels are longer). The difference is very unlikely (well under 5% chance) to be due to natural variation alone.
Write the conclusion about the difference between the means, tie it to p = 0.05, and say you reject the null hypothesis. A significant result says the difference is unlikely to be chance — it does not by itself prove why (that is the biology: feeding time, exposure, competition…).
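If you want to check the hand calculation, Steps 3 to 7 can be reproduced with Python's standard `statistics` module. This is only a checking sketch: the variable names are chosen for illustration, and the critical value is the one quoted in the text for 28 degrees of freedom, not something the code looks up.

```python
import math
import statistics

# Shell lengths in mm, copied from the raw data in Step 2.
high_zone = [33, 35, 34, 32, 34, 37, 29, 30, 30, 37, 31, 36, 32, 29, 32]
low_zone = [22, 23, 23, 24, 23, 24, 25, 22, 23, 24, 24, 24, 23, 25, 23]

# Step 3: the two means.
mean_a = statistics.mean(high_zone)
mean_b = statistics.mean(low_zone)

# Step 4: s squared for each sample (statistics.variance divides by n - 1).
var_a = statistics.variance(high_zone)
var_b = statistics.variance(low_zone)

# Step 5: the unpaired t formula.
n_a, n_b = len(high_zone), len(low_zone)
t = abs(mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)

# Step 6: degrees of freedom.
df = (n_a - 1) + (n_b - 1)

# Step 7: compare with the table value quoted in the text.
CRITICAL_VALUE = 2.048
print(f"means: {mean_a:.1f} and {mean_b:.1f} mm")
print(f"s squared: {var_a:.2f} and {var_b:.2f}")
print(f"t = {t:.1f}, df = {df}, significant: {t > CRITICAL_VALUE}")
```

Because the code never rounds part-way through, its t value can differ in the last digit from one worked by hand with rounded intermediate figures.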

Medical example: does a new drug lower blood pressure?

Unpaired t-test · drug vs placebo · n = 12 each

In a trial of a new blood-pressure drug, 12 patients were given the drug and a separate 12 patients were given a placebo (a dummy pill with no active ingredient). After four weeks each patient’s systolic blood pressure (mmHg) was measured. The two groups are different people, so this is an unpaired t-test. Is the drug group’s mean blood pressure significantly lower than the placebo group’s?

Why a placebo group? (the control)

The placebo group is the control. Blood pressure can fall just because a patient believes they are being treated (the placebo effect) or simply over time. Comparing the drug group against a placebo group means any difference we find is due to the drug itself, not to being treated in general. Everything else — the four-week period, the measuring method, the type of patient — is kept the same for both groups (the controlled variables).

Step 1 — State the null hypothesis

Null hypothesis (H₀): there is no difference between the mean systolic blood pressure of the drug group and the placebo group.

Step 2 — The raw data (systolic blood pressure, mmHg)

Patient    1    2    3    4    5    6    7    8    9   10   11   12
Drug     131  128  135  129  133  127  130  132  126  134  129  131
Placebo  142  139  145  141  138  144  140  143  137  146  141  139

Step 3 — Find the two means

  • Drug: add the 12 values (= 1565) ÷ 12 = x̄₁ = 130.42 mmHg
  • Placebo: add the 12 values (= 1695) ÷ 12 = x̄₂ = 141.25 mmHg

The drug group’s mean is about 10.8 mmHg lower — but we must check that gap is bigger than variation would give.

Step 4 — Find each standard deviation, then square it

  • Drug: standard deviation s₁ = 2.78 mmHg, so s₁² = 2.78² = 7.72
  • Placebo: standard deviation s₂ = 2.83 mmHg, so s₂² = 2.83² = 8.02

Step 5 — Put the numbers into the formula

t = | 130.42 − 141.25 | ÷ √( 7.72/12 + 8.02/12 )
t = 10.83 ÷ √( 0.643 + 0.668 ) = 10.83 ÷ √1.311 = 10.83 ÷ 1.145
t = 9.46

Step 6 — Degrees of freedom

df = (n₁ − 1) + (n₂ − 1) = (12 − 1) + (12 − 1) = 22

Step 7 — Compare with the critical value

From the t-table, the critical value at p = 0.05 and 22 degrees of freedom is 2.074.

t = 9.46 is much GREATER than critical value 2.074
Conclusion: because the calculated t (9.46) is far greater than the critical value (2.074), the difference between the means is significant — we reject the null hypothesis. The drug group’s mean systolic blood pressure is significantly lower than the placebo group’s, so the drug does lower blood pressure. Because the only planned difference between the groups was the drug (the placebo group is the control), we can attribute the effect to the drug.
In a drug trial the placebo group is the control and the comparison is unpaired (two different sets of patients). If instead you measured the same patients before and after the drug, that would be paired — see the next section.
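The same calculation can be wrapped in a reusable function and applied to the trial data. The function name `unpaired_t_test` is hypothetical, and the critical value 2.074 is the figure quoted in the text for 22 degrees of freedom; treat this as a sketch for checking the arithmetic rather than a full analysis of a clinical trial.

```python
import math
import statistics

def unpaired_t_test(sample1, sample2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    gap = abs(statistics.mean(sample1) - statistics.mean(sample2))
    spread = math.sqrt(statistics.variance(sample1) / n1
                       + statistics.variance(sample2) / n2)
    return gap / spread, (n1 - 1) + (n2 - 1)

# Systolic blood pressure in mmHg, copied from the raw data in Step 2.
drug = [131, 128, 135, 129, 133, 127, 130, 132, 126, 134, 129, 131]
placebo = [142, 139, 145, 141, 138, 144, 140, 143, 137, 146, 141, 139]

t, df = unpaired_t_test(drug, placebo)
reject_null = t > 2.074  # critical value quoted in the text for df = 22

print(f"t = {t:.2f}, df = {df}, reject null hypothesis: {reject_null}")
```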

Degrees of freedom & reading the table

Degrees of freedom is a number that tells you which row of the t-table to use. For an unpaired t-test it is simply:

degrees of freedom = (n₁ − 1) + (n₂ − 1) = n₁ + n₂ − 2

— one less than the number in each sample, added together. With 15 mussels in each sample: (15−1)+(15−1) = 28.

You then look along that row to the p = 0.05 column to find the critical value. The rule is the same every time:

The decision rule

  • calculated t > critical value → the difference is significant → reject the null hypothesis (there is a real difference).
  • calculated t < critical value → the difference is not significant → you cannot reject the null hypothesis (the difference could be natural variation).

Degrees of freedom         Critical value (p = 0.05)
10                         2.228
20                         2.086
22 (drug trial example)    2.074
25                         2.060
28 (our test)              2.048
30                         2.042
Bigger samples give more degrees of freedom and a smaller critical value — so it is easier to detect a real difference. That is the maths rewarding you for collecting more data.
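The degrees-of-freedom formula and the decision rule can be sketched together in a few lines of Python. The dictionary holds only the six critical values listed in this section, and the names `CRITICAL_P05`, `degrees_of_freedom` and `is_significant` are invented for the sketch; for other sample sizes you would read the value from a full t-table.

```python
# Critical values at p = 0.05, copied from the short table in this section.
CRITICAL_P05 = {10: 2.228, 20: 2.086, 22: 2.074, 25: 2.060, 28: 2.048, 30: 2.042}

def degrees_of_freedom(n1, n2):
    """Unpaired test: one less than each sample size, added together."""
    return (n1 - 1) + (n2 - 1)

def is_significant(t, df):
    """Decision rule: significant only if calculated t exceeds the critical value."""
    critical = CRITICAL_P05[df]  # raises KeyError if df is not in this short table
    return t > critical

df_mussels = degrees_of_freedom(15, 15)
print(df_mussels, is_significant(12.5, df_mussels))
```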

Paired vs unpaired t-tests

There are two versions of the t-test, and choosing the right one matters (OCR in particular expects you to know both).

Unpaired t-test

two separate, independent groups

Use when the two samples are independent — different individuals in each group, with no link between a particular member of one group and a particular member of the other. Our mussel example is unpaired: the high-zone mussels and low-zone mussels are completely separate animals.

Paired t-test

each value linked to a partner

Use when each measurement in one group is naturally linked to one in the other — most often two measurements from the same individual. For example, each person’s blood glucose before and after a treatment. You analyse the difference within each pair, which can reveal a real effect that pooling would hide.

Important: the two tests use different formulae

This is a common trap. The paired t-test is not the unpaired formula with the data paired up — it is a different formula. Here they are side by side.

Unpaired (two separate groups, as in the mussel example): it uses both means and both standard deviations.

t = | x̄₁ − x̄₂ | ÷ √( s₁²/n₁ + s₂²/n₂ )

Paired (linked pairs, e.g. before/after the same individual): you first work out the difference for each pair, then the test uses only those differences.

t = ( d̄ × √n ) ÷ s_d

Reading every symbol in the paired formula:

  • t: the t value, the number you are calculating.
  • d: the difference for one pair, i.e. the second reading minus the first (e.g. after − before) for that individual.
  • d̄: the mean of all the differences (said “d-bar”). Add up every pair’s difference and divide by the number of pairs.
  • s_d (also written sd): the standard deviation of the differences. It shows how spread out the individual differences are, and is worked out with the standard-deviation formula (from the descriptive statistics page) but using the differences as the data.
  • √n: the square root of n, where n is the number of pairs.

The method for a paired test, step by step:

  1. Find the difference (d) for each pair — subtract one reading from the other (keep the direction consistent, e.g. always after − before).
  2. Find the mean of those differences (d̄).
  3. Find the standard deviation of those differences (sd) using the standard-deviation formula, treating each difference as one value.
  4. Put d̄, sd and n into the formula above to get t.
  5. Compare t with the critical value at p = 0.05, using degrees of freedom = number of pairs − 1.
Why it is a different formula: a paired test throws away the big person-to-person differences and looks only at the change within each individual. If a treatment had no effect the differences would scatter around zero; a mean difference far from zero (relative to their spread) gives a large t. Using the unpaired formula on paired data can hide a real effect.
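The five-step paired method can be sketched in Python like this. The function name `paired_t` and the four pairs of readings are hypothetical, used only to show the steps; the sketch assumes the paired formula t = d̄ × √n ÷ s_d, written here in the equivalent form d̄ ÷ (s_d ÷ √n).

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, degrees of freedom) for linked pairs of readings."""
    differences = [b - a for a, b in zip(first, second)]  # step 1: d = second - first
    d_bar = statistics.mean(differences)                  # step 2: mean difference
    s_d = statistics.stdev(differences)                   # step 3: SD of the differences
    n = len(differences)                                  # number of pairs
    t = d_bar / (s_d / math.sqrt(n))                      # step 4: the paired formula
    return t, n - 1                                       # step 5 uses df = pairs - 1

# Hypothetical before/after readings for four individuals.
before = [10, 12, 9, 11]
after = [12, 13, 12, 12]
t, df = paired_t(before, after)

print(round(t, 2), df)
```

The sign of t only reflects the direction of subtraction; it is the size of t that you compare with the critical value.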

How to decide which test

Ask: “Is each reading in sample 1 tied to one specific reading in sample 2?” If yes (same individual, before/after, matched pairs) → paired. If no (two separate groups) → unpaired. You must have a genuine reason to pair — never pair data up after collecting it just to change the result.

A second example: a paired t-test

A different context, and the paired version — to show when pairing is the right choice.

Does caffeine change resting heart rate?

Paired t-test · before-and-after

Ten students had their resting heart rate (beats per minute) measured before and 30 minutes after drinking a caffeinated drink. Because each “before” reading is linked to the same student’s “after” reading, this is paired data. Does caffeine significantly change heart rate?

Step 1 — The data, and the difference within each pair

The trick with a paired test: work with the difference (d) for each individual, then do a t-test on those differences.

Student         1   2   3   4   5   6   7   8   9  10
Before         68  72  65  70  74  66  69  71  67  73
After          72  75  70  72  78  69  74  73  71  76
Difference d   +4  +3  +5  +2  +4  +3  +5  +2  +4  +3

Step 2 — The idea

If caffeine did nothing, the differences would scatter around zero (some up, some down). Here every difference is positive and they cluster around +3 to +4 — a strong hint that heart rate really rose. The paired t-test checks whether the mean difference is significantly different from zero.

  • Mean difference d̄ = +3.5 bpm
  • The differences are consistent (small standard deviation), and none is negative.

Step 3 — The result

The standard deviation of the ten differences is 1.08 bpm. Putting the mean difference and this standard deviation into the paired-t formula gives t = 3.5 × √10 ÷ 1.08 = 10.2, well above the critical value of 2.262 at p = 0.05 for 9 degrees of freedom (df = number of pairs − 1 = 10 − 1 = 9).

Conclusion: the mean increase of 3.5 bpm is significant — reject the null hypothesis. Caffeine significantly increased resting heart rate in this sample. Because the data were paired (same student before and after), we removed the big person-to-person differences in baseline heart rate and could detect the smaller, real effect of the caffeine.
The giveaway for a paired test is “the same individuals measured twice” or “before and after”. Spotting paired vs unpaired is a common exam mark. Here, the unpaired formula applied to the same readings gives a t of only about 2.7, because the natural differences between people inflate the spread; with a smaller effect or fewer students, that could bury a real effect altogether.
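To see how much the choice of test matters, the sketch below runs both formulas on the heart-rate data. The unpaired calculation is included for comparison only, since it is the wrong choice for paired data; the variable names are chosen for illustration.

```python
import math
import statistics

# Resting heart rate in bpm, copied from the table in Step 1.
before = [68, 72, 65, 70, 74, 66, 69, 71, 67, 73]
after = [72, 75, 70, 72, 78, 69, 74, 73, 71, 76]

# Paired test: work only with the difference within each student.
differences = [a - b for b, a in zip(before, after)]  # after - before
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)
t_paired = d_bar / (s_d / math.sqrt(len(differences)))

# Unpaired formula on the same readings (for comparison only).
gap = abs(statistics.mean(after) - statistics.mean(before))
spread = math.sqrt(statistics.variance(after) / len(after)
                   + statistics.variance(before) / len(before))
t_unpaired = gap / spread

print(f"paired t = {t_paired:.1f}, unpaired t = {t_unpaired:.1f}")
```

The mean difference is the same in both calculations; only the bottom of the formula changes, because the unpaired version includes the person-to-person variation in baseline heart rate.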

What each exam board expects

All the main A-level Biology specifications include the t-test as a named statistical test.

Board and what is required:

  • AQA (7402): select and use the Student’s t-test when comparing the mean values of two sets of data; interpret the probability value against 0.05 and understand degrees of freedom.
  • OCR A / B: t-test named explicitly, both paired and unpaired; choose the correct test and interpret the result against the critical value.
  • Edexcel A / B: use the t-test to compare two means; understand significance at the 5% level and reject/accept the null hypothesis.
  • WJEC / Eduqas: t-test to compare two means, using standard deviation/variance and sample size; interpret against tabulated critical values.