What is standard deviation in A-level Biology?

Standard deviation is a measure of the spread of data around the mean. A small standard deviation means the values are clustered tightly around the mean; a large standard deviation means they are widely spread. It is more useful than the range because it uses every value, not just the two extremes, so a single outlier has less effect on it.

What is the difference between standard deviation and standard error?

Standard deviation describes the spread of the individual data values in your sample. Standard error describes how reliable your estimate of the mean is: it is the standard deviation divided by the square root of the sample size (SE = SD ÷ √n). For any sample of more than one value, standard error is smaller than standard deviation, and it gets smaller as your sample gets larger. Use standard deviation error bars to show variability, and standard error error bars to compare means.

What do error bars tell you in A-level Biology?

Error bars show how much variation or uncertainty there is around a mean. If two means have standard error bars that overlap, the difference between them is unlikely to be statistically significant. If the standard error bars do not overlap, the difference may be significant and a statistical test such as a t-test should be carried out to confirm it.

What is the 68–95–99.7 rule?

For data that follow a normal distribution, about 68% of values lie within 1 standard deviation of the mean, about 95% lie within 2 standard deviations, and about 99.7% lie within 3 standard deviations. This is why 'mean ± 2 standard deviations' is used as an approximate 95% range in A-level Biology.

AQA OCR Edexcel WJEC / Eduqas

Descriptive statistics: mean, standard deviation & standard error

Before you run any statistical test, you describe your data. This page explains the mean, the standard deviation, the standard error and error bars — what each one tells you, and how to read them the way an examiner expects. Drag the interactive below to see spread change in real time.

Living things vary. Measure the height of 100 people, the length of 100 leaves or the mass of 100 woodlice and no two values will be identical. Descriptive statistics are the numbers that sum up that variation so you can see what your data are actually telling you: where the values cluster (the mean), and how spread out they are (the standard deviation).

You will not usually be asked to calculate a standard deviation by hand in a written exam — you use a calculator or spreadsheet for that. What you are examined on is understanding what these numbers mean, choosing the right one, and interpreting them, especially error bars. That is what this page focuses on.

The two things every set of data has

A centre — a value the data cluster around (mean, median or mode). A spread — how far the data stretch out from that centre (range, standard deviation). Almost all of biological statistics is comparing centres while taking the spread into account.

Measures of the centre: mean, median & mode

There are three ways to describe the “middle” of a set of data. You need to know what each is and when it is the sensible choice.

Mean (the average) — add up all the values and divide by how many there are. This is by far the most used measure in biology and the one that feeds into statistical tests.
x̄ = Σx ÷ n
Reading the symbols:
- x̄ is the mean (said “x-bar”): the answer you are working out.
- Σ means “the sum of”. It is the Greek capital letter sigma, and it just says “add up everything that follows”. (Handy memory: sigma and sum both start with s.)
- x is each individual value (each measurement you took).
- n is the number of values in your sample (how many measurements). n for number.
So Σx ÷ n simply means “add up all your measurements, then divide by how many there were”.
Median — put the values in order and take the middle one (or the average of the middle two). Useful when there are extreme values (outliers) that would drag the mean, because the median is barely affected by them.
Mode — the most common value. The only measure you can use for data that are categories (e.g. eye colour). On a graph it is the tallest bar or peak.
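
The three definitions above can be sketched in a few lines of Python. This is only an illustration: the leaf lengths and blood groups are made-up values, and the function names are hypothetical helpers written for this page, not part of any exam specification.

```python
from collections import Counter

def mean(values):
    # Σx ÷ n: add up every value, then divide by how many there are
    return sum(values) / len(values)

def median(values):
    # Put the values in order and take the middle one
    # (or the average of the middle two when n is even)
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    # The most common value(s); a list, because there can be more than one mode
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

leaf_lengths_mm = [48, 50, 50, 52, 55]   # hypothetical measurements
print(mean(leaf_lengths_mm))             # 51.0
print(median(leaf_lengths_mm))           # 50
print(modes(leaf_lengths_mm))            # [50]

blood_groups = ["A", "O", "O", "B", "A"]  # hypothetical categories
print(modes(blood_groups))                # ['A', 'O']
```

Notice that only `modes` works on the blood-group list: you cannot add up or rank categories, which is why the mode is the only option for that kind of data.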

Which one should you use? (a common exam question)

You are often asked which measure of central tendency is most appropriate for a given set of data, and why. There is no single “best” one — it depends on the data. Here is how to choose.

Use the mean when…

the usual first choice for measured data

The data are continuous (measured, e.g. height, mass, length) and roughly symmetrical with no big outliers.
It uses every value, so it is the most representative measure, and it is the one needed for standard deviation and statistical tests such as the t-test.
Avoid it when there is an outlier or the data are skewed, because one extreme value drags the mean and it stops representing a “typical” value.

Use the median when…

the safe choice for skewed data or outliers

The data are skewed or contain anomalous/outlying values. The median is barely affected by extremes, so it still represents a typical value.
It works for continuous or ordered (ranked) data.
Drawback: it ignores most of the values (only the middle position matters), and it is not used in the standard significance tests.

Use the mode when…

the only choice for categories

The data are categories / discrete groups (e.g. eye colour, blood group, phenotype). Here a mean or median makes no sense, so the mode is the only measure you can use.
You want the single most common value quickly.
Drawback: it can be misleading or undefined for continuous data (there may be no repeated value, or more than one mode).

How to answer it in the exam

Name the measure and justify it from the data. For example: “The median is most appropriate because the data contain an outlier, which would distort the mean.” Or: “The mode is most appropriate because the data are categories (blood groups), so a mean cannot be calculated.” The mark is almost always for the reason, not just the name.

In a perfectly symmetrical (normal) distribution the mean, median and mode all sit at the same point, right at the peak. As soon as the distribution becomes skewed (lopsided, with a long tail on one side), the three measures pull apart — and the order they fall in tells you which way the data are skewed. The diagram below shows all three cases at a glance:

The mean (orange) is always dragged furthest towards the long tail; the mode (green) stays at the peak. In a symmetrical distribution all three coincide.

Positive skew (right-skewed)

A few unusually large values stretch the distribution out into a long tail on the right. Those extreme values pull the mean towards them, so the mean ends up the highest of the three:

positive skew: mode < median < mean

The mode stays put at the peak (the most common value), the mean is dragged furthest towards the tail, and the median sits in between. Biological example: the number of offspring per individual, where most have few but a handful have very many.

Negative skew (left-skewed)

A few unusually small values stretch the distribution into a long tail on the left. Now the mean is pulled downwards, so the order reverses:

negative skew: mean < median < mode

A useful memory hook: the mean always chases the tail. Whichever side the long tail is on, that is the side the mean shifts towards — because the mean is the only one of the three affected by how far the extreme values are, not just how many. The median is barely moved and the mode not at all, which is exactly why the median is the safer measure of centre when data are skewed or contain outliers.

See it move

In the interactive explorer further down this page, switch to the “Skew: mean vs median vs mode” view. Drag the skew slider from negative through symmetrical to positive and watch the three markers separate and swap order — the mean (orange) always heads into the tail first.

Questions often show a skewed histogram and ask you to state the order of mean, median and mode, or to say which measure of central tendency is most appropriate. If there is a long tail or an outlier, the median is usually the best answer because it is least affected by extreme values.

Worked calculation: watch the mean, median & mode change

Central tendency · effect of an outlier · Mean / median / mode

A student counts the number of seeds per pod in nine pea pods:

4, 5, 5, 6, 6, 6, 7, 7, 8

Calculate the mean, median and mode. Then a tenth pod is found with 28 seeds (an unusually large pod). Recalculate all three and comment.

Step 1 — the original nine pods

The values are already in order, which makes the median easy.

Mean = sum ÷ n: (4+5+5+6+6+6+7+7+8) ÷ 9 = 54 ÷ 9 = 6.0 seeds
Median = the middle value. With 9 values the middle is the 5th: 4, 5, 5, 6, [6], 6, 7, 7, 8 → median = 6 seeds
Mode = most common value. Six appears three times: mode = 6 seeds

Symmetrical data: mean = median = mode = 6 seeds. All three agree, which tells you the data are roughly symmetrical — no skew.

Step 2 — add the outlier (the 28-seed pod)

The data set is now: 4, 5, 5, 6, 6, 6, 7, 7, 8, 28 (n = 10).

New mean: (54 + 28) ÷ 10 = 82 ÷ 10 = 8.2 seeds. Up from 6.0 to 8.2: a big jump, and 8.2 is higher than all but one of the pods, so it no longer represents a “typical” pod.
New median = average of the two middle values (5th and 6th of 10): 4, 5, 5, 6, [6, 6], 7, 7, 8, 28 → (6+6) ÷ 2 = 6 seeds. Unchanged at 6.
New mode = still 6 (appears three times): mode = 6 seeds. Unchanged.

Step 3 — what changed

Measure	Before	After outlier
Mean	6.0	8.2 (jumped up)
Median	6	6 (no change)
Mode	6	6 (no change)

Conclusion: one large value dragged the mean up by 2.2 seeds but left the median and mode untouched. Because the mean (8.2) now sits above the median and mode (both still 6), the data are positively skewed. Here the median (6) is the better measure of a typical pod, because it is not distorted by the single unusual value.

This is the numerical version of “the mean chases the tail”. It is also the clearest way to answer why the median can be more representative than the mean — show that one outlier moves the mean but not the median.
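
If you want to check the worked calculation yourself, the sketch below repeats it with Python's built-in `statistics` module. The variable names are just labels chosen for this example.

```python
import statistics

pods = [4, 5, 5, 6, 6, 6, 7, 7, 8]   # seeds per pod, from the question
with_outlier = pods + [28]           # add the unusually large pod

for label, data in (("nine pods", pods), ("ten pods", with_outlier)):
    print(
        label,
        "| mean:", statistics.mean(data),
        "| median:", statistics.median(data),
        "| mode:", statistics.mode(data),
    )
```

Only the mean changes between the two lines of output (6 becomes 8.2); the median and mode both stay at 6.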

Standard deviation: measuring the spread

The simplest measure of spread is the range (highest value − lowest value). It is quick, but it uses only two values, so a single freak result can distort it completely. The standard deviation (SD) is better because it uses every value.

Standard deviation answers one question: on average, how far is each value from the mean? A small SD means the data are packed tightly around the mean; a large SD means they are widely spread. You will be able to see this directly in the interactive explorer further down the page.

The formula (given to you in the exam)

s = √( Σ(x − x̄)² ÷ (n − 1) )

Reading the symbols:

s: the standard deviation (the answer).
√: square root. Do everything inside the brackets first, then square-root the result at the very end.
Σ: “the sum of”. Add up everything that follows (once you’ve done it for every value).
x: each individual value.
x̄: the mean of all the values (you work this out first).
(x − x̄): how far one value is from the mean (its “deviation”).
²: squared, meaning multiply by itself. This makes every deviation positive, so values below the mean don’t cancel out values above it.
n: the number of values.

In plain steps: (1) find the mean; (2) for each value, find how far it is from the mean and square that; (3) add all those squares up; (4) divide by (n − 1); (5) square-root the answer. The final square root brings you back to the original units (mm, g…), which is what makes SD easy to interpret.
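
The five steps can be written out one line at a time. This is a minimal sketch rather than anything you would do in an exam: `sample_sd` is a hypothetical helper name, and the data are the nine pea pods from the worked calculation earlier on this page. The last line checks the answer against Python's built-in `statistics.stdev`, which also divides by (n − 1).

```python
import math
import statistics

def sample_sd(values):
    n = len(values)
    mean = sum(values) / n                              # step 1: find the mean
    squared_devs = [(x - mean) ** 2 for x in values]    # step 2: deviation of each value, squared
    total = sum(squared_devs)                           # step 3: add the squares up
    variance = total / (n - 1)                          # step 4: divide by (n - 1)
    return math.sqrt(variance)                          # step 5: square-root the answer

seeds_per_pod = [4, 5, 5, 6, 6, 6, 7, 7, 8]
print(round(sample_sd(seeds_per_pod), 2))          # 1.22
print(round(statistics.stdev(seeds_per_pod), 2))   # 1.22
```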

Why n − 1 and not n? First, what it means in practice: n − 1 is simply one less than the number of values you have — so 5 values means you divide by 4, 10 values means you divide by 9, 20 values means you divide by 19. It is not a fixed number; it changes with your sample size.

And why one less? Because in biology you almost always measure a sample, not the whole population. A sample tends to look slightly less spread out than the population it came from, so dividing by the smaller (n − 1) nudges the answer up a little and corrects for that. You do not have to prove this at A-level — just remember: for a sample, divide by (n − 1).
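
You can see the “samples look less spread out” effect with a toy example. The sketch below is an illustration under stated assumptions, not a proof: it invents a tiny “population” of six values, takes every possible sample of two values (drawn with replacement), and averages the spread estimate from step 4 of the method (the value before the final square root) across all of those samples. Exact fractions are used so there is no rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # hypothetical "whole population"
pop_mean = Fraction(sum(population), len(population))
# True spread of the population (squared deviations averaged over all six values)
pop_variance = sum((x - pop_mean) ** 2 for x in population) / len(population)

def average_estimate(subtract):
    # Average the step-4 value over every possible sample of two values.
    # subtract=1 divides by (n - 1); subtract=0 divides by n.
    estimates = []
    for sample in product(population, repeat=2):
        n = len(sample)
        sample_mean = Fraction(sum(sample), n)
        sum_of_squares = sum((x - sample_mean) ** 2 for x in sample)
        estimates.append(sum_of_squares / (n - subtract))
    return sum(estimates) / len(estimates)

print(pop_variance)          # 35/12
print(average_estimate(1))   # 35/12  (dividing by n - 1 matches the population)
print(average_estimate(0))   # 35/24  (dividing by n comes out too small)
```

Dividing by n gives an answer that is, on average, only half the true value here, because the samples are so small. With bigger samples the gap shrinks, but it never quite disappears, which is why (n − 1) is used.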

Standard deviation is often the better answer than range when a question asks which measure of dispersion to use, especially if there is an outlier — because SD uses all the data, one anomalous value affects it far less than it affects the range. This is a common AQA mark point.

Range vs standard deviation: advantages & disadvantages

A favourite exam question is “give one advantage of using standard deviation rather than range” (or the reverse). Both measure spread, but they do it in very different ways — here is exactly what to say.

Range

highest value − lowest value

Advantages

Very quick and simple to calculate — just two values, no formula.
Easy to understand and useful for a quick first look at how spread out data are (e.g. a pilot study).

Disadvantages

Uses only two values (the two extremes) and ignores every value in between.
Heavily distorted by a single outlier or anomalous result — one freak value changes the range completely.
Tells you nothing about how the data are distributed around the mean.
Not used in further statistical tests.

Standard deviation

average distance of values from the mean

Advantages

Uses every value in the data set, so it describes the spread far more fully.
Much less affected by a single outlier than the range is.
Links directly to the normal distribution (the 68–95–99.7 rule) and to error bars, letting you judge whether means really differ.
Feeds into significance tests such as the t-test.

Disadvantages

Longer and harder to calculate — needs the mean, then every deviation, squared and summed.
Less meaningful if the data are strongly skewed (it assumes a roughly symmetrical spread).
Still affected by extreme values to some degree, because they are squared in the formula.

The one-mark answer

“An advantage of standard deviation over range is that it uses all the data / all the values, so it is less affected by a single anomalous (outlying) result and gives a fuller picture of the spread.” Conversely, an advantage of the range is simply that it is quicker and easier to calculate.

The normal distribution & the 68–95–99.7 rule

Many continuous biological variables (height, mass, leaf length) approximately follow a normal distribution: a symmetrical, bell-shaped curve in which most values sit near the middle and fewer and fewer appear as you move out towards the extremes. (You can see this shape in the interactive explorer below.) When data are normally distributed, the standard deviation carves the curve into predictable slices:

About 68% of values lie within ±1 SD of the mean.
About 95% of values lie within ±2 SD of the mean.
About 99.7% of values lie within ±3 SD of the mean.

The 68–95–99.7 rule. Green = within 1 SD of the mean (68% of values); amber = the next SD out either side (another 27%); red = beyond 2 SD (the rare ~5% in the tails).
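
The three percentages are not arbitrary; they come straight from the shape of the normal curve. As a quick check, the sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8) to measure the share of a normal distribution within 1, 2 and 3 SD of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, SD 1

for k in (1, 2, 3):
    # Share of values between k SD below the mean and k SD above it
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k} SD: {share:.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%, which is where the rounded “68–95–99.7” figures come from.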

This is why you will see “mean ± 2 standard deviations” used as an approximate 95% range. A value out beyond 2 SD from the mean is in the rare 5% tail — it is unusual, and that “unusual” idea is exactly what every significance test (chi-squared, t-test) is built on.

What “2 standard deviations” actually means

Because “2 SD” is used so much, it is worth making it completely concrete. Suppose you measured 20 leaves with a mean of 50 mm and a standard deviation of 8 mm. Then:

1 SD = 8 mm, so 2 SD = 16 mm.
Mean − 2 SD = 50 − 16 = 34 mm (the bottom of the range).
Mean + 2 SD = 50 + 16 = 66 mm (the top of the range).

So the “±2 SD range” here is simply 34 mm to 66 mm. The 68–95–99.7 rule says that about 95% of the leaves should fall inside that range, and only about 1 in 20 should fall outside it. In the diagram below, 19 of the 20 leaves land inside the green band — you can count them — and just one (the unusually long leaf) falls outside in the rare 5%.

Twenty leaf lengths (mean 50 mm, SD 8 mm). The green band runs from −2 SD (34 mm) to +2 SD (66 mm). 19 of the 20 leaves fall inside it (~95%); the one red leaf outside is in the rare ~5%.

Significance & chance — what they really mean

This is the most important idea on the whole page. Every statistical test — chi-squared, the t-test, correlation — and the null hypothesis are all built on it. We will go very slowly and assume you have never met any of it before.

Two ordinary words are used in a special way in statistics: “by chance” and “significant”. We will build them from three everyday examples, then use them on a real experiment.

First, three examples of “chance”

1. A coin

A fair coin means an ordinary, honest coin — one that is equally likely to land on heads or tails, with nothing done to it to favour either side. (An unfair or “biased” coin would be weighted so it lands one way more often.)

Flip a fair coin 10 times. You expect 5 heads — but you will very often not get exactly 5. You might get 4, or 6, or 7. That does not mean the coin is faulty. It is just the ordinary random variation you get whenever chance is involved. That everyday variation is what statisticians mean by “by chance”.

2. A medicine

Two people have a cold. One takes a vitamin and feels better after 5 days; the other takes nothing and feels better after 6 days. Did the vitamin work? You cannot tell — people shake off colds in different times anyway, so a one-day difference could easily happen by chance, with the vitamin doing nothing at all. This is the exact problem every experiment faces: is the difference I see real, or just natural variation?

3. Two dice — and this one explains the bell shape

Here is the example that shows why random results form that famous bell shape. Roll two dice and add them. The total can be anywhere from 2 to 12 — but the totals are not equally likely. Why not? Because some totals can be made in many different ways and others in only one:

A total of 2 can only happen one way: 1 + 1.
A total of 12 can only happen one way: 6 + 6.
But a total of 7 can happen six ways: 1+6, 2+5, 3+4, 4+3, 5+2, 6+1.

So a 7 comes up far more often than a 2 or a 12 — simply because there are more ways to make it. Count the ways for every total and you get this:

Each total is shown with the actual dice combinations that make it. A 7 can be made six ways, so its bar is tallest and a 7 comes up most often; a 2 or a 12 can be made only one way, so they are rare. This is why chance piles results into the middle and thins them at the edges — that shape is the bell curve.
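
Counting the ways by hand is tedious, so here is a short sketch that lists every one of the 36 possible rolls and tallies the totals. It draws a rough text version of the bar chart using `#` symbols.

```python
from collections import Counter
from itertools import product

# Every possible (first die, second die) pair, tallied by total
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(f"{total:>2}: {'#' * ways[total]} ({ways[total]} of 36)")
```

The row for 7 is the longest (6 of 36) and the rows for 2 and 12 are the shortest (1 of 36 each).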

This is why a bell curve looks the way it does

There are lots of ways to get a middle-ish, ordinary result, but only a few ways to get an extreme one. So ordinary results pile up into the tall middle of the curve, and extreme results are pushed out into the thin tails. Every bell-shaped distribution — dice totals, heights of people, leaf lengths — is tall in the middle and low at the edges for exactly this reason.

Watch it happen: the Galton board

A Galton board makes this idea physical. Drop a bead in at the top and it hits a peg. At every peg it bounces left or right at random (a 50–50 choice, like flipping a coin). After bouncing through all the rows, it drops into a bin at the bottom. Drop hundreds of beads and watch what shape they build.

Interactive

Drop the beads and build a bell curve

Real gravity: each bead falls, bounces left or right off every peg at random, and settles into a bin. Pour in hundreds and watch a bell curve build itself in front of you.

This is the same lesson as the dice, made visible. There are many left/right paths that end in the middle bins, so they fill up fast and tall — but only one path reaches each far edge (all-left, or all-right), so the edges stay low. That is why random beads — and random biological measurements like height or leaf length — naturally settle into a bell curve.
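
The number of paths into each bin can be counted exactly. The sketch below assumes a board with 10 rows of pegs (an assumption for illustration; the board on this page may have a different number). A bead that bounces right k times out of 10 lands in bin k, and `math.comb` counts how many different left/right sequences give that k.

```python
from math import comb

rows = 10   # assumed number of peg rows

# Number of distinct left/right paths that end in each bin (bin 0 = all-left)
paths_per_bin = [comb(rows, k) for k in range(rows + 1)]
print(paths_per_bin)
# [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
```

One path leads to each edge bin, while 252 paths lead to the middle bin, so the middle fills up about 250 times faster than either edge.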

How much variation is “normal”? Back to the coin

The coin behaves just like the dice. There are many ways to end up with about 5 heads, but very few ways to get an extreme result like 10 heads — so middle results are common and extreme ones are rare. The chart below shows how often each number of heads comes up, which lets us answer the key question: where does an “ordinary” result end and a “rare” one begin?

Getting 4, 5 or 6 heads is common (green — about 66% of the time). Getting 8, 9 or 10 heads is rare (red — only about 5% of the time). Both are still “just chance” — but the red results almost never happen by luck.

The suspicion ladder (this is the key idea)

Forget formulas for a moment. Imagine a friend keeps flipping a coin and telling you how many heads they got out of 10. At what point would you start to suspect the coin is rigged? Your gut already knows the answer — let’s follow it down this ladder:

They got…	Your gut reaction	Chance of getting this (or more) by luck	Verdict
5 heads	“Yeah, that’s normal.”	very likely by luck	Not suspicious
6 heads	“Fine, still normal.”	38% by luck	Not suspicious
7 heads	“Hmm, a bit lucky.”	17% by luck	Mildly curious
8 heads	“OK, that’s getting odd…”	5.5% by luck	Suspicious
9 heads	“Something’s off here.”	1.1% by luck	Very suspicious
10 heads	“That coin is rigged!”	0.1% by luck	Convinced it’s real

Look at what happened as you went down the ladder: the more heads they got, the less likely luck could explain it (the % on the right shrinks), and the more suspicious you became. Those two things always move together — and this is the bit that stops the logic feeling backwards:

The one sentence to hold onto

The smaller the chance that luck did it, the more you believe something real did it. A big % (like 38%) means “luck explains this easily” — not suspicious. A tiny % (like 1%) means “luck almost never does this” — very suspicious. So a small probability is the surprising, meaningful result. That feels backwards only until you picture the ladder: fewer excuses for luck = more convinced it’s real.

Scientists just draw a line on this ladder and agree: “once luck’s chance drops below 5%, I’m suspicious enough to stop believing it was luck.” On the coin that line falls at about 8 heads. Getting 8, 9 or 10 heads is where you cross from “just chance” into “something real”. Making that switch is the entire job of a statistical test.
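
The percentages on the ladder can be worked out by counting, in the same way as the dice. The sketch below assumes a fair coin flipped 10 times; `p_exactly` and `p_at_least` are hypothetical helper names.

```python
from math import comb

FLIPS = 10

def p_exactly(heads):
    # Ways to get this many heads, out of all 2**10 equally likely sequences
    return comb(FLIPS, heads) / 2 ** FLIPS

def p_at_least(heads):
    # Chance of this many heads or more
    return sum(p_exactly(k) for k in range(heads, FLIPS + 1))

for heads in range(5, 11):
    print(f"{heads} or more heads: {p_at_least(heads):.1%}")
```

This prints 62.3%, 37.7%, 17.2%, 5.5%, 1.1% and 0.1%, matching the ladder. Notice that 8 or more heads comes out at 5.5%, which is why the text says the 5% line falls at “about” 8 heads.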

Now the biology: the fertiliser experiment

You test a fertiliser by growing two groups of plants and measuring their heights:

The result

Fertilised plants: mean height 42 cm. Unfertilised plants: mean height 39 cm. A difference of 3 cm.

The fertilised plants are taller, so the fertiliser works — right? Not necessarily. Plants vary in height even when treated identically (just like the coin does not give exactly 5, and the two cold-sufferers recovered in different times). If you had taken untreated plants and simply split them into two groups at random, the two means would still differ by a centimetre or two — purely by chance. So the real question is:

The one question every statistical test answers

Is the 3 cm difference bigger than the variation I would expect between groups anyway? If a 3 cm gap turns up easily from natural variation, the fertiliser may be doing nothing. If a 3 cm gap almost never turns up from variation alone, then variation is a poor explanation — and the fertiliser is probably having a real effect.

What “significant” means — and the 5% line

A result is called significant when it is so unlikely to have happened by chance that “chance” is no longer a believable explanation, so you conclude something real is going on. That is the whole idea — it is the coin’s “9 heads out of 10” moment, applied to an experiment.

We just need an agreed cut-off for “so unlikely”. Scientists use 5%, written p < 0.05:

p < 0.05 in plain words

“If chance alone were really behind this, I would see a result this extreme less than 5 times in every 100. That is rare enough that I stop believing it is chance, so I call it significant.” (That is roughly the same level of rareness as getting 8 or more heads on the coin, which happens about 5.5% of the time, so it is a level you can now picture.)

The direction of this trips everyone up, so hold onto it:

A big probability it’s chance

e.g. p = 0.30 (30 in 100)

Chance easily explains the result — you would see it happen often anyway.
NOT significant. No evidence of a real effect.

A tiny probability it’s chance

e.g. p = 0.01 (1 in 100)

Chance is a poor explanation — this hardly ever happens by luck.
Significant. Something real is probably going on.

The common mistake is “5% is a small number, so being in the 5% must be good/safe”. It is the opposite. A small probability means chance cannot explain your result — that is the surprising, significant outcome. Small p = significant.

What “significant” does NOT mean

It does not mean “big” or “important”. It only means “unlikely to be chance”. A tiny 0.1 cm difference can be significant with a big enough sample, yet mean nothing biologically.
It does not prove the cause. A significant fertiliser result says chance is unlikely — it does not by itself prove the fertiliser did it. That is where your biology comes in.
“Not significant” does not prove there is no effect. It just means you did not find strong enough evidence — like a court saying “not guilty” rather than “proven innocent”.

Where this is heading

Every test does the same three things: (1) start by assuming it is all just chance — that starting assumption is the null hypothesis; (2) work out how likely your actual result would be if that were true — the p-value; (3) if that probability falls below 5%, reject “just chance” and call the result significant. You will use this exact logic on the chi-squared and t-test pages.
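
To make the three steps concrete, here is a sketch that applies them to the fertiliser idea using a shuffle-style test (often called a permutation or randomisation test). Be clear about the assumptions: this is not the chi-squared test or the t-test, it is not on the A-level specifications, and the sixteen plant heights are invented so that the two means come out at 42 cm and 39 cm. It is here only because it follows the three-step logic very literally.

```python
from itertools import combinations

# Hypothetical heights in cm (means 42 and 39, as in the fertiliser example)
fertilised = [44, 41, 43, 40, 45, 42, 39, 42]
unfertilised = [38, 40, 37, 41, 39, 40, 38, 39]

heights = fertilised + unfertilised
group_size = len(fertilised)    # both groups must be the same size for gap() below
total = sum(heights)

def gap(group_sum):
    # Size of the difference between the two group means, multiplied by
    # group_size so that everything stays in whole numbers
    return abs(2 * group_sum - total)

observed = gap(sum(fertilised))

# Step 1: assume "just chance". If the fertiliser does nothing, the labels are
# meaningless, so every way of splitting the 16 plants into two groups of 8
# is equally likely.
splits = list(combinations(heights, group_size))

# Step 2: how often does a chance split give a gap at least as big as the real one?
as_extreme = sum(1 for split in splits if gap(sum(split)) >= observed)
p_value = as_extreme / len(splits)

# Step 3: compare with the 5% line
verdict = "significant" if p_value < 0.05 else "not significant"
print(f"p = {p_value:.4f} ({verdict})")
```

With these invented heights, very few of the 12,870 possible splits produce a gap as large as 3 cm, so the p-value falls below 0.05. Different data could easily give the opposite verdict.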

Standard error & error bars

Standard deviation, which you met above, tells you how spread out your individual measurements are. But there is a second, different question that matters just as much: how much can you trust the mean itself?

Here is the idea. You measured one sample of leaves and got a mean. If you went out and collected a completely new sample of the same kind of leaves, you would get a slightly different mean, just by chance. Collect a third sample — a slightly different mean again. The standard error of the mean (SE) is a number that estimates how much your mean would jump around if you kept repeating the experiment like this. A small standard error means your mean is reliable; a large one means it could easily have come out quite different.

How you work it out

SE = s ÷ √n

Reading the symbols:

SE: the standard error of the mean (the answer).
s: the standard deviation you already worked out above.
√n: the square root of n, where n is the number of measurements in the sample.

So you simply take the standard deviation and divide it by the square root of your sample size. Because you are dividing by a number bigger than 1 (whenever you have more than one measurement), the standard error is always a smaller number than the standard deviation. And because n is on the bottom, the more measurements you take, the smaller the standard error becomes, which is just the common-sense idea that a bigger sample gives a more trustworthy mean, written as maths.
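
A short sketch shows how quickly the standard error shrinks as the sample grows. It uses the standard deviation of 8 mm from the leaf example earlier on this page; the sample sizes are chosen just for illustration.

```python
import math

def standard_error(sd, n):
    # SE = SD ÷ √n
    return sd / math.sqrt(n)

sd_mm = 8
for n in (5, 20, 80, 320):
    print(f"n = {n:>3}: SE = {standard_error(sd_mm, n):.2f} mm")
```

This prints 3.58, 1.79, 0.89 and 0.45 mm. Each time the sample gets four times bigger, the standard error halves, because √4 = 2. That is also a reminder that you need a lot more data to make the mean only a little more reliable.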

What is an error bar?

When you draw a bar chart or point graph of your means, each mean is a single dot or the top of a bar. An error bar is a small vertical line drawn through that point: it goes up a certain amount and down the same amount, with a little cap at each end, to show how much uncertainty or variation there is around the mean. It is a way of saying “the mean is here, but the true value is somewhere in this range.”

The anatomy of an error bar: the mean sits in the middle; the whisker reaches the same distance up and down, capped at each end. A longer bar means more uncertainty or variation.

You get to choose how long to make the error bars, and you must state which you have used in the graph title or legend, because the two choices mean different things:

Error bars of ±1 standard deviation — use these when you want to show how variable the raw measurements are.
Error bars of ±2 standard error — use these when you want to compare two means and see whether they are really different. Because ±2 SE marks out the ~95% range (see the box above), this is the version that lets you judge overlap, and the one that matters most at A-level. (You will sometimes also see ±1 SE bars, but the overlap rule below is most reliable with ±2 SE.)

Same mean and same data, drawn both ways. SD bars are long because they show how spread the raw measurements are. ±2 SE bars are much shorter (SE = SD ÷ √n) because they show how reliable the mean is — and they are the ones you compare between two means.

Why “±2”? (this is the bit that ties it all together)

Look back at the 68–95–99.7 rule above: about 95% of data lie within 2 of these units of the mean. So when you draw an error bar that reaches 2 standard errors up and 2 down, that bar marks out roughly the 95% range for where the true mean is likely to be.

Being “95% confident” simply means this: if the pattern you are seeing were really just chance, you would only expect to be fooled about 5 times in every 100 (5%). Scientists have agreed that a 1-in-20 chance of being wrong is a low enough risk to act on — so 95% is the standard cut-off for saying you are “confident” a result is real.

That is why you keep seeing ±2: it is not a random choice, it is the length that captures 95% of the likely values (leaving the rare 5% outside). A shorter ±1 bar only captures about 68%, which would leave a much bigger 32% chance of being wrong — too weak to be confident about.

What “significant” means

Throughout statistics you will see the word significant. It does not mean “big” or “important”. It means: the difference is very unlikely to be just chance — something real is probably causing it. When two means are “significantly different”, you are saying you are confident the difference is genuine, not an accident of which leaves you happened to pick.

Reading standard-error bars — the actual exam skill

Draw the two means with their ±2 SE error bars side by side. If the error bars overlap (touch or cross), the difference between the means is probably just chance — not significant. If the error bars do not overlap (there is a clear gap), the difference might be real (significant) — but you do not stop there: you confirm it with a proper statistical test such as a t-test. In short: error bars suggest, a test decides. You will do exactly this in the interactive below.
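
The overlap check can be sketched as follows. The means (42 cm and 39 cm) are from the fertiliser example above, but the standard deviation of 4 cm and the two sample sizes are hypothetical, picked so the arithmetic is easy to follow. `error_bar` and `bars_overlap` are illustrative helper names.

```python
import math

def error_bar(mean, sd, n, k=2):
    # Bottom and top of a bar reaching k standard errors either side of the mean
    se = sd / math.sqrt(n)
    return (mean - k * se, mean + k * se)

def bars_overlap(bar_a, bar_b):
    # Two bars overlap (or touch) unless one ends before the other starts
    return bar_a[0] <= bar_b[1] and bar_b[0] <= bar_a[1]

for n in (16, 64):
    fert = error_bar(42, 4, n)
    unfert = error_bar(39, 4, n)
    result = "overlap" if bars_overlap(fert, unfert) else "no overlap"
    print(f"n = {n}: fertilised {fert}, unfertilised {unfert} -> {result}")
```

With 16 plants per group the bars run 40 to 44 and 37 to 41, so they overlap. With 64 plants per group they shrink to 41 to 43 and 38 to 40, leaving a clear gap. Same means, same spread, different conclusion: sample size matters. Either way, a gap is only a prompt to run a proper test.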

Interactive: the spread explorer

Now that you have met the mean, standard deviation, the normal curve and standard error, this interactive lets you play with all of them at once. Two samples of leaves have the same mean length, but you control how spread out each one is. Drag the sliders and watch everything respond. The big lesson: the mean on its own is only half the story.

Interactive

Spread, standard deviation & error bars

Both samples of leaves have a mean of 50 mm and n = 20 leaves each. Use the three buttons to switch what you are looking at, then drag the sliders.

[Interactive explorer. Legend: within 1 SD ≈ 68% of leaves; between 1 and 2 SD ≈ 27%; beyond 2 SD ≈ 5% (rare). Sliders set the spread of each sample, starting at SD = 4 mm for Sample A and SD = 12 mm for Sample B. Read-outs show the mean and SD of each sample in mm.]

On the first two views the mean read-outs never change — both samples always average 50 mm. Everything you see moving is the spread. That is the whole point: two experiments can give the identical mean yet tell completely different stories, and the standard deviation is what tells them apart.

Worked example

A typical exam-style question that tests interpretation rather than raw calculation.

Do two antimicrobials differ? Using SD to judge overlap

Statistics · interpretation · Standard deviation · 3 marks

Clear-zone diameters were measured for two antimicrobials. Cinnamon oil gave a mean of 17 mm (SD 2.4); a positive control gave a mean of 13 mm (SD 2.2), both with n = 10. Given that mean ± 2 SD includes over 95% of the data, evaluate whether the difference between the two means is likely to be due to chance.

What the question gives you

Cinnamon oil: mean 17 mm, SD 2.4. ±2 SD range = 17 ± 4.8 = 12.2 to 21.8 mm.
Positive control: mean 13 mm, SD 2.2. ±2 SD range = 13 ± 4.4 = 8.6 to 17.4 mm.

The method

1. Work out each ±2 SD range (95% of the data), as above.
2. Check whether the ranges overlap. Cinnamon oil runs 12.2–21.8; the control runs 8.6–17.4. They overlap between 12.2 and 17.4 mm.
3. Interpret the overlap. Because the 95% ranges overlap, some cinnamon-oil results and some control results could plausibly come from the same underlying value, so the difference in means could be due to chance.

Answer: The ±2 SD ranges overlap (12.2–17.4 mm), so we cannot be confident the difference between the means is real — it may be due to chance. To decide properly you would carry out a t-test and compare the difference against the 0.05 significance level.
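
The arithmetic in this answer is easy to check. The sketch below uses the means and standard deviations given in the question; `two_sd_range` is a hypothetical helper, and the results are rounded to one decimal place to match the working above.

```python
def two_sd_range(mean, sd):
    # Bottom and top of the mean ± 2 SD range, to 1 decimal place
    return (round(mean - 2 * sd, 1), round(mean + 2 * sd, 1))

cinnamon = two_sd_range(17, 2.4)
control = two_sd_range(13, 2.2)
print(cinnamon)   # (12.2, 21.8)
print(control)    # (8.6, 17.4)

# The shared region runs from the higher of the two bottoms
# to the lower of the two tops
overlap_low = max(cinnamon[0], control[0])
overlap_high = min(cinnamon[1], control[1])

if overlap_low <= overlap_high:
    print(f"ranges overlap from {overlap_low} to {overlap_high} mm")
else:
    print("ranges do not overlap")
```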

Say the difference may be due to chance, not “the results are due to chance”. And notice the logic is identical to reading error bars: overlap → not confident; no overlap → possibly significant, so test it.

What each exam board expects

All the main A-level Biology specifications require descriptive statistics. You will not calculate SD by hand in a written paper, but you must understand and interpret it.

Board	What is required
AQA (7402)	MS 1.10 — understand measures of dispersion including standard deviation and range; know why SD may be more useful (e.g. with an outlier).
OCR A / B	Standard deviation appears on the “choosing a test” flowchart; understand spread and error bars alongside the four named tests.
Edexcel A / B	Calculate and interpret mean, SD and use error bars to show variability and compare means.
WJEC / Eduqas	Mean, standard deviation, standard error and the use of error bars to judge overlap between means.
