Descriptive statistics: mean, standard deviation & standard error
Before you run any statistical test, you describe your data. This page explains the mean, the standard deviation, the standard error and error bars — what each one tells you, and how to read them the way an examiner expects. Drag the interactive below to see spread change in real time.
Living things vary. Measure the height of 100 people, the length of 100 leaves or the mass of 100 woodlice and no two values will be identical. Descriptive statistics are the numbers that sum up that variation so you can see what your data are actually telling you: where the values cluster (the mean), and how spread out they are (the standard deviation).
You will not usually be asked to calculate a standard deviation by hand in a written exam — you use a calculator or spreadsheet for that. What you are examined on is understanding what these numbers mean, choosing the right one, and interpreting them, especially error bars. That is what this page focuses on.
The two things every set of data has
- A centre: a value the data cluster around (mean, median or mode).
- A spread: how far the data stretch out from that centre (range, standard deviation).

Almost all of biological statistics is comparing centres while taking the spread into account.
Measures of the centre: mean, median & mode
There are three ways to describe the “middle” of a set of data. You need to know what each is and when it is the sensible choice.
- Mean (the average): add up all the values and divide by how many there are. This is by far the most used measure in biology and the one that feeds into statistical tests.

  $$\bar{x} = \frac{\sum x}{n}$$

  Reading the symbols:
  - x̄: the mean (said “x-bar”). This is the answer you are working out.
  - Σ: “the sum of”, the Greek capital letter sigma. It just says “add up everything that follows”. (Handy memory: sigma and sum both start with s.)
  - x: each individual value (each measurement you took).
  - n: the number of values in your sample (how many measurements). n for number.
- Median — put the values in order and take the middle one (or the average of the middle two). Useful when there are extreme values (outliers) that would drag the mean, because the median is barely affected by them.
- Mode — the most common value. The only measure you can use for data that are categories (e.g. eye colour). On a graph it is the tallest bar or peak.
Which one should you use? (a common exam question)
You are often asked which measure of central tendency is most appropriate for a given set of data, and why. There is no single “best” one — it depends on the data. Here is how to choose.
Use the mean when…
- the data are continuous (measured, e.g. height, mass, length) and roughly symmetrical with no big outliers.
- it uses every value, so it is the most representative measure — and it is the one needed for standard deviation and statistical tests such as the t-test.
- Avoid it when there is an outlier or the data are skewed, because one extreme value drags the mean and it stops representing a “typical” value.
Use the median when…
- the data are skewed or contain anomalous/outlying values — the median is barely affected by extremes, so it still represents a typical value.
- the data are continuous or ordered (ranked); the median works for both.
- Drawback: it ignores most of the values (only the middle position matters), and it is not used in the standard significance tests.
Use the mode when…
- the data are categories / discrete groups (e.g. eye colour, blood group, phenotype) — here a mean or median makes no sense, so the mode is the only measure you can use.
- you want the single most common value quickly.
- Drawback: it can be misleading or undefined for continuous data (there may be no repeated value, or more than one mode).
How to answer it in the exam
Name the measure and justify it from the data. For example: “The median is most appropriate because the data contain an outlier, which would distort the mean.” Or: “The mode is most appropriate because the data are categories (blood groups), so a mean cannot be calculated.” The mark is almost always for the reason, not just the name.
In a perfectly symmetrical (normal) distribution the mean, median and mode all sit at the same point, right at the peak. As soon as the distribution becomes skewed (lopsided, with a long tail on one side), the three measures pull apart — and the order they fall in tells you which way the data are skewed. The diagram below shows all three cases at a glance:
Positive skew (right-skewed)
A few unusually large values stretch the distribution out into a long tail on the right. Those extreme values pull the mean towards them, so the mean ends up the highest of the three:
positive skew: mode < median < mean
The mode stays put at the peak (the most common value), the mean is dragged furthest towards the tail, and the median sits in between. Biological example: the number of offspring per individual, where most have few but a handful have very many.
Negative skew (left-skewed)
A few unusually small values stretch the distribution into a long tail on the left. Now the mean is pulled downwards, so the order reverses:
negative skew: mean < median < mode
A useful memory hook: the mean always chases the tail. Whichever side the long tail is on, that is the side the mean shifts towards, because the mean is the only one of the three affected by how far out the extreme values are, not just how many there are. The median is barely moved and the mode not at all, which is exactly why the median is the safer measure of centre when data are skewed or contain outliers.
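If you want to check the ordering for yourself, here is a minimal Python sketch. The two data sets are invented purely for illustration (they are not real offspring counts): one has a long tail to the right, and the other is its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical counts, invented for illustration only.
right_tail = [1, 1, 1, 2, 2, 3, 4, 10]        # a few unusually large values
left_tail = [11 - x for x in right_tail]      # mirror image: a few unusually small values

def centres(values):
    """Return the three measures of centre as (mean, median, mode)."""
    return mean(values), median(values), mode(values)

print("positive skew:", centres(right_tail))  # mean 3, median 2, mode 1
print("negative skew:", centres(left_tail))   # mean 8, median 9, mode 10
```

In both cases the mean is the value that has moved furthest towards the tail, and the mode has stayed at the most common value.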
See it move
In the interactive explorer further down this page, switch to the “Skew: mean vs median vs mode” view. Drag the skew slider from negative through symmetrical to positive and watch the three markers separate and swap order — the mean (orange) always heads into the tail first.
Worked calculation: watch the mean, median & mode change
Mean / median / mode
A student counts the number of seeds per pod in nine pea pods:
4, 5, 5, 6, 6, 6, 7, 7, 8
Calculate the mean, median and mode. Then a tenth pod is found with 28 seeds (an unusually large pod). Recalculate all three and comment.
Step 1 — the original nine pods
The values are already in order, which makes the median easy.
- Mean = sum ÷ n: (4+5+5+6+6+6+7+7+8) ÷ 9 = 54 ÷ 9 = 6.0 seeds
- Median = the middle value. With 9 values the middle is the 5th: 4, 5, 5, 6, [6], 6, 7, 7, 8 → median = 6 seeds
- Mode = most common value. Six appears three times: mode = 6 seeds
Step 2 — add the outlier (the 28-seed pod)
The data set is now: 4, 5, 5, 6, 6, 6, 7, 7, 8, 28 (n = 10).
- New mean: (54 + 28) ÷ 10 = 82 ÷ 10 = 8.2 seeds. Up from 6.0 to 8.2: a big jump, and 8.2 is higher than all but one of the pods, so it no longer represents a “typical” pod.
- New median = average of the two middle values (5th and 6th of 10): 4, 5, 5, 6, [6, 6], 7, 7, 8, 28 → (6+6) ÷ 2 = 6 seeds. Unchanged at 6.
- New mode = still 6 (appears three times): mode = 6 seeds. Unchanged.
Step 3 — what changed
| Measure | Before | After outlier |
|---|---|---|
| Mean | 6.0 | 8.2 (jumped up) |
| Median | 6 | 6 (no change) |
| Mode | 6 | 6 (no change) |
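The same working can be checked in a few lines of Python. This is only a sketch using the standard `statistics` module; the helper name `summarise` is made up for this example.

```python
from statistics import mean, median, mode

pods = [4, 5, 5, 6, 6, 6, 7, 7, 8]   # seeds per pod, the original nine pods
with_outlier = pods + [28]           # add the tenth, 28-seed pod

def summarise(values):
    """Return (mean, median, mode) for a list of counts."""
    return mean(values), median(values), mode(values)

print("before:", summarise(pods))          # mean 6, median 6, mode 6
print("after: ", summarise(with_outlier))  # mean 8.2, median 6, mode 6
```

Only the first number in each result changes, which is the point of the table above: the outlier moves the mean but leaves the median and mode where they were.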
Standard deviation: measuring the spread
The simplest measure of spread is the range (highest value − lowest value). It is quick, but it uses only two values, so a single freak result can distort it completely. The standard deviation (SD) is better because it uses every value.
Standard deviation answers one question: on average, how far is each value from the mean? A small SD means the data are packed tightly around the mean; a large SD means they are widely spread. You will be able to see this directly in the interactive explorer further down the page.
The formula (given to you in the exam)
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Reading the symbols:
- s: the standard deviation, the answer.
- √: square root. Do everything inside the brackets first, then square-root the result at the very end.
- Σ: “the sum of”. Add up everything that follows (once you’ve done it for every value).
- x: each individual value.
- x̄: the mean of all the values (you work this out first).
- (x − x̄): how far one value is from the mean (its “deviation”).
- ²: squared, meaning multiply by itself. This makes every deviation positive, so values below the mean don’t cancel out values above it.
- n: the number of values.
In plain steps: (1) find the mean; (2) for each value, find how far it is from the mean and square that; (3) add all those squares up; (4) divide by (n − 1); (5) square-root the answer. The final square root brings you back to the original units (mm, g…), which is what makes SD easy to interpret.
Why n − 1 and not n? First, what it means in practice: n − 1 is simply one less than the number of values you have — so 5 values means you divide by 4, 10 values means you divide by 9, 20 values means you divide by 19. It is not a fixed number; it changes with your sample size.
And why one less? Because in biology you almost always measure a sample, not the whole population. A sample tends to look slightly less spread out than the population it came from, so dividing by the smaller (n − 1) nudges the answer up a little and corrects for that. You do not have to prove this at A-level — just remember: for a sample, divide by (n − 1).
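The five plain steps can be followed line by line in code. The sketch below reuses the nine pea-pod counts from the worked calculation earlier on this page; the function name `sample_sd` is just a label chosen for this example, and the result is compared with Python's built-in `statistics.stdev`, which also divides by (n − 1).

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation, following the five steps in order."""
    n = len(values)
    x_bar = sum(values) / n                             # (1) find the mean
    squared_devs = [(x - x_bar) ** 2 for x in values]   # (2) square each deviation
    total = sum(squared_devs)                           # (3) add the squares up
    variance = total / (n - 1)                          # (4) divide by (n - 1)
    return sqrt(variance)                               # (5) square-root the answer

pods = [4, 5, 5, 6, 6, 6, 7, 7, 8]
print(round(sample_sd(pods), 2))   # 1.22 seeds
print(round(stdev(pods), 2))       # 1.22, the library gives the same answer
```

For these pods the squared deviations add up to 12, and 12 ÷ 8 = 1.5, so the SD is √1.5 ≈ 1.22 seeds.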
Range vs standard deviation: advantages & disadvantages
A favourite exam question is “give one advantage of using standard deviation rather than range” (or the reverse). Both measure spread, but they do it in very different ways — here is exactly what to say.
Range
Advantages
- Very quick and simple to calculate — just two values, no formula.
- Easy to understand and useful for a quick first look at how spread out data are (e.g. a pilot study).
Disadvantages
- Uses only two values (the two extremes) and ignores every value in between.
- Heavily distorted by a single outlier or anomalous result — one freak value changes the range completely.
- Tells you nothing about how the data are distributed around the mean.
- Not used in further statistical tests.
Standard deviation
Advantages
- Uses every value in the data set, so it describes the spread far more fully.
- Much less affected by a single outlier than the range is.
- Links directly to the normal distribution (the 68–95–99.7 rule) and to error bars, letting you judge whether means really differ.
- Feeds into significance tests such as the t-test.
Disadvantages
- Longer and harder to calculate — needs the mean, then every deviation, squared and summed.
- Less meaningful if the data are strongly skewed (it assumes a roughly symmetrical spread).
- Still affected by extreme values to some degree, because they are squared in the formula.
The one-mark answer
“An advantage of standard deviation over range is that it uses all the data / all the values, so it is less affected by a single anomalous (outlying) result and gives a fuller picture of the spread.” Conversely, an advantage of the range is simply that it is quicker and easier to calculate.
The normal distribution & the 68–95–99.7 rule
Many continuous biological variables (height, mass, leaf length) approximately follow a normal distribution: a symmetrical, bell-shaped curve in which most values sit near the middle, and fewer and fewer values appear as you move out towards the extremes. (You can see this shape in the interactive explorer below.) When data are normally distributed, the standard deviation carves the curve into predictable slices:
- About 68% of values lie within ±1 SD of the mean.
- About 95% of values lie within ±2 SD of the mean.
- About 99.7% of values lie within ±3 SD of the mean.
This is why you will see “mean ± 2 standard deviations” used as an approximate 95% range. A value out beyond 2 SD from the mean is in the rare 5% tail — it is unusual, and that “unusual” idea is exactly what every significance test (chi-squared, t-test) is built on.
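You do not need to memorise where the three percentages come from, but they can be checked. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# A normal curve with mean 0 and SD 1, so distances are measured in SDs.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within ±k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```

The ±2 SD figure is slightly over 95%, which is why “mean ± 2 SD” is described as an approximate 95% range.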
What “2 standard deviations” actually means
Because “2 SD” is used so much, it is worth making it completely concrete. Suppose you measured 20 leaves with a mean of 50 mm and a standard deviation of 8 mm. Then:
- 1 SD = 8 mm, so 2 SD = 16 mm.
- Mean − 2 SD = 50 − 16 = 34 mm (the bottom of the range).
- Mean + 2 SD = 50 + 16 = 66 mm (the top of the range).
So the “±2 SD range” here is simply 34 mm to 66 mm. The 68–95–99.7 rule says that about 95% of the leaves should fall inside that range, and only about 1 in 20 should fall outside it. In the diagram below, 19 of the 20 leaves land inside the green band — you can count them — and just one (the unusually long leaf) falls outside in the rare 5%.
Significance & chance — what they really mean
This is the most important idea on the whole page. Every statistical test — chi-squared, the t-test, correlation — and the null hypothesis are all built on it. We will go very slowly and assume you have never met any of it before.
Two ordinary words are used in a special way in statistics: “by chance” and “significant”. We will build them from three everyday examples, then use them on a real experiment.
First, three examples of “chance”
1. A coin
A fair coin means an ordinary, honest coin — one that is equally likely to land on heads or tails, with nothing done to it to favour either side. (An unfair or “biased” coin would be weighted so it lands one way more often.)
Flip a fair coin 10 times. You expect 5 heads — but you will very often not get exactly 5. You might get 4, or 6, or 7. That does not mean the coin is faulty. It is just the ordinary random variation you get whenever chance is involved. That everyday variation is what statisticians mean by “by chance”.
2. A medicine
Two people have a cold. One takes a vitamin and feels better after 5 days; the other takes nothing and feels better after 6 days. Did the vitamin work? You cannot tell — people shake off colds in different times anyway, so a one-day difference could easily happen by chance, with the vitamin doing nothing at all. This is the exact problem every experiment faces: is the difference I see real, or just natural variation?
3. Two dice — and this one explains the bell shape
Here is the example that shows why random results form that famous bell shape. Roll two dice and add them. The total can be anywhere from 2 to 12 — but the totals are not equally likely. Why not? Because some totals can be made in many different ways and others in only one:
- A total of 2 can only happen one way: 1 + 1.
- A total of 12 can only happen one way: 6 + 6.
- But a total of 7 can happen six ways: 1+6, 2+5, 3+4, 4+3, 5+2, 6+1.
So a 7 comes up far more often than a 2 or a 12 — simply because there are more ways to make it. Count the ways for every total and you get this:
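Counting the ways by hand is tedious, so here is a short Python sketch that does it for all 36 possible rolls and prints a rough text chart.

```python
from collections import Counter
from itertools import product

# Every possible (first die, second die) pair, 6 x 6 = 36 in total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(f"{total:>2}: {'#' * ways[total]} ({ways[total]} of 36)")
```

The row for 7 is the longest (6 ways), and the rows shrink by one at a time on either side until 2 and 12, which have a single way each.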
This is why a bell curve looks the way it does
There are lots of ways to get a middle-ish, ordinary result, but only a few ways to get an extreme one. So ordinary results pile up into the tall middle of the curve, and extreme results are pushed out into the thin tails. Every bell-shaped distribution — dice totals, heights of people, leaf lengths — is tall in the middle and low at the edges for exactly this reason.
Watch it happen: the Galton board
A Galton board makes this idea physical. Drop a bead in at the top and it hits a peg. At every peg it bounces left or right at random (a 50–50 choice, like flipping a coin). After bouncing through all the rows, it drops into a bin at the bottom. Drop hundreds of beads and watch what shape they build.
Drop the beads and build a bell curve
Real gravity: each bead falls, bounces left or right off every peg at random, and settles into a bin. Pour in hundreds and watch a bell curve build itself in front of you.
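If you cannot run the animation, the idea can be imitated in code. This is a simplified sketch, not the physics of the real board: it assumes each bead makes an independent 50–50 left-or-right choice at every row, and the function name, the 10 rows and the 500 beads are arbitrary choices for the example.

```python
import random

def galton_board(beads, rows, seed=None):
    """Drop beads through `rows` of pegs; return how many land in each bin."""
    rng = random.Random(seed)
    bins = [0] * (rows + 1)          # a bead can bounce right 0..rows times
    for _ in range(beads):
        # Count how many times this bead bounces right (50-50 at each peg).
        rights = sum(rng.random() < 0.5 for _ in range(rows))
        bins[rights] += 1
    return bins

bins = galton_board(beads=500, rows=10, seed=1)
for position, count in enumerate(bins):
    print(f"bin {position:>2} | {'#' * (count // 5)}")
```

Most beads end up in the middle bins because there are many different left-right sequences that lead there, and very few that lead to the edge bins.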
How much variation is “normal”? Back to the coin
The coin behaves just like the dice. There are many ways to end up with about 5 heads, but very few ways to get an extreme result like 10 heads — so middle results are common and extreme ones are rare. The chart below shows how often each number of heads comes up, which lets us answer the key question: where does an “ordinary” result end and a “rare” one begin?
The suspicion ladder (this is the key idea)
Forget formulas for a moment. Imagine a friend keeps flipping a coin and telling you how many heads they got out of 10. At what point would you start to suspect the coin is rigged? Your gut already knows the answer — let’s follow it down this ladder:
| They got… | Your gut reaction | Chance of getting this (or more) by luck | Verdict |
|---|---|---|---|
| 5 heads | “Yeah, that’s normal.” | very likely by luck | Not suspicious |
| 6 heads | “Fine, still normal.” | 38% by luck | Not suspicious |
| 7 heads | “Hmm, a bit lucky.” | 17% by luck | Mildly curious |
| 8 heads | “OK, that’s getting odd…” | 5.5% by luck | Suspicious |
| 9 heads | “Something’s off here.” | 1.1% by luck | Very suspicious |
| 10 heads | “That coin is rigged!” | 0.1% by luck | Convinced it’s real |
Look at what happened as you went down the ladder: the more heads they got, the less likely luck could explain it (the % on the right shrinks), and the more suspicious you became. Those two things always move together — and this is the bit that stops the logic feeling backwards:
The one sentence to hold onto
The smaller the chance that luck did it, the more you believe something real did it. A big % (like 38%) means “luck explains this easily” — not suspicious. A tiny % (like 1%) means “luck almost never does this” — very suspicious. So a small probability is the surprising, meaningful result. That feels backwards only until you picture the ladder: fewer excuses for luck = more convinced it’s real.
Scientists just draw a line on this ladder and agree: “once luck’s chance drops below 5%, I’m suspicious enough to stop believing it was luck.” On the coin that line falls at about 8 heads: 8 or more heads has a 5.5% chance, sitting right on the line, while 9 or more (1.1%) is clearly past it. Getting 8, 9 or 10 heads is where you cross from “just chance” into “something real”. Making that switch is the entire job of a statistical test.
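The percentages on the ladder are not guesses. Each one is the number of ways of getting that many heads or more, divided by the 1024 possible sequences of 10 flips. A short Python sketch (the function name is chosen for this example) reproduces them:

```python
from math import comb

def chance_of_at_least(heads, flips=10):
    """Probability of getting `heads` or more heads from a fair coin."""
    # comb(flips, k) counts the sequences with exactly k heads.
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips

for heads in range(5, 11):
    print(f"{heads} or more heads: {chance_of_at_least(heads):.1%}")
# 5 or more heads: 62.3%
# 6 or more heads: 37.7%
# 7 or more heads: 17.2%
# 8 or more heads: 5.5%
# 9 or more heads: 1.1%
# 10 or more heads: 0.1%
```

The table rounds these to the nearest whole or half percent, and “very likely by luck” for 5 heads corresponds to about 62%.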
Now the biology: the fertiliser experiment
You test a fertiliser by growing two groups of plants and measuring their heights:
The result
Fertilised plants: mean height 42 cm. Unfertilised plants: mean height 39 cm. A difference of 3 cm.
The fertilised plants are taller, so the fertiliser works — right? Not necessarily. Plants vary in height even when treated identically (just like the coin does not give exactly 5, and the two cold-sufferers recovered in different times). If you had taken untreated plants and simply split them into two groups at random, the two means would still differ by a centimetre or two — purely by chance. So the real question is:
The one question every statistical test answers
Is the 3 cm difference bigger than the variation I would expect between groups anyway? If a 3 cm gap turns up easily from natural variation, the fertiliser may be doing nothing. If a 3 cm gap almost never turns up from variation alone, then variation is a poor explanation — and the fertiliser is probably having a real effect.
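One way to picture that question is to let a computer do the “split untreated plants at random” experiment thousands of times. The sketch below is an illustration only: the 20 heights are invented, the group size of 10 is an assumption, and this shuffling approach (a simple form of what statisticians call a permutation test) is not the t-test you will meet later.

```python
import random

def chance_of_gap(heights, gap, trials=10_000, seed=None):
    """Share of random half-and-half splits whose means differ by at least `gap`."""
    rng = random.Random(seed)
    half = len(heights) // 2
    hits = 0
    for _ in range(trials):
        shuffled = rng.sample(heights, len(heights))   # random reordering
        group_a, group_b = shuffled[:half], shuffled[half:]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        if diff >= gap:
            hits += 1
    return hits / trials

# Hypothetical heights (cm) of 20 untreated plants, invented for illustration.
untreated = [34, 35, 36, 37, 37, 38, 38, 39, 39, 40,
             40, 41, 41, 42, 42, 43, 43, 44, 45, 46]

print(chance_of_gap(untreated, gap=3, seed=1))
```

The number printed is the fraction of purely random splits that produced a gap of 3 cm or more. If that fraction is large, a 3 cm difference is unremarkable; if it is tiny, chance is a poor explanation.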
What “significant” means — and the 5% line
A result is called significant when it is so unlikely to have happened by chance that “chance” is no longer a believable explanation, so you conclude something real is going on. That is the whole idea — it is the coin’s “9 heads out of 10” moment, applied to an experiment.
We just need an agreed cut-off for “so unlikely”. Scientists use 5%, written p < 0.05:
p < 0.05 in plain words
“If chance alone were really behind this, I would see a result this extreme less than 5 times in every 100. That is rare enough that I stop believing it is chance, so I call it significant.” (That is roughly the same 5% as getting 8 or more heads on the coin, a level of rareness you can now picture.)
The direction of this trips everyone up, so hold onto it:
A big probability it’s chance
- Chance easily explains the result — you would see it happen often anyway.
- NOT significant. No evidence of a real effect.
A tiny probability it’s chance
- Chance is a poor explanation — this hardly ever happens by luck.
- Significant. Something real is probably going on.
What “significant” does NOT mean
- It does not mean “big” or “important”. It only means “unlikely to be chance”. A tiny 0.1 cm difference can be significant with a big enough sample, yet mean nothing biologically.
- It does not prove the cause. A significant fertiliser result says chance is unlikely — it does not by itself prove the fertiliser did it. That is where your biology comes in.
- “Not significant” does not prove there is no effect. It just means you did not find strong enough evidence — like a court saying “not guilty” rather than “proven innocent”.
Where this is heading
Every test does the same three things: (1) start by assuming it is all just chance — that starting assumption is the null hypothesis; (2) work out how likely your actual result would be if that were true — the p-value; (3) if that probability falls below 5%, reject “just chance” and call the result significant. You will use this exact logic on the chi-squared and t-test pages.
Standard error & error bars
Standard deviation, which you met above, tells you how spread out your individual measurements are. But there is a second, different question that matters just as much: how much can you trust the mean itself?
Here is the idea. You measured one sample of leaves and got a mean. If you went out and collected a completely new sample of the same kind of leaves, you would get a slightly different mean, just by chance. Collect a third sample — a slightly different mean again. The standard error of the mean (SE) is a number that estimates how much your mean would jump around if you kept repeating the experiment like this. A small standard error means your mean is reliable; a large one means it could easily have come out quite different.
How you work it out
$$SE = \frac{s}{\sqrt{n}}$$

Reading the symbols:
- SE: the standard error of the mean, the answer.
- s: the standard deviation you already worked out above.
- √n: the square root of n, where n is the number of measurements in the sample.
So you simply take the standard deviation and divide it by the square root of your sample size. Because you are dividing by a number bigger than 1 (for any sample of two or more measurements), the standard error is always smaller than the standard deviation. And because n is on the bottom, the more measurements you take, the smaller the standard error becomes. That is just the common-sense idea that a bigger sample gives a more trustworthy mean, written as maths.
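To see the effect of sample size, here is a minimal Python sketch. It borrows the standard deviation of 8 mm from the leaf example earlier on this page and tries three sample sizes; the sizes 5 and 80 are picked just for comparison with the 20 leaves in that example.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)

for n in (5, 20, 80):
    print(f"n = {n:>2}: SE = {standard_error(8, n):.2f} mm")
# n =  5: SE = 3.58 mm
# n = 20: SE = 1.79 mm
# n = 80: SE = 0.89 mm
```

Notice that making the sample four times bigger (20 to 80) only halves the standard error, because n sits under a square root.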
What is an error bar?
When you draw a bar chart or point graph of your means, each mean is a single dot or the top of a bar. An error bar is a small vertical line drawn through that point: it goes up a certain amount and down the same amount, with a little cap at each end, to show how much uncertainty or variation there is around the mean. It is a way of saying “the mean is here, but the true value is somewhere in this range.”
You get to choose how long to make the error bars, and you must state which you have used in the graph title or legend, because the two choices mean different things:
- Error bars of ±1 standard deviation: use these when you want to show how variable the raw measurements are.
- Error bars of ±2 standard error: use these when you want to compare two means and see whether they are really different. Because ±2 SE marks out the ~95% range (see the box below), this is the version that lets you judge overlap, and the one that matters most at A-level. (You will sometimes also see ±1 SE bars, but the overlap rule below is most reliable with ±2 SE.)
Why “±2”? (this is the bit that ties it all together)
Look back at the 68–95–99.7 rule above: about 95% of values lie within 2 standard deviations of the mean. The standard error plays the same role for sample means that the standard deviation plays for individual values, so an error bar that reaches 2 standard errors up and 2 down marks out roughly the 95% range for where the true mean is likely to be.
Being “95% confident” simply means this: if the pattern you are seeing were really just chance, you would only expect to be fooled about 5 times in every 100 (5%). Scientists have agreed that a 1-in-20 chance of being wrong is a low enough risk to act on — so 95% is the standard cut-off for saying you are “confident” a result is real.
That is why you keep seeing ±2: it is not a random choice, it is the length that captures 95% of the likely values (leaving the rare 5% outside). A shorter ±1 bar only captures about 68%, which would leave a much bigger 32% chance of being wrong — too weak to be confident about.
What “significant” means
Throughout statistics you will see the word significant. It does not mean “big” or “important”. It means: the difference is very unlikely to be just chance — something real is probably causing it. When two means are “significantly different”, you are saying you are confident the difference is genuine, not an accident of which leaves you happened to pick.
Reading standard-error bars — the actual exam skill
Draw the two means with their ±2 SE error bars side by side. If the error bars overlap (touch or cross), the difference between the means is probably just chance — not significant. If the error bars do not overlap (there is a clear gap), the difference might be real (significant) — but you do not stop there: you confirm it with a proper statistical test such as a t-test. In short: error bars suggest, a test decides. You will do exactly this in the interactive below.
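The overlap check is simple enough to write as code. In this sketch the means (42 cm and 39 cm) are the fertiliser means from earlier on this page, but the standard errors are hypothetical values chosen to show both outcomes, and the function names are made up for the example. Bars that just touch are counted as overlapping, as described above.

```python
def error_bar(mean, se, k=2):
    """Return the (bottom, top) of an error bar reaching k standard errors each way."""
    return mean - k * se, mean + k * se

def bars_overlap(bar_a, bar_b):
    """True if two (bottom, top) error bars touch or cross."""
    low_a, high_a = bar_a
    low_b, high_b = bar_b
    return low_a <= high_b and low_b <= high_a

# Hypothetical standard errors: first a large SE, then a small one.
print(bars_overlap(error_bar(42, 1.0), error_bar(39, 1.0)))  # True:  40-44 and 37-41 overlap
print(bars_overlap(error_bar(42, 0.5), error_bar(39, 0.5)))  # False: 41-43 and 38-40 do not
```

The same 3 cm difference gives opposite verdicts depending on the standard error, which is why you cannot judge a difference from the means alone.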
Interactive: the spread explorer
Now that you have met the mean, standard deviation, the normal curve and standard error, this interactive lets you play with all of them at once. Two samples of leaves have the same mean length, but you control how spread out each one is. Drag the sliders and watch everything respond. The big lesson: the mean on its own is only half the story.
Spread, standard deviation & error bars
Both samples of leaves have a mean of 50 mm and n = 20 leaves each. Use the three buttons to switch what you are looking at, then drag the sliders.
Worked example
A typical exam-style question that tests interpretation rather than raw calculation.
Do two antimicrobials differ? Using SD to judge overlap
Standard deviation (3 marks)
Clear-zone diameters were measured for two antimicrobials. Cinnamon oil gave a mean of 17 mm (SD 2.4); a positive control gave a mean of 13 mm (SD 2.2), both with n = 10. Given that mean ± 2 SD includes over 95% of the data, evaluate whether the difference between the two means is likely to be due to chance.
What the question gives you
- Cinnamon oil: mean 17 mm, SD 2.4. ±2 SD range = 17 ± 4.8 = 12.2 to 21.8 mm.
- Positive control: mean 13 mm, SD 2.2. ±2 SD range = 13 ± 4.4 = 8.6 to 17.4 mm.
The method
- Work out each ±2 SD range (95% of the data), as above.
- Check whether the ranges overlap. Cinnamon oil runs 12.2–21.8; the control runs 8.6–17.4. They overlap between 12.2 and 17.4 mm.
- Interpret the overlap. Because the 95% ranges overlap, some cinnamon-oil results and some control results could plausibly come from the same underlying value, so the difference in means could be due to chance.
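The arithmetic in this method can be checked with a few lines of Python. This is only a sketch of the working above, using the means and standard deviations given in the question; the function and variable names are chosen for this example, and results are rounded to 1 decimal place to match the question.

```python
def two_sd_range(mean, sd):
    """Return (mean - 2 SD, mean + 2 SD), rounded to 1 decimal place."""
    return round(mean - 2 * sd, 1), round(mean + 2 * sd, 1)

cinnamon = two_sd_range(17, 2.4)   # (12.2, 21.8)
control = two_sd_range(13, 2.2)    # (8.6, 17.4)

# The shared region runs from the higher of the two bottoms
# to the lower of the two tops; it exists only if bottom <= top.
shared = (max(cinnamon[0], control[0]), min(cinnamon[1], control[1]))
ranges_overlap = shared[0] <= shared[1]

print(cinnamon, control)        # (12.2, 21.8) (8.6, 17.4)
print(shared, ranges_overlap)   # (12.2, 17.4) True
```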
Check your understanding
Three quick self-marking activities to test the ideas on this page. Each one tells you the answer and explains it — use them to spot anything you need to re-read.
What each exam board expects
All the main A-level Biology specifications require descriptive statistics. You will not calculate SD by hand in a written paper, but you must understand and interpret it.
| Board | What is required |
|---|---|
| AQA (7402) | MS 1.10 — understand measures of dispersion including standard deviation and range; know why SD may be more useful (e.g. with an outlier). |
| OCR A / B | Standard deviation appears on the “choosing a test” flowchart; understand spread and error bars alongside the four named tests. |
| Edexcel A / B | Calculate and interpret mean, SD and use error bars to show variability and compare means. |
| WJEC / Eduqas | Mean, standard deviation, standard error and the use of error bars to judge overlap between means. |
