The chi-squared test: do the counts fit?
Some data are not measurements — they are counts in categories: how many purple vs white flowers, how many of each blood group. The chi-squared (χ²) test asks whether the counts you observed match the counts you expected, or whether the difference is too big to be chance. This page builds it from scratch, with real exam-style worked examples.
When do you use a chi-squared test?
You use a chi-squared (χ²) test when your data are counts of individuals in categories — frequencies — and you want to know whether those counts differ significantly from what you expected. “Chi” is a Greek letter, said “kye” (to rhyme with “eye”), and χ² is “kye-squared”.
Use a chi-squared test when…
- your data are counts / frequencies in categories (e.g. numbers of purple vs white flowers) — whole individuals, not measurements;
- you have an expected set of counts to compare against — from a genetic ratio, a theory, or from row and column totals;
- you want to know if the difference between observed and expected is bigger than chance would give.
Do NOT use chi-squared for…
- measurements (length, mass, rate) — comparing two means is a job for the t-test;
- looking for a relationship between two measured variables — that is Spearman’s rank correlation;
- data given as percentages or proportions — chi-squared must use the actual counts, never percentages.
The idea behind the test
Suppose you cross two plants and expect the offspring in a 3:1 ratio. You almost never get exactly 3:1 — you might get 74:26 instead of 75:25. Is that small difference just chance, or is your expected ratio wrong? The chi-squared test measures how far the observed counts are from the expected counts, all together, as a single number.
- For each category it takes the difference (observed − expected), squares it (so differences never cancel out), and divides by the expected count (so a difference of 5 matters more when only 10 were expected than when 500 were).
- Adding those up gives χ². A small χ² means observed and expected are close — a good fit. A large χ² means they are far apart — the expected ratio may be wrong.
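To see why dividing by the expected count matters, here is a minimal Python sketch. The function name `term` and the counts 15 and 505 are illustrative choices, not part of any exam method. It compares the same difference of 5 against an expected count of 10 and of 500:

```python
def term(observed, expected):
    """One category's contribution to chi-squared: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

# The same difference of 5, against two different expected counts.
small_expected = term(15, 10)    # 25 / 10
large_expected = term(505, 500)  # 25 / 500

print(small_expected)  # 2.5
print(large_expected)  # 0.05
```

The same difference contributes fifty times more to χ² when only 10 individuals were expected.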
The null hypothesis (always state it first)
Every chi-squared test starts from the null hypothesis: “there is no difference between the observed and expected counts” (any difference is due to chance). The test tells you whether the evidence is strong enough to reject it. Note the wording is “no difference” — the word “significant” belongs to your conclusion, not the hypothesis.
The formula (given to you in the exam)
You are always given the formula. What matters is knowing what each symbol means and being able to build the table of working:
χ² = Σ [ (O − E)² ÷ E ]
Reading the symbols:
- χ²: the chi-squared value, the single number you are calculating.
- Σ: “sigma”, meaning add up the terms that follow, one for every category.
- O: the observed count in a category (what you actually counted).
- E: the expected count in that category (what the ratio or theory predicts).
- (O−E): the difference between observed and expected for that category.
- (O−E)²: that difference squared (so a shortfall and an excess do not cancel).
- (O−E)²⁄E: the squared difference divided by the expected count for that category, giving one term per category.
In plain steps: (1) work out the expected count E for each category; (2) for each, find O−E; (3) square it; (4) divide by E; (5) add all the terms to get χ²; (6) compare χ² with the critical value.
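The plain steps above can be sketched as a short Python function. This is only an illustration; the name `chi_squared` is an assumption, and in the exam you would build the table by hand. The usage line reuses the 74:26 versus 75:25 cross mentioned earlier on this page.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("need one expected count per observed count")
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e            # step 2: O - E
        squared = difference ** 2     # step 3: square it
        total += squared / e          # steps 4 and 5: divide by E, add up
    return total

# 3:1 cross of 100 offspring: 74:26 observed, 75:25 expected.
value = chi_squared([74, 26], [75, 25])
print(round(value, 2))  # 0.05
```

Step 1 (finding E) and step 6 (comparing with the critical value) are left out here because they depend on which kind of test you are doing.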
Two ways it is used
At A-level the same formula is used in two situations. The only difference is where the expected values come from and how you count the degrees of freedom.
1. Goodness-of-fit — “do the counts fit a ratio?”
Expected values come from a predicted ratio or theory (e.g. a 3:1 or 9:3:3:1 genetic ratio, or known population proportions). This is by far the most common A-level use. Degrees of freedom = (number of categories − 1).
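As a rough sketch of how a ratio turns into expected counts and degrees of freedom (the function names are hypothetical, chosen only for this example):

```python
def expected_from_ratio(ratio, total):
    """Split `total` individuals across categories in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def goodness_of_fit_df(ratio):
    """Degrees of freedom = number of categories - 1."""
    return len(ratio) - 1

print(expected_from_ratio([3, 1], 100))        # [75.0, 25.0]
print(expected_from_ratio([9, 3, 3, 1], 32))   # [18.0, 6.0, 6.0, 2.0]
print(goodness_of_fit_df([9, 3, 3, 1]))        # 3
```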
2. Contingency table — “are two things associated?”
Data are cross-classified by two categorical variables (e.g. treatment × recovery). Expected values are worked out from the row and column totals: E = (row total × column total) ÷ grand total. Degrees of freedom = (rows − 1) × (columns − 1).
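The expected-count rule for a contingency table can be sketched like this. The function names are assumptions, and the 2 × 3 table of counts is hypothetical, invented only to show the arithmetic:

```python
def expected_counts(table):
    """E = (row total x column total) / grand total, for every cell."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

def contingency_df(table):
    """Degrees of freedom = (rows - 1) x (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical counts: 2 rows x 3 columns (totals are not included in the table).
example = [[10, 20, 30],
           [20, 20, 20]]

print(expected_counts(example))  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(contingency_df(example))   # 2
```

Note that the row and column totals are worked out from the observed counts; they are not counted as extra rows or columns when finding the degrees of freedom.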
Worked examples
Full worked examples — two real exam-style genetics questions (one where you reject the null hypothesis, one where you accept it), plus medical examples of both kinds of test.
Genetics (reject the null): purple vs white flowers
Goodness-of-fit · reject H₀
White flowers were crossed with purple flowers. One hypothesis is that purple is caused by a single dominant allele, which predicts the F1 should be 1 purple : 1 white. The cross gave 32 white and 18 purple (50 plants). Do the results fit the 1:1 ratio?
Step 1 — State the null hypothesis
Null hypothesis (H₀): there is no difference between the observed numbers of purple and white flowers and the numbers expected from a 1:1 ratio.
Step 2 — Work out the expected counts
A 1:1 ratio of 50 plants means 25 expected in each category.
Step 3 — Build the table
| Phenotype | Observed O | Expected E | O − E | (O − E)² | (O − E)² ÷ E |
|---|---|---|---|---|---|
| White | 32 | 25 | +7 | 49 | 1.96 |
| Purple | 18 | 25 | −7 | 49 | 1.96 |
| χ² = Σ | | | | | 3.92 |
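If you want to check a working table like the one above, a small Python sketch can rebuild it column by column. The variable names are illustrative only:

```python
observed = {"White": 32, "Purple": 18}
expected = {"White": 25, "Purple": 25}   # 1:1 ratio of 50 plants

rows = []
for phenotype, o in observed.items():
    e = expected[phenotype]
    diff = o - e
    # One table row: O, E, O - E, (O - E)^2, (O - E)^2 / E
    rows.append((phenotype, o, e, diff, diff ** 2, diff ** 2 / e))

chi_squared = sum(row[-1] for row in rows)

for phenotype, o, e, diff, squared, term in rows:
    print(f"{phenotype:<7} {o:>3} {e:>3} {diff:>+4} {squared:>4} {term:>5.2f}")
print(f"chi-squared = {chi_squared:.2f}")  # chi-squared = 3.92
```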
Step 4 — Degrees of freedom
df = (number of categories − 1) = 2 − 1 = 1
Step 5 — Compare with the critical value
At p = 0.05 with 1 degree of freedom the critical value is 3.84.
χ² = 3.92 is GREATER than the critical value 3.84, so the difference is significant and we reject the null hypothesis: the results do not fit a 1:1 ratio.
Genetics (accept the null): mouse coat colour
Goodness-of-fit · accept H₀
Mice heterozygous at two coat-colour genes were crossed. A dihybrid cross predicts a 9:3:3:1 ratio of normal agouti : solid black : cinnamon : solid brown. From 32 offspring the observed counts were 22, 4, 3, 3. Do they fit the 9:3:3:1 ratio?
Step 1 — Null hypothesis
H₀: there is no difference between the observed offspring numbers and the numbers expected from a 9:3:3:1 ratio.
Step 2 — Expected counts
A 9:3:3:1 ratio splits 32 into 16 parts: one part = 32 ÷ 16 = 2. So expected = 9×2, 3×2, 3×2, 1×2 = 18, 6, 6, 2.
Step 3 — Build the table
| Phenotype | O | E | O − E | (O − E)² | (O − E)² ÷ E |
|---|---|---|---|---|---|
| Normal agouti | 22 | 18 | +4 | 16 | 0.89 |
| Solid black | 4 | 6 | −2 | 4 | 0.67 |
| Cinnamon | 3 | 6 | −3 | 9 | 1.50 |
| Solid brown | 3 | 2 | +1 | 1 | 0.50 |
| χ² = Σ | | | | | 3.56 |
Step 4 — Degrees of freedom & critical value
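df = 4 − 1 = 3
Critical value at p = 0.05, df 3 = 7.82
χ² = 3.56 is LESS than the critical value 7.82, so the difference is not significant and we cannot reject the null hypothesis: the results fit a 9:3:3:1 ratio.
Genetics (your turn to picture it): pea seed shape
Goodness-of-fit · accept H₀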
Two heterozygous pea plants (Rr × Rr) were crossed. Round (R) is dominant to wrinkled (r), so a 3 round : 1 wrinkled ratio is expected. Of 1000 seeds, 740 were round and 260 wrinkled. Do they fit 3:1?
Step 1 — Null hypothesis
H₀: there is no difference between the observed seed counts and those expected from a 3:1 ratio.
Step 2 — Expected counts & table
3:1 of 1000 → expected 750 round, 250 wrinkled.
| Phenotype | O | E | O − E | (O − E)² | (O − E)² ÷ E |
|---|---|---|---|---|---|
| Round | 740 | 750 | −10 | 100 | 0.13 |
| Wrinkled | 260 | 250 | +10 | 100 | 0.40 |
| χ² = Σ | | | | | 0.53 |
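This example is also a handy way to see why chi-squared must use actual counts. In the sketch below (the function name is an assumption), the same seeds are entered once as counts and once, wrongly, as percentages:

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Actual counts from the 1000 seeds.
from_counts = chi_squared([740, 260], [750, 250])

# WRONG: the same data fed in as percentages (74% : 26% against 75% : 25%).
from_percentages = chi_squared([74, 26], [75, 25])

print(round(from_counts, 2))       # 0.53
print(round(from_percentages, 2))  # 0.05
```

The percentages give a χ² ten times too small, because they throw away the information about how many seeds were counted.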
Step 3 — Decision
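df = 2 − 1 = 1
Critical value (p = 0.05, df 1) = 3.84
χ² = 0.53 is LESS than the critical value 3.84, so the difference is not significant and we cannot reject the null hypothesis: the results fit a 3:1 ratio.
Medical (contingency table): does the drug affect recovery?
Contingency · medical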
60 patients were given a new drug and a separate 60 were given a placebo. After treatment each was recorded as recovered or not recovered. Is recovery associated with which treatment they had?
Step 1 — Null hypothesis
H₀: there is no association between the treatment (drug or placebo) and whether the patient recovered — recovery is independent of treatment.
Step 2 — The observed counts, with row and column totals
| | Recovered | Not recovered | Row total |
|---|---|---|---|
| Drug | 45 | 15 | 60 |
| Placebo | 30 | 30 | 60 |
| Column total | 75 | 45 | 120 |
Step 3 — Expected counts from the totals
For each cell, E = (row total × column total) ÷ grand total. For “Drug & Recovered”: E = (60 × 75) ÷ 120 = 37.5. Doing all four:
| Cell | O | E | O − E | (O − E)² | (O − E)² ÷ E |
|---|---|---|---|---|---|
| Drug & recovered | 45 | 37.5 | +7.5 | 56.25 | 1.50 |
| Drug & not | 15 | 22.5 | −7.5 | 56.25 | 2.50 |
| Placebo & recovered | 30 | 37.5 | −7.5 | 56.25 | 1.50 |
| Placebo & not | 30 | 22.5 | +7.5 | 56.25 | 2.50 |
| χ² = Σ | | | | | 8.00 |
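The whole calculation for this table can be sketched in a few lines of Python. It is written from scratch rather than with a statistics library, and the variable names are illustrative:

```python
observed = [[45, 15],   # drug: recovered, not recovered
            [30, 30]]   # placebo: recovered, not recovered

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count for this cell from its row and column totals.
        e = row_totals[i] * column_totals[j] / grand_total
        chi_squared += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_squared, 2), df)  # 8.0 1
```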
Step 4 — Degrees of freedom & decision
df = (rows − 1) × (columns − 1) = (2 − 1) × (2 − 1) = 1
χ² = 8.00 is GREATER than the critical value 3.84 (p = 0.05, df 1), so we reject the null hypothesis: recovery is associated with treatment.
Medical (goodness-of-fit): blood groups in a sample
Goodness-of-fit · medical
In a national population the ABO blood groups occur in the proportions O 44%, A 42%, B 10%, AB 4%. In a sample of 500 blood donors the observed counts were O 200, A 230, B 45, AB 25. Do the donors match the national proportions?
Step 1 — Null hypothesis
H₀: there is no difference between the observed blood-group counts in the donors and the counts expected from the national proportions.
Step 2 — Expected counts (proportion × 500) & table
| Group | O | E | O − E | (O − E)² | (O − E)² ÷ E |
|---|---|---|---|---|---|
| O (44%) | 200 | 220 | −20 | 400 | 1.82 |
| A (42%) | 230 | 210 | +20 | 400 | 1.90 |
| B (10%) | 45 | 50 | −5 | 25 | 0.50 |
| AB (4%) | 25 | 20 | +5 | 25 | 1.25 |
| χ² = Σ | | | | | 5.47 |
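Here the expected values start life as percentages, so they have to be turned into counts first. A minimal sketch of that step, with illustrative variable names:

```python
national_percent = {"O": 44, "A": 42, "B": 10, "AB": 4}
donors = {"O": 200, "A": 230, "B": 45, "AB": 25}

sample_size = sum(donors.values())  # 500

# Convert each percentage into an expected COUNT before testing.
expected = {g: sample_size * pct / 100 for g, pct in national_percent.items()}

chi_squared = sum((donors[g] - expected[g]) ** 2 / expected[g] for g in donors)

print(expected)               # {'O': 220.0, 'A': 210.0, 'B': 50.0, 'AB': 20.0}
print(round(chi_squared, 2))  # 5.47
```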
Step 3 — Decision
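df = 4 − 1 = 3
Critical value (p = 0.05, df 3) = 7.82
χ² = 5.47 is LESS than the critical value 7.82, so the difference is not significant and we cannot reject the null hypothesis: the donor counts are consistent with the national proportions.
Degrees of freedom & reading the table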
Degrees of freedom tells you which row of the χ² table to use. How you work it out depends on the type of test:
- Goodness-of-fit: df = (number of categories − 1). Two categories → df = 1; four categories (like 9:3:3:1) → df = 3.
- Contingency table: df = (rows − 1) × (columns − 1). A 2×2 table → df = 1.
Find your df row, read across to the p = 0.05 column for the critical value, and apply the rule:
The decision rule
- χ² > critical value → the difference is significant → reject the null hypothesis (observed and expected really differ).
- χ² < critical value → the difference is not significant → you cannot reject the null hypothesis (the data fit the expected).
| Degrees of freedom | p = 0.10 | p = 0.05 (critical) | p = 0.01 |
|---|---|---|---|
| 1 | 2.71 | 3.84 | 6.64 |
| 2 | 4.61 | 5.99 | 9.21 |
| 3 | 6.25 | 7.82 | 11.34 |
| 4 | 7.78 | 9.49 | 13.28 |
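Reading the table and applying the decision rule can be sketched as a lookup. The dictionary below simply copies the table above; the names `CRITICAL` and `decide` are assumptions made for this example:

```python
# Critical values copied from the table above: CRITICAL[df][p].
CRITICAL = {
    1: {0.10: 2.71, 0.05: 3.84, 0.01: 6.64},
    2: {0.10: 4.61, 0.05: 5.99, 0.01: 9.21},
    3: {0.10: 6.25, 0.05: 7.82, 0.01: 11.34},
    4: {0.10: 7.78, 0.05: 9.49, 0.01: 13.28},
}

def decide(chi_squared, df, p=0.05):
    """Apply the decision rule at the chosen probability level."""
    critical = CRITICAL[df][p]
    if chi_squared > critical:
        return "reject the null hypothesis"
    return "cannot reject the null hypothesis"

print(decide(3.92, df=1))  # reject the null hypothesis
print(decide(3.56, df=3))  # cannot reject the null hypothesis
print(decide(8.00, df=1))  # reject the null hypothesis
```

The three calls use the χ² values from the flower, mouse and drug examples on this page.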
Choosing the probability level (and borderline results)
The critical value depends on the probability level (p) you choose. In biology we almost always use p = 0.05 (5%): if the null hypothesis is true, a difference at least this big would arise by chance only 1 time in 20 (5%). If χ² is bigger than the 0.05 critical value, we call the result significant.
But what does the probability level mean, and what happens when a result is borderline? Consider a 3:1 cross of 400 seeds where we observe 285 round : 115 wrinkled (expected 300 : 100):
| Phenotype | O | E | O − E | (O − E)² | (O − E)² ÷ E |
|---|---|---|---|---|---|
| Round | 285 | 300 | −15 | 225 | 0.75 |
| Wrinkled | 115 | 100 | +15 | 225 | 2.25 |
| χ² = Σ | | | | | 3.00 |
Here χ² = 3.00 with df = 1. Look along the df = 1 row of the table above: 3.00 falls between the p = 0.10 value (2.71) and the p = 0.05 value (3.84).
What that tells you
- χ² (3.00) is less than the 0.05 critical value (3.84), so at the 5% level we cannot reject the null hypothesis — the data still fit a 3:1 ratio.
- But 3.00 is more than the 0.10 value (2.71), so the probability of getting a difference this big by chance alone is between 5% and 10%. It is a borderline result, close to significant.
- You would not switch to a laxer level (like p = 0.10) just to change the outcome. The 5% level is the biological convention; you choose it before you see the data, and report honestly that the result is close.
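You are not expected to do this in the exam, but for one degree of freedom the probability can be worked out directly, which confirms where 3.00 sits between the two table columns. The sketch below assumes df = 1 only, using the identity P(χ² > x) = erfc(√(x ÷ 2)); the function name is hypothetical:

```python
import math

def p_value_df1(chi_squared):
    """Probability of a chi-squared at least this large when df = 1.

    Valid for ONE degree of freedom only: uses the identity
    P(chi-squared > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_squared / 2))

p = p_value_df1(3.00)
print(round(p, 3))  # 0.083
```

A probability of about 8% lies between 5% and 10%, which matches what the table shows.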
What each exam board expects
All the main A-level Biology specifications include the chi-squared test as a named statistical test.
| Board | What is required |
|---|---|
| AQA (7402) | Use the chi-squared test to compare observed and expected results (e.g. genetic ratios); interpret against the critical value at p = 0.05 and understand degrees of freedom. |
| OCR A / B | Chi-squared named explicitly for goodness-of-fit and (for association) contingency tables; calculate χ², find degrees of freedom, compare with the critical value. |
| Edexcel A / B | Use the chi-squared test to test the significance of the difference between observed and expected results; understand significance at the 5% level and the null hypothesis. |
| WJEC / Eduqas | Chi-squared test on genetic-cross data; state the null hypothesis, complete the table, calculate χ², use the correct degrees of freedom and probability level, and interpret against tabulated critical values. |
