When do you use a chi-squared test in A-level Biology?

Use a chi-squared test when your data are COUNTS in categories (frequencies), not measurements, and you want to compare the counts you observed with the counts you expected. The most common A-level use is the goodness-of-fit test, checking whether the offspring of a genetic cross fit an expected ratio such as 3:1 or 9:3:3:1. A contingency-table chi-squared tests whether two categorical variables are associated.

What is the difference between goodness-of-fit and a contingency table?

A goodness-of-fit chi-squared compares observed counts in categories with counts expected from a theory or ratio (e.g. a 3:1 genetic ratio). A contingency-table chi-squared tests whether two categorical variables are associated (e.g. whether recovery depends on which treatment a patient received); the expected values are calculated from the row and column totals.

How do you work out the degrees of freedom for a chi-squared test?

For a goodness-of-fit test, degrees of freedom = (number of categories − 1). For a contingency table, degrees of freedom = (number of rows − 1) × (number of columns − 1). You use the degrees of freedom to read the correct critical value from the table.

How do you decide whether to accept or reject the null hypothesis?

Compare your calculated chi-squared value with the critical value from the table at p = 0.05 for your degrees of freedom. If the calculated value is greater than the critical value, the difference between observed and expected is significant, so you reject the null hypothesis. If it is less than the critical value, the difference is not significant and you cannot reject the null hypothesis.

Exam boards: AQA · OCR · Edexcel · WJEC / Eduqas

The chi-squared test: do the counts fit?

Some data are not measurements — they are counts in categories: how many purple vs white flowers, how many of each blood group. The chi-squared (χ²) test asks whether the counts you observed match the counts you expected, or whether the difference is too big to be chance. This page builds it from scratch, with real exam-style worked examples.


When do you use a chi-squared test?

You use a chi-squared (χ²) test when your data are counts of individuals in categories — frequencies — and you want to know whether those counts differ significantly from what you expected. “Chi” is a Greek letter, said “kye” (to rhyme with “eye”), and χ² is “kye-squared”.

Use a chi-squared test when…

your data are counts / frequencies in categories (e.g. numbers of purple vs white flowers) — whole individuals, not measurements;
you have an expected set of counts to compare against — from a genetic ratio, a theory, or from row and column totals;
you want to know if the difference between observed and expected is bigger than chance would give.

Do NOT use chi-squared for…

measurements (length, mass, rate) — comparing two means is a job for the t-test;
looking for a relationship between two measured variables — that is Spearman’s rank correlation;
data given as percentages or proportions — chi-squared must use the actual counts, never percentages.

The idea behind the test

Suppose you cross two plants and expect the offspring in a 3:1 ratio. You almost never get exactly 3:1 — you might get 74:26 instead of 75:25. Is that small difference just chance, or is your expected ratio wrong? The chi-squared test measures how far the observed counts are from the expected counts, all together, as a single number.

For each category it takes the difference (observed − expected), squares it (so differences never cancel out), and divides by the expected count (so a difference of 5 matters more when only 10 were expected than when 500 were).
Adding those up gives χ². A small χ² means observed and expected are close — a good fit. A large χ² means they are far apart — the expected ratio may be wrong.

The null hypothesis (always state it first)

Every chi-squared test starts from the null hypothesis: “there is no difference between the observed and expected counts” (any difference is due to chance). The test tells you whether the evidence is strong enough to reject it. Note the wording is “no difference” — the word “significant” belongs to your conclusion, not the hypothesis.

The formula (given to you in the exam)

You are always given the formula. What matters is knowing what each symbol means and being able to build the table of working:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Reading the symbols:

χ²: the chi-squared value, the single number you are calculating.
Σ: “sigma”, meaning add up the terms that follow, one for every category.
O: the observed count in a category (what you actually counted).
E: the expected count in that category (what the ratio or theory predicts).
(O − E): the difference between observed and expected for that category.
(O − E)²: that difference squared (so a shortfall and an excess do not cancel).
(O − E)² ÷ E: the squared difference divided by the expected count for that category; there is one such term per category.

In plain steps: (1) work out the expected count E for each category; (2) for each, find O−E; (3) square it; (4) divide by E; (5) add all the terms to get χ²; (6) compare χ² with the critical value.
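The steps above translate directly into a few lines of Python. This is a minimal sketch, not part of any exam specification: the function name `chi_squared` is hypothetical, it assumes you have already worked out the expected counts (step 1), and it leaves step 6, the comparison with the critical value, to you. The example uses the pea-seed counts from later on this page.

```python
def chi_squared(observed, expected):
    """Return (chi_squared, terms) for matched lists of observed and expected counts.

    Hypothetical helper for illustration; expected counts must already be worked out.
    """
    if len(observed) != len(expected):
        raise ValueError("need one expected count per category")
    terms = []
    for o, e in zip(observed, expected):
        diff = o - e                  # step 2: O - E
        squared = diff ** 2           # step 3: (O - E)^2
        terms.append(squared / e)     # step 4: divide by E
    return sum(terms), terms          # step 5: add the terms to get chi-squared


# Pea seeds: 740 round, 260 wrinkled; expected 750 : 250 from a 3:1 ratio
chi2, terms = chi_squared([740, 260], [750, 250])
print(round(chi2, 2))                  # 0.53
print([round(t, 2) for t in terms])    # [0.13, 0.4]
```

Returning the individual terms as well as the total mirrors the last column of the tables of working, so each row can be checked.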

Always work with the actual counts, never percentages, and check that every expected value is 5 or more — chi-squared is unreliable if any expected count is below 5.

Two ways it is used

At A-level the same formula is used in two situations. The only difference is where the expected values come from and how you count the degrees of freedom.

1. Goodness-of-fit — “do the counts fit a ratio?”

Expected values come from a predicted ratio or theory (e.g. a 3:1 or 9:3:3:1 genetic ratio, or known population proportions). This is by far the most common A-level use. Degrees of freedom = (number of categories − 1).

2. Contingency table — “are two things associated?”

Data are cross-classified by two categorical variables (e.g. treatment × recovery). Expected values are worked out from the row and column totals: E = (row total × column total) ÷ grand total. Degrees of freedom = (rows − 1) × (columns − 1).
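A minimal sketch of the contingency-table rule in Python, assuming the observed counts are stored as a list of rows. The function name `contingency_expected` is hypothetical. It applies E = (row total × column total) ÷ grand total to every cell and also returns the degrees of freedom; the counts are those of the drug-versus-placebo example on this page.

```python
def contingency_expected(observed):
    """Expected counts and degrees of freedom for a contingency table (list of rows).

    Hypothetical helper for illustration.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    # E = (row total x column total) / grand total, for every cell
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]
    # df = (rows - 1) x (columns - 1)
    df = (len(observed) - 1) * (len(column_totals) - 1)
    return expected, df


# Rows: drug, placebo; columns: recovered, not recovered
expected, df = contingency_expected([[45, 15], [30, 30]])
print(expected)  # [[37.5, 22.5], [37.5, 22.5]]
print(df)        # 1
```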

Worked examples

Full worked examples: two exam-style genetics questions (one where you reject the null hypothesis, one where you cannot reject it), plus medical examples of both kinds of test.

Genetics (reject the null): purple vs white flowers

Goodness-of-fit · expected 1:1 · exam-style (WJEC) · reject H₀

White flowers were crossed with purple flowers. One hypothesis is that purple is caused by a single dominant allele, which predicts the F1 should be 1 purple : 1 white. The cross gave 32 white and 18 purple (50 plants). Do the results fit the 1:1 ratio?

Step 1 — State the null hypothesis

Null hypothesis (H₀): there is no difference between the observed numbers of purple and white flowers and the numbers expected from a 1:1 ratio.

Step 2 — Work out the expected counts

A 1:1 ratio of 50 plants means 25 expected in each category.

Step 3 — Build the table

Phenotype	Observed O	Expected E	O − E	(O − E)²	(O − E)² ÷ E
White	32	25	+7	49	1.96
Purple	18	25	−7	49	1.96
χ² = Σ =					3.92

[Figure: bar chart of observed counts against the expected 1:1 counts. The bars differ noticeably; the test tells us whether that difference is significant.]

Step 4 — Degrees of freedom

df = (number of categories − 1) = 2 − 1 = 1

Step 5 — Compare with the critical value

At p = 0.05 with 1 degree of freedom the critical value is 3.84.

χ² = 3.92 is GREATER than critical value 3.84

Conclusion: because χ² (3.92) is greater than the critical value (3.84), the difference is significant, so we reject the null hypothesis. The results do not fit a simple 1:1 ratio, so the single-dominant-allele hypothesis is not supported. (In the real question, a second hypothesis involving two interacting genes gave χ² = 0.044, a very good fit, so that model was accepted instead.)

A high χ² that makes you reject the null hypothesis is telling you the expected model is probably wrong — here, the inheritance is more complicated than one gene. Rejecting is a real, useful result, not a failure.

Genetics (accept the null): mouse coat colour

Goodness-of-fit · expected 9:3:3:1 · exam-style (WJEC) · accept H₀

Mice heterozygous at two coat-colour genes were crossed. A dihybrid cross predicts a 9:3:3:1 ratio of normal agouti : solid black : cinnamon : solid brown. From 32 offspring the observed counts were 22, 4, 3, 3. Do they fit the 9:3:3:1 ratio?

Step 1 — Null hypothesis

H₀: there is no difference between the observed offspring numbers and the numbers expected from a 9:3:3:1 ratio.

Step 2 — Expected counts

A 9:3:3:1 ratio splits 32 into 16 parts: one part = 32 ÷ 16 = 2. So expected = 9×2, 3×2, 3×2, 1×2 = 18, 6, 6, 2.
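The “one part” method above can be sketched in Python. `expected_from_ratio` is a hypothetical helper, not a library function; it assumes the ratio is given as whole-number parts.

```python
def expected_from_ratio(ratio, total):
    """Split a total into expected counts according to a ratio such as 9:3:3:1.

    Hypothetical helper for illustration.
    """
    parts = sum(ratio)            # e.g. 9 + 3 + 3 + 1 = 16 parts
    one_part = total / parts      # e.g. 32 / 16 = 2 offspring per part
    return [r * one_part for r in ratio]


print(expected_from_ratio([9, 3, 3, 1], 32))  # [18.0, 6.0, 6.0, 2.0]
print(expected_from_ratio([3, 1], 1000))      # [750.0, 250.0]
```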

Step 3 — Build the table

Phenotype	O	E	O − E	(O − E)²	(O − E)² ÷ E
Normal agouti	22	18	+4	16	0.89
Solid black	4	6	−2	4	0.67
Cinnamon	3	6	−3	9	1.50
Solid brown	3	2	+1	1	0.50
χ² = Σ =					3.56

Step 4 — Degrees of freedom & critical value

df = 4 − 1 = 3
Critical value at p = 0.05, df 3 = 7.82
χ² = 3.56 is LESS than critical value 7.82

Conclusion: because χ² (3.56) is less than the critical value (7.82), the difference is not significant, so we cannot reject the null hypothesis. The results are consistent with a 9:3:3:1 ratio, so the two-gene dihybrid model fits. The small differences can be explained by chance.

Watch the expected-value rule: here the smallest expected value is 2, which is below 5. In a real exam you would note that the test is less reliable when an expected value is under 5 — ideally you would use more offspring.
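Checking the expected-value rule is easy to automate. This hypothetical helper, `low_expected`, simply lists any category whose expected count is below a threshold, assumed here to be 5 as stated on this page.

```python
def low_expected(categories, expected, minimum=5):
    """Return the categories whose expected count is below the minimum.

    Hypothetical helper; the default minimum of 5 follows the rule on this page.
    """
    return [name for name, e in zip(categories, expected) if e < minimum]


phenotypes = ["Normal agouti", "Solid black", "Cinnamon", "Solid brown"]
print(low_expected(phenotypes, [18, 6, 6, 2]))  # ['Solid brown']
```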

Genetics (accept the null): pea seed shape

Goodness-of-fit · expected 3:1 · accept H₀

Two heterozygous pea plants (Rr × Rr) were crossed. Round (R) is dominant to wrinkled (r), so a 3 round : 1 wrinkled ratio is expected. Of 1000 seeds, 740 were round and 260 wrinkled. Do they fit 3:1?

Step 1 — Null hypothesis

H₀: there is no difference between the observed seed counts and those expected from a 3:1 ratio.

Step 2 — Expected counts & table

3:1 of 1000 → expected 750 round, 250 wrinkled.

Phenotype	O	E	O − E	(O − E)²	(O − E)² ÷ E
Round	740	750	−10	100	0.13
Wrinkled	260	250	+10	100	0.40
χ² = Σ =					0.53

[Figure: bar chart of observed pea counts against the expected 3:1 counts. They sit almost on top of each other, a very good fit, so χ² is small.]

Step 3 — Decision

df = 2 − 1 = 1
Critical value (p = 0.05, df 1) = 3.84

Conclusion: χ² = 0.53 is far less than 3.84, so we cannot reject the null hypothesis. The seed counts fit a 3:1 ratio well — consistent with simple dominance at one gene.

Medical (contingency table): does the drug affect recovery?

Contingency 2×2 · association · drug vs placebo · medical

60 patients were given a new drug and a separate 60 were given a placebo. After treatment each was recorded as recovered or not recovered. Is recovery associated with which treatment they had?

Step 1 — Null hypothesis

H₀: there is no association between the treatment (drug or placebo) and whether the patient recovered — recovery is independent of treatment.

Step 2 — The observed counts, with row and column totals

	Recovered	Not recovered	Row total
Drug	45	15	60
Placebo	30	30	60
Column total	75	45	120

Step 3 — Expected counts from the totals

For each cell, E = (row total × column total) ÷ grand total. For “Drug & Recovered”: E = (60 × 75) ÷ 120 = 37.5. Doing all four:

Cell	O	E	O − E	(O − E)²	(O − E)² ÷ E
Drug & recovered	45	37.5	+7.5	56.25	1.50
Drug & not	15	22.5	−7.5	56.25	2.50
Placebo & recovered	30	37.5	−7.5	56.25	1.50
Placebo & not	30	22.5	+7.5	56.25	2.50
χ² = Σ =					8.00

Step 4 — Degrees of freedom & decision

df = (rows − 1) × (columns − 1) = (2 − 1) × (2 − 1) = 1
χ² = 8.00 is GREATER than critical value 3.84 (p = 0.05, df 1)

Conclusion: χ² (8.00) is greater than 3.84, so the difference is significant — we reject the null hypothesis. Recovery is associated with treatment: significantly more patients recovered on the drug than on the placebo.

Contingency tables use E = (row total × column total) ÷ grand total and df = (rows−1)(columns−1) — that is the only real difference from a goodness-of-fit test. As always, association is not proof of mechanism, but a controlled drug-vs-placebo design makes the drug the most likely cause.

Medical (goodness-of-fit): blood groups in a sample

Goodness-of-fit · expected from population proportions · medical

In a national population the ABO blood groups occur in the proportions O 44%, A 42%, B 10%, AB 4%. In a sample of 500 blood donors the observed counts were O 200, A 230, B 45, AB 25. Do the donors match the national proportions?

Step 1 — Null hypothesis

H₀: there is no difference between the observed blood-group counts in the donors and the counts expected from the national proportions.

Step 2 — Expected counts (proportion × 500) & table

Blood group	Observed O	Expected E	O − E	(O − E)²	(O − E)² ÷ E
O (44%)	200	220	−20	400	1.82
A (42%)	230	210	+20	400	1.90
B (10%)	45	50	−5	25	0.50
AB (4%)	25	20	+5	25	1.25
χ² = Σ =					5.47

[Figure: bar chart of observed donor counts against the counts expected from the national proportions. The differences are modest.]

Step 3 — Decision

df = 4 − 1 = 3
Critical value (p = 0.05, df 3) = 7.82

Conclusion: χ² (5.47) is less than 7.82, so we cannot reject the null hypothesis. The donors’ blood groups are consistent with the national proportions — the differences are within what chance would produce.

Note the expected counts are proportion × total (0.44 × 500 = 220, etc.), and they must be the counts, never the percentages. Every expected value here is well above 5, so the test is reliable.
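A minimal Python sketch of the proportion × total step for the blood-group data, followed by the χ² sum. `expected_from_percentages` is a hypothetical helper; it assumes the percentages add up to 100 and refuses them otherwise, which catches a common slip.

```python
def expected_from_percentages(percentages, total):
    """Convert percentages (which should sum to 100) into expected counts.

    Hypothetical helper for illustration.
    """
    if abs(sum(percentages) - 100) > 1e-9:
        raise ValueError("percentages should add up to 100")
    # Expected count = proportion x total, e.g. 0.44 x 500 = 220
    return [p / 100 * total for p in percentages]


observed = [200, 230, 45, 25]            # donors in groups O, A, B, AB
expected = expected_from_percentages([44, 42, 10, 4], 500)
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 1) for e in expected])   # [220.0, 210.0, 50.0, 20.0]
print(round(chi2, 2))                    # 5.47
```

Note that the χ² sum uses the counts in `observed` and `expected`; the percentages are only used to build the expected counts.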

Degrees of freedom & reading the table

Degrees of freedom tells you which row of the χ² table to use. How you work it out depends on the type of test:

Goodness-of-fit: df = (number of categories − 1). Two categories → df = 1; four categories (like 9:3:3:1) → df = 3.
Contingency table: df = (rows − 1) × (columns − 1). A 2×2 table → df = 1.

Find your df row, read across to the p = 0.05 column for the critical value, and apply the rule:

The decision rule

χ² > critical value → the difference is significant → reject the null hypothesis (observed and expected really differ).
χ² < critical value → the difference is not significant → you cannot reject the null hypothesis (the data fit the expected).
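The decision rule written as a tiny Python function, as a sketch. The name `decide` is hypothetical, and a value exactly equal to the critical value (which the rule above does not cover) is treated here as not significant; that tie-break is an assumption.

```python
def decide(chi2, critical):
    """Apply the decision rule at a chosen probability level (usually p = 0.05).

    Hypothetical helper; a value equal to the critical value counts as not significant.
    """
    if chi2 > critical:
        return "significant: reject the null hypothesis"
    return "not significant: cannot reject the null hypothesis"


print(decide(3.92, 3.84))  # purple vs white flowers: significant
print(decide(3.56, 7.82))  # mouse coat colour: not significant
```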

Degrees of freedom	p = 0.10	p = 0.05 (critical)	p = 0.01
1	2.71	3.84	6.64
2	4.61	5.99	9.21
3	6.25	7.82	11.34
4	7.78	9.49	13.28

The more degrees of freedom you have (for a goodness-of-fit test, the more categories), the bigger the critical value. With more categories there are more terms in the sum, so random differences have more chances to add up, and χ² has to be larger before it counts as significant.
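The table's numbers are points on the chi-squared distribution. For whole-number degrees of freedom its upper-tail probability has a closed form: an exponential series for even df, and the complementary error function plus a finite series for odd df. The sketch below uses only Python's standard `math` module and finds each critical value by bisection; the function names are hypothetical. Its output agrees with the table to two decimal places except for two entries that differ by 0.01 because tables round differently: the exact df = 1, p = 0.01 value is about 6.635, and the df = 3, p = 0.05 value is about 7.815.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom >= x), for whole-number df >= 1."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    total = math.erfc(math.sqrt(half))
    if df > 1:
        term = math.sqrt(2 * x / math.pi) * math.exp(-half)
        total += term
        for j in range(2, (df + 1) // 2):
            term *= x / (2 * j - 1)
            total += term
    return total


def critical_value(p, df):
    """Find x where the upper-tail probability equals p, by bisection."""
    low, high = 0.0, 100.0
    for _ in range(100):
        mid = (low + high) / 2
        if chi2_upper_tail(mid, df) > p:
            low = mid   # tail still too big: the critical value lies further right
        else:
            high = mid
    return (low + high) / 2


for df in range(1, 5):
    print(df, [round(critical_value(p, df), 2) for p in (0.10, 0.05, 0.01)])

print(round(chi2_upper_tail(3.00, 1), 3))  # 0.083
```

The last line gives the upper-tail probability for the borderline value χ² = 3.00 with df = 1 used later on this page, which indeed lies between 0.05 and 0.10.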

Choosing the probability level (and borderline results)

The critical value depends on the probability level (p) you choose. In biology we almost always use p = 0.05 (5%): the point where there is only a 1 in 20 (5%) chance that a difference this big could happen by chance alone. If χ² is bigger than the 0.05 critical value, we call the result significant.

But what does the probability level mean, and what happens when a result is borderline? Consider a 3:1 cross of 400 seeds where we observe 285 round : 115 wrinkled (expected 300 : 100):

Phenotype	O	E	O − E	(O − E)²	(O − E)² ÷ E
Round	285	300	−15	225	0.75
Wrinkled	115	100	+15	225	2.25
χ² = Σ =					3.00

Here χ² = 3.00 with df = 1. Look along the df = 1 row of the table above: 3.00 falls between the p = 0.10 value (2.71) and the p = 0.05 value (3.84).

What that tells you

χ² (3.00) is less than the 0.05 critical value (3.84), so at the 5% level we cannot reject the null hypothesis — the data still fit a 3:1 ratio.
But 3.00 is more than the 0.10 value (2.71), so the probability of a difference this large arising by chance alone is between 5% and 10%. It is a borderline result, close to significant.
You would not switch to a laxer level (like p = 0.10) just to change the outcome. The 5% level is the biological convention; you choose it before you see the data, and report honestly that the result is close.

A common exam skill: given χ² and the table, state the probability range the value falls in (“between 0.05 and 0.10, so there is a 5 to 10% probability of a difference this large arising by chance”) and conclude accordingly. Never lower your significance level after the fact to force a “significant” result.
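Reading the probability range off the table can be sketched as a hypothetical helper, `probability_range`, that uses only the critical values printed on this page (df 1 to 4).

```python
# Critical values copied from the table on this page (df 1 to 4)
CRITICAL = {
    1: {0.10: 2.71, 0.05: 3.84, 0.01: 6.64},
    2: {0.10: 4.61, 0.05: 5.99, 0.01: 9.21},
    3: {0.10: 6.25, 0.05: 7.82, 0.01: 11.34},
    4: {0.10: 7.78, 0.05: 9.49, 0.01: 13.28},
}


def probability_range(chi2, df):
    """Describe where chi2 falls among the tabulated probability levels.

    Hypothetical helper; only df 1 to 4 are covered.
    """
    row = CRITICAL[df]
    if chi2 < row[0.10]:
        return "p > 0.10"
    if chi2 < row[0.05]:
        return "between 0.05 and 0.10"
    if chi2 < row[0.01]:
        return "between 0.01 and 0.05"
    return "p < 0.01"


print(probability_range(3.00, 1))  # borderline seeds: between 0.05 and 0.10
print(probability_range(8.00, 1))  # drug vs placebo: p < 0.01
```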


What each exam board expects

All the main A-level Biology specifications include the chi-squared test as a named statistical test.

Board	What is required
AQA (7402)	Use the chi-squared test to compare observed and expected results (e.g. genetic ratios); interpret against the critical value at p = 0.05 and understand degrees of freedom.
OCR A / B	Chi-squared named explicitly for goodness-of-fit and (for association) contingency tables; calculate χ², find degrees of freedom, compare with the critical value.
Edexcel A / B	Use the chi-squared test to test the significance of the difference between observed and expected results; understand significance at the 5% level and the null hypothesis.
WJEC / Eduqas	Chi-squared test on genetic-cross data; state the null hypothesis, complete the table, calculate χ², use the correct degrees of freedom and probability level, and interpret against tabulated critical values.
