When do you use a chi-squared test?

You use a chi-squared (χ²) test when your data are counts of individuals in categories — frequencies — and you want to know whether those counts differ significantly from what you expected. “Chi” is a Greek letter, said “kye” (to rhyme with “eye”), and χ² is “kye-squared”.

Use a chi-squared test when…

  • your data are counts / frequencies in categories (e.g. numbers of purple vs white flowers) — whole individuals, not measurements;
  • you have an expected set of counts to compare against — from a genetic ratio, a theory, or from row and column totals;
  • you want to know if the difference between observed and expected is bigger than chance would give.

Do NOT use chi-squared for…

  • measurements (length, mass, rate) — comparing two means is a job for the t-test;
  • looking for a relationship between two measured variables — that is Spearman’s rank correlation;
  • data given as percentages or proportions — chi-squared must use the actual counts, never percentages.

The idea behind the test

Suppose you cross two plants and expect the offspring in a 3:1 ratio. You almost never get exactly 3:1 — you might get 74:26 instead of 75:25. Is that small difference just chance, or is your expected ratio wrong? The chi-squared test measures how far the observed counts are from the expected counts, all together, as a single number.

  • For each category it takes the difference (observed − expected), squares it (so differences never cancel out), and divides by the expected count (so a difference of 5 matters more when only 10 were expected than when 500 were).
  • Adding those up gives χ². A small χ² means observed and expected are close — a good fit. A large χ² means they are far apart — the expected ratio may be wrong.
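
A minimal Python sketch of that calculation, applied to the 74:26 result above. It assumes 100 offspring, so that 75:25 is the expected split; the function name `chi_squared` is an illustrative choice, not part of any library.

```python
def chi_squared(observed, expected):
    """Add up (O - E)^2 / E over every category."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 100 offspring, expected 3:1 -> 75:25, observed 74:26
chi2 = chi_squared([74, 26], [75, 25])
print(round(chi2, 3))  # 0.053
```

The value is tiny because each observed count is only 1 away from its expected count, which is what a good fit looks like.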

The null hypothesis (always state it first)

Every chi-squared test starts from the null hypothesis: “there is no difference between the observed and expected counts” (any difference is due to chance). The test tells you whether the evidence is strong enough to reject it. Note the wording is “no difference” — the word “significant” belongs to your conclusion, not the hypothesis.

The formula (given to you in the exam)

You are always given the formula. What matters is knowing what each symbol means and being able to build the table of working:

χ² = Σ [ (O − E)² ÷ E ]

Reading the symbols:

  • χ²: the chi-squared value, the single number you are calculating.
  • Σ: “sigma”, meaning add up the terms that follow, one for every category.
  • O: the observed count in a category (what you actually counted).
  • E: the expected count in that category (what the ratio or theory predicts).
  • (O−E): the difference between observed and expected for that category.
  • (O−E)²: that difference squared (so a shortfall and an excess do not cancel).
  • (O−E)²⁄E: the squared difference divided by the expected count for that category, giving one term per category.

In plain steps: (1) work out the expected count E for each category; (2) for each, find O−E; (3) square it; (4) divide by E; (5) add all the terms to get χ²; (6) compare χ² with the critical value.
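
The same steps can be sketched in Python as a table of working. This is only an illustration: the function name `working_table` and the tuple layout are assumptions made here, and the data are the pea-seed counts (740 round, 260 wrinkled, expected 3:1) from the worked examples on this page.

```python
def working_table(categories, observed, ratio):
    """Goodness-of-fit working: returns (rows, chi2) for an expected ratio."""
    total = sum(observed)
    parts = sum(ratio)
    rows = []
    chi2 = 0.0
    for name, o, r in zip(categories, observed, ratio):
        e = total * r / parts   # step 1: expected count E
        diff = o - e            # step 2: O - E
        squared = diff ** 2     # step 3: square it
        term = squared / e      # step 4: divide by E
        chi2 += term            # step 5: add the terms
        rows.append((name, o, e, diff, squared, term))
    return rows, chi2

rows, chi2 = working_table(["Round", "Wrinkled"], [740, 260], [3, 1])
for name, o, e, diff, squared, term in rows:
    print(f"{name:10} O={o} E={e:.0f} O-E={diff:+.0f} (O-E)^2={squared:.0f} term={term:.2f}")
print(f"chi-squared = {chi2:.2f}")

# step 6: compare with the critical value (3.84 at p = 0.05, df = 1)
print("significant" if chi2 > 3.84 else "not significant")
```

Each tuple in `rows` corresponds to one row of the hand-written table, so the printed lines can be checked column by column against your own working.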

Always work with the actual counts, never percentages, and check that every expected value is 5 or more — chi-squared is unreliable if any expected count is below 5.

Two ways it is used

At A-level the same formula is used in two situations. The only difference is where the expected values come from and how you count the degrees of freedom.

1. Goodness-of-fit — “do the counts fit a ratio?”

Expected values come from a predicted ratio or theory (e.g. a 3:1 or 9:3:3:1 genetic ratio, or known population proportions). This is by far the most common A-level use. Degrees of freedom = (number of categories − 1).

2. Contingency table — “are two things associated?”

Data are cross-classified by two categorical variables (e.g. treatment × recovery). Expected values are worked out from the row and column totals: E = (row total × column total) ÷ grand total. Degrees of freedom = (rows − 1) × (columns − 1).
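
As a rough Python sketch of the two rules that differ between the tests, the functions below work out contingency-table expected counts and both kinds of degrees of freedom. The function names are hypothetical, and the 2×2 counts are the drug-versus-placebo figures from the worked example on this page.

```python
def expected_counts(table):
    """E = (row total x column total) / grand total, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def df_goodness_of_fit(n_categories):
    return n_categories - 1

def df_contingency(n_rows, n_cols):
    return (n_rows - 1) * (n_cols - 1)

# Rows: drug, placebo. Columns: recovered, not recovered.
observed = [[45, 15], [30, 30]]
print(expected_counts(observed))  # [[37.5, 22.5], [37.5, 22.5]]
print(df_goodness_of_fit(4))      # 3  (e.g. a 9:3:3:1 ratio)
print(df_contingency(2, 2))       # 1
```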

Worked examples

Full worked examples: two real exam-style genetics questions (one where you reject the null hypothesis, one where you cannot reject it), plus medical examples of both kinds of test.

Genetics (reject the null): purple vs white flowers

Goodness-of-fit · expected 1:1 · exam-style (WJEC)

White flowers were crossed with purple flowers. One hypothesis is that purple is caused by a single dominant allele, which predicts the F1 should be 1 purple : 1 white. The cross gave 32 white and 18 purple (50 plants). Do the results fit the 1:1 ratio?

Step 1 — State the null hypothesis

Null hypothesis (H₀): there is no difference between the observed numbers of purple and white flowers and the numbers expected from a 1:1 ratio.

Step 2 — Work out the expected counts

A 1:1 ratio of 50 plants means 25 expected in each category.

Step 3 — Build the table

Phenotype | Observed O | Expected E | O − E | (O − E)² | (O − E)² ÷ E
White | 32 | 25 | +7 | 49 | 1.96
Purple | 18 | 25 | −7 | 49 | 1.96
χ² = Σ = 3.92
[Figure: bar chart “Flower colour: observed vs expected (1:1)”. Number of F1 plants, observed vs expected: white 32 vs 25, purple 18 vs 25.]
Observed counts (green) against the expected 1:1 counts (amber). The bars differ noticeably; the test tells us whether that difference is significant.

Step 4 — Degrees of freedom

df = (number of categories − 1) = 2 − 1 = 1

Step 5 — Compare with the critical value

At p = 0.05 with 1 degree of freedom the critical value is 3.84.

χ² = 3.92 is GREATER than critical value 3.84
Conclusion: because χ² (3.92) is greater than the critical value (3.84), the difference is significant and we reject the null hypothesis. The results do not fit a simple 1:1 ratio, so the single-dominant-allele hypothesis is not supported. (In the real question, a second hypothesis, two interacting genes, gave χ² = 0.044, a very good fit, so that model was accepted instead.)
A high χ² that makes you reject the null hypothesis is telling you the expected model is probably wrong — here, the inheritance is more complicated than one gene. Rejecting is a real, useful result, not a failure.

Genetics (accept the null): mouse coat colour

Goodness-of-fit · expected 9:3:3:1 · exam-style (WJEC)

Mice heterozygous at two coat-colour genes were crossed. A dihybrid cross predicts a 9:3:3:1 ratio of normal agouti : solid black : cinnamon : solid brown. From 32 offspring the observed counts were 22, 4, 3, 3. Do they fit the 9:3:3:1 ratio?

Step 1 — Null hypothesis

H₀: there is no difference between the observed offspring numbers and the numbers expected from a 9:3:3:1 ratio.

Step 2 — Expected counts

A 9:3:3:1 ratio splits 32 into 16 parts: one part = 32 ÷ 16 = 2. So expected = 9×2, 3×2, 3×2, 1×2 = 18, 6, 6, 2.

Step 3 — Build the table

Phenotype | O | E | O − E | (O − E)² | (O − E)² ÷ E
Normal agouti | 22 | 18 | +4 | 16 | 0.89
Solid black | 4 | 6 | −2 | 4 | 0.67
Cinnamon | 3 | 6 | −3 | 9 | 1.50
Solid brown | 3 | 2 | +1 | 1 | 0.50
χ² = Σ = 3.56

Step 4 — Degrees of freedom & critical value

df = 4 − 1 = 3
Critical value at p = 0.05, df = 3: 7.82
χ² = 3.56 is LESS than critical value 7.82
Conclusion: because χ² (3.56) is less than the critical value (7.82), the difference is not significant — we cannot reject the null hypothesis. The results are consistent with a 9:3:3:1 ratio, so the two-gene dihybrid model fits. The small differences are just chance.
Watch the expected-value rule: here the smallest expected value is 2, which is below 5. In a real exam you would note that the test is less reliable when an expected value is under 5 — ideally you would use more offspring.
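
The expected-value rule is easy to check mechanically. The sketch below is illustrative only (all function names are made up here): it flags categories whose expected count is under 5 and estimates how many offspring would be needed for every expected count to reach 5.

```python
import math

def expected_from_ratio(total, ratio):
    """Split a total into expected counts according to a ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def low_expected(names, expected, minimum=5):
    """Names of categories whose expected count is below the minimum."""
    return [n for n, e in zip(names, expected) if e < minimum]

def smallest_total_needed(ratio, minimum=5):
    """Smallest total for which the rarest category reaches the minimum."""
    return math.ceil(minimum * sum(ratio) / min(ratio))

names = ["Normal agouti", "Solid black", "Cinnamon", "Solid brown"]
expected = expected_from_ratio(32, [9, 3, 3, 1])
print(expected)                             # [18.0, 6.0, 6.0, 2.0]
print(low_expected(names, expected))        # ['Solid brown']
print(smallest_total_needed([9, 3, 3, 1]))  # 80
```

With a 9:3:3:1 ratio the rarest class is 1 part in 16, so at least 80 offspring are needed before its expected count reaches 5.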

Genetics (your turn to picture it): pea seed shape

Goodness-of-fit · expected 3:1

Two heterozygous pea plants (Rr × Rr) were crossed. Round (R) is dominant to wrinkled (r), so a 3 round : 1 wrinkled ratio is expected. Of 1000 seeds, 740 were round and 260 wrinkled. Do they fit 3:1?

Step 1 — Null hypothesis

H₀: there is no difference between the observed seed counts and those expected from a 3:1 ratio.

Step 2 — Expected counts & table

3:1 of 1000 → expected 750 round, 250 wrinkled.

Phenotype | O | E | O − E | (O − E)² | (O − E)² ÷ E
Round | 740 | 750 | −10 | 100 | 0.13
Wrinkled | 260 | 250 | +10 | 100 | 0.40
χ² = Σ = 0.53
[Figure: bar chart “Pea seed shape: observed vs expected (3:1)”. Number of pea seeds, observed vs expected: round 740 vs 750, wrinkled 260 vs 250.]
Observed pea counts (green) sit almost on top of the expected 3:1 counts (amber): a very good fit, so χ² is small.

Step 3 — Decision

df = 2 − 1 = 1
Critical value (p = 0.05, df = 1): 3.84
Conclusion: χ² = 0.53 is far less than 3.84, so we cannot reject the null hypothesis. The seed counts fit a 3:1 ratio well — consistent with simple dominance at one gene.

Medical (contingency table): does the drug affect recovery?

Contingency 2×2 · association · drug vs placebo

60 patients were given a new drug and a separate 60 were given a placebo. After treatment each was recorded as recovered or not recovered. Is recovery associated with which treatment they had?

Step 1 — Null hypothesis

H₀: there is no association between the treatment (drug or placebo) and whether the patient recovered — recovery is independent of treatment.

Step 2 — The observed counts, with row and column totals

Treatment | Recovered | Not recovered | Row total
Drug | 45 | 15 | 60
Placebo | 30 | 30 | 60
Column total | 75 | 45 | 120

Step 3 — Expected counts from the totals

For each cell, E = (row total × column total) ÷ grand total. For “Drug & Recovered”: E = (60 × 75) ÷ 120 = 37.5. Doing all four:

Cell | O | E | O − E | (O − E)² | (O − E)² ÷ E
Drug & recovered | 45 | 37.5 | +7.5 | 56.25 | 1.50
Drug & not | 15 | 22.5 | −7.5 | 56.25 | 2.50
Placebo & recovered | 30 | 37.5 | −7.5 | 56.25 | 1.50
Placebo & not | 30 | 22.5 | +7.5 | 56.25 | 2.50
χ² = Σ = 8.00

Step 4 — Degrees of freedom & decision

df = (rows − 1) × (columns − 1) = (2 − 1) × (2 − 1) = 1
χ² = 8.00 is GREATER than critical value 3.84 (p = 0.05, df 1)
Conclusion: χ² (8.00) is greater than 3.84, so the difference is significant — we reject the null hypothesis. Recovery is associated with treatment: significantly more patients recovered on the drug than on the placebo.
Contingency tables use E = (row total × column total) ÷ grand total and df = (rows−1)(columns−1) — that is the only real difference from a goodness-of-fit test. As always, association is not proof of mechanism, but a controlled drug-vs-placebo design makes the drug the most likely cause.
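
To check the whole contingency calculation in one go, here is a short Python sketch. The function name `contingency_chi_squared` is an assumption made for this example; it applies the plain formula used on this page, with no corrections of any kind.

```python
def contingency_chi_squared(table):
    """Return (chi2, df) for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: drug, placebo. Columns: recovered, not recovered.
chi2, df = contingency_chi_squared([[45, 15], [30, 30]])
print(round(chi2, 2), df)  # 8.0 1
print(chi2 > 3.84)         # True, so reject the null hypothesis
```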

Medical (goodness-of-fit): blood groups in a sample

Goodness-of-fit · expected from population proportions

In a national population the ABO blood groups occur in the proportions O 44%, A 42%, B 10%, AB 4%. In a sample of 500 blood donors the observed counts were O 200, A 230, B 45, AB 25. Do the donors match the national proportions?

Step 1 — Null hypothesis

H₀: there is no difference between the observed blood-group counts in the donors and the counts expected from the national proportions.

Step 2 — Expected counts (proportion × 500) & table

Group | O | E | O − E | (O − E)² | (O − E)² ÷ E
O (44%) | 200 | 220 | −20 | 400 | 1.82
A (42%) | 230 | 210 | +20 | 400 | 1.90
B (10%) | 45 | 50 | −5 | 25 | 0.50
AB (4%) | 25 | 20 | +5 | 25 | 1.25
χ² = Σ = 5.47
[Figure: bar chart “Blood groups: observed vs expected”. Number of people, observed vs expected: O 200 vs 220, A 230 vs 210, B 45 vs 50, AB 25 vs 20.]
Observed donor counts (green) against those expected from the national proportions (amber). The differences are modest.

Step 3 — Decision

df = 4 − 1 = 3
Critical value (p = 0.05, df = 3): 7.82
Conclusion: χ² (5.47) is less than 7.82, so we cannot reject the null hypothesis. The donors’ blood groups are consistent with the national proportions — the differences are within what chance would produce.
Note the expected counts are proportion × total (0.44 × 500 = 220, etc.), and they must be the counts, never the percentages. Every expected value here is well above 5, so the test is reliable.
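
The conversion from percentages to expected counts can be sketched in Python as follows. The variable names are illustrative; the figures are the ones in the table above.

```python
percentages = {"O": 44, "A": 42, "B": 10, "AB": 4}
observed = {"O": 200, "A": 230, "B": 45, "AB": 25}

total = sum(observed.values())  # 500 donors
# Expected COUNT = proportion x total; the percentages themselves are never used as E.
expected = {group: total * pct / 100 for group, pct in percentages.items()}

chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

print(expected)                    # {'O': 220.0, 'A': 210.0, 'B': 50.0, 'AB': 20.0}
print(round(chi2, 2))              # 5.47
print(min(expected.values()) >= 5) # True: every expected count is at least 5
```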

Degrees of freedom & reading the table

Degrees of freedom tells you which row of the χ² table to use. How you work it out depends on the type of test:

  • Goodness-of-fit: df = (number of categories − 1). Two categories → df = 1; four categories (like 9:3:3:1) → df = 3.
  • Contingency table: df = (rows − 1) × (columns − 1). A 2×2 table → df = 1.

Find your df row, read across to the p = 0.05 column for the critical value, and apply the rule:

The decision rule

  • χ² > critical value → the difference is significant → reject the null hypothesis (observed and expected really differ).
  • χ² < critical value → the difference is not significant → you cannot reject the null hypothesis (the data fit the expected).
Degrees of freedom | p = 0.10 | p = 0.05 (critical) | p = 0.01
1 | 2.71 | 3.84 | 6.64
2 | 4.61 | 5.99 | 9.21
3 | 6.25 | 7.82 | 11.34
4 | 7.78 | 9.49 | 13.28
The more categories you have, the bigger the critical value — because with more categories there are more chances for random differences to add up, so χ² has to be larger before it counts as significant.
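
The table lookup and decision rule can be written as a small Python sketch. The dictionary simply copies the critical values from the table above; the names `CRITICAL_VALUES` and `decide` are hypothetical.

```python
# df -> {probability level: critical value}, copied from the table above
CRITICAL_VALUES = {
    1: {0.10: 2.71, 0.05: 3.84, 0.01: 6.64},
    2: {0.10: 4.61, 0.05: 5.99, 0.01: 9.21},
    3: {0.10: 6.25, 0.05: 7.82, 0.01: 11.34},
    4: {0.10: 7.78, 0.05: 9.49, 0.01: 13.28},
}

def decide(chi2, df, p=0.05):
    """Apply the decision rule; returns (critical value, reject_null)."""
    critical = CRITICAL_VALUES[df][p]
    return critical, chi2 > critical

print(decide(3.92, 1))  # (3.84, True)   flower colour: reject the null hypothesis
print(decide(3.56, 3))  # (7.82, False)  mouse coat colour: cannot reject
```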

Choosing the probability level (and borderline results)

The critical value depends on the probability level (p) you choose. In biology we almost always use p = 0.05 (5%): if the null hypothesis is true, there is only a 1 in 20 (5%) chance of getting a difference this big or bigger by chance alone. If χ² is bigger than the 0.05 critical value, we call the result significant.

But what does the probability level mean, and what happens when a result is borderline? Consider a 3:1 cross of 400 seeds where we observe 285 round : 115 wrinkled (expected 300 : 100):

Phenotype | O | E | O − E | (O − E)² | (O − E)² ÷ E
Round | 285 | 300 | −15 | 225 | 0.75
Wrinkled | 115 | 100 | +15 | 225 | 2.25
χ² = Σ = 3.00

Here χ² = 3.00 with df = 1. Look along the df = 1 row of the table above: 3.00 falls between the p = 0.10 value (2.71) and the p = 0.05 value (3.84).

What that tells you

  • χ² (3.00) is less than the 0.05 critical value (3.84), so at the 5% level we cannot reject the null hypothesis — the data still fit a 3:1 ratio.
  • But 3.00 is more than the 0.10 value (2.71), so the probability of getting a difference this large by chance alone is between 5% and 10%. It is a borderline result, close to significant.
  • You would not switch to a laxer level (like p = 0.10) just to change the outcome. The 5% level is the biological convention; you choose it before you see the data, and report honestly that the result is close.
A common exam skill: given χ² and the table, state the probability range the value falls in (“between 0.05 and 0.10, so there is more than a 5% probability of a difference this large arising by chance”) and conclude accordingly. Never relax your significance level after the fact to force a “significant” result.
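
Finding the probability range is just a matter of locating χ² between two entries in the df row. A minimal Python sketch, assuming the df = 1 row of the table on this page (the names `DF1_ROW` and `p_range` are made up for this example):

```python
# df = 1 row of the table: (probability level, critical value),
# ordered from the smallest critical value to the largest
DF1_ROW = [(0.10, 2.71), (0.05, 3.84), (0.01, 6.64)]

def p_range(chi2, row):
    """Return (lower, upper) bounds on p; None means no bound from the table."""
    upper = None
    for p, critical in row:
        if chi2 < critical:
            return p, upper
        upper = p
    return None, upper

# Borderline cross: 285 round : 115 wrinkled, expected 300 : 100
chi2 = (285 - 300) ** 2 / 300 + (115 - 100) ** 2 / 100
print(chi2)                    # 3.0
print(p_range(chi2, DF1_ROW))  # (0.05, 0.1)  p lies between 0.05 and 0.10
print(p_range(8.00, DF1_ROW))  # (None, 0.01) p is below 0.01
```

A lower bound of 0.05 or more means the result is not significant at the 5% level, however close it is.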

What each exam board expects

All the main A-level Biology specifications include the chi-squared test as a named statistical test.

Board | What is required
AQA (7402) | Use the chi-squared test to compare observed and expected results (e.g. genetic ratios); interpret against the critical value at p = 0.05 and understand degrees of freedom.
OCR A / B | Chi-squared named explicitly for goodness-of-fit and (for association) contingency tables; calculate χ², find degrees of freedom, compare with the critical value.
Edexcel A / B | Use the chi-squared test to test the significance of the difference between observed and expected results; understand significance at the 5% level and the null hypothesis.
WJEC / Eduqas | Chi-squared test on genetic-cross data; state the null hypothesis, complete the table, calculate χ², use the correct degrees of freedom and probability level, and interpret against tabulated critical values.