What "allele frequency" actually measures

An allele is one variant form of a gene. Every diploid individual carries two copies of each gene, one from each parent, and the two copies may be the same allele or different ones. The frequency of an allele is the share of all gene copies in the population that are that variant. If 1 in 50 of those copies is the cystic-fibrosis variant, the allele frequency is 0.02.

Two facts matter for the calculator: an allele frequency always sits between 0 and 1, and across a single gene the frequencies of all variants must sum to 1. For a two-allele system that's the familiar p + q = 1.

The Hardy-Weinberg equation

Under random mating, no selection, no mutation or migration, and a large population, genotype frequencies are predicted by squaring the allele-frequency equation p + q = 1. That gives the three classic terms, homozygous dominant, heterozygous (carrier), and homozygous recessive (affected):

(p + q)² = p² + 2pq + q² = 1

  • p — Frequency of the dominant (wild-type) allele, conventionally A
  • q — Frequency of the recessive allele, conventionally a
  • p² — Share homozygous dominant (AA) — unaffected non-carriers
  • 2pq — Share heterozygous (Aa) — phenotypically healthy carriers
  • q² — Share homozygous recessive (aa) — affected by the condition
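
The expansion above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual code: the function name genotype_frequencies is hypothetical, and it assumes a two-allele locus in Hardy-Weinberg equilibrium.

```python
def genotype_frequencies(p: float) -> dict[str, float]:
    """Return Hardy-Weinberg genotype shares for a two-allele locus.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {
        "AA": p * p,      # homozygous dominant
        "Aa": 2 * p * q,  # heterozygous carriers
        "aa": q * q,      # homozygous recessive
    }


freqs = genotype_frequencies(0.99)
print(freqs)
print(sum(freqs.values()))  # the three shares sum to 1, up to float rounding
```

Because the three terms come from squaring p + q = 1, their sum is a quick sanity check on any implementation.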

Going from disease prevalence to allele frequencies

The calculator works backwards from the only number you usually have: disease prevalence. For an autosomal recessive condition, every affected person is homozygous recessive, so prevalence equals q². Take the square root to get q, subtract it from 1 to get p, then plug both into 2pq to find the carrier rate.

Worked example with a 1-in-10,000 condition:

  • q² = 0.0001 → q = 0.01
  • p = 1 − 0.01 = 0.99
  • 2pq = 2 × 0.99 × 0.01 = 0.0198 — about 1 carrier in 50

The carrier rate is the surprise: rare diseases produce many more carriers than affected individuals. A 1-in-a-million condition still yields a 1-in-500 carrier rate.
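
The back-calculation described above can be sketched as follows. The function name from_prevalence is hypothetical, and the sketch assumes an autosomal recessive condition in a population at Hardy-Weinberg equilibrium, so that prevalence can be read as q².

```python
import math


def from_prevalence(prevalence: float) -> tuple[float, float, float]:
    """Return (q, p, carrier_rate) given the share of affected individuals."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    q = math.sqrt(prevalence)  # prevalence is treated as q squared
    p = 1.0 - q
    return q, p, 2 * p * q


q, p, carriers = from_prevalence(1 / 10_000)
print(f"q = {q:.4f}, p = {p:.4f}, carrier rate = {carriers:.4f}")
# prints: q = 0.0100, p = 0.9900, carrier rate = 0.0198

# The 1-in-a-million case: the carrier rate comes out near 1 in 500.
_, _, rare_carriers = from_prevalence(1 / 1_000_000)
print(f"about 1 carrier in {1 / rare_carriers:.0f}")
```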

Reading the result responsibly

Hardy-Weinberg is a model, and its assumptions are often violated in real human populations. Run three sanity checks before quoting a carrier rate:

  1. Match the prevalence to the population — Cystic fibrosis affects about 1 in 2,500 Northern Europeans but only around 1 in 17,000 African Americans. Those figures imply carrier rates of roughly 1 in 25 and 1 in 66, so plugging in the wrong prevalence misses the carrier rate by a factor of more than two.
  2. Founder populations break the model — Ashkenazi Jewish, Finnish and French-Canadian populations carry several recessive variants well above general-population rates. For counselling, use ethnicity-specific prevalence figures.
  3. Consanguinity inflates homozygotes — Relatives share rare alleles, so the homozygous frequency rises above q². Hardy-Weinberg silently assumes mating is random.
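
The third check can be made concrete with the standard population-genetics adjustment for inbreeding, in which an inbreeding coefficient F moves weight from heterozygotes to homozygotes. The text above does not give this formula, so the sketch below is an illustration under that added assumption; F, the value 1/16 (the usual coefficient for children of first cousins), and the function name are all introduced here.

```python
def genotype_frequencies_inbred(q: float, f: float) -> dict[str, float]:
    """Genotype shares at a two-allele locus with inbreeding coefficient f.

    f = 0 reproduces plain Hardy-Weinberg; larger f shifts weight from
    heterozygotes to both homozygous classes.
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must both lie between 0 and 1")
    p = 1.0 - q
    return {
        "AA": p * p + f * p * q,
        "Aa": 2 * p * q * (1.0 - f),
        "aa": q * q + f * p * q,
    }


q = 0.01  # the 1-in-10,000 condition used earlier
random_mating = genotype_frequencies_inbred(q, 0.0)
first_cousins = genotype_frequencies_inbred(q, 1 / 16)

# Ratio of affected shares: how far homozygotes rise above q squared.
print(first_cousins["aa"] / random_mating["aa"])
```

For a rare allele the extra f·p·q term dominates q², which is why consanguinity matters most for exactly the rare conditions this calculator is typically used on.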

Frequently Asked Questions

How do you calculate p and q allele frequency?

From disease prevalence, treat the prevalence as q², take the square root to get q, then set p = 1 − q. From genotype counts, count each allele (two per homozygote, one per heterozygote) and divide by the total number of alleles, which is twice the number of individuals sampled.
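
The genotype-count route can be sketched like this. The function name and the sample counts (360 AA, 480 Aa, 160 aa) are made up for illustration and do not come from any real dataset.

```python
def allele_frequencies(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Return (p, q) from observed genotype counts at a two-allele locus."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # every individual carries two
    if total_alleles == 0:
        raise ValueError("need at least one individual")
    p = (2 * n_AA + n_Aa) / total_alleles  # copies of A
    q = (2 * n_aa + n_Aa) / total_alleles  # copies of a
    return p, q


p, q = allele_frequencies(n_AA=360, n_Aa=480, n_aa=160)
print(p, q)  # 0.6 0.4
```

Unlike the prevalence route, this calculation does not rely on Hardy-Weinberg equilibrium: it simply counts alleles.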

What do p and q mean in allele frequency?

Conventionally, p is the frequency of the dominant or wild-type allele (often written A) and q is the frequency of the recessive or mutant allele (often written a). At a biallelic locus, p + q = 1.

How do you calculate minor allele frequency?

Minor allele frequency (MAF) is the frequency of the less common of two alleles, that is, the smaller of p and q. At most disease loci the disease allele is the rarer one, so MAF equals q. The same arithmetic applies; just label the smaller frequency q.

How do you handle four alleles?

The constraint becomes p + q + r + s = 1. Squaring it gives ten genotype classes: four homozygotes (p², q², r², s²) plus six 2xy heterozygote terms. The two-allele calculator above doesn't cover this case directly.
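
For readers who need the multi-allele case, the expansion generalises directly. The sketch below is not part of the calculator; the function name, the allele labels and the example frequencies are hypothetical, and Hardy-Weinberg equilibrium is assumed throughout.

```python
from itertools import combinations_with_replacement


def multiallele_genotypes(freqs: dict[str, float]) -> dict[str, float]:
    """Expected genotype shares for any number of alleles at one locus."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    shares = {}
    # Each unordered pair of alleles is one genotype class.
    for a, b in combinations_with_replacement(freqs, 2):
        if a == b:
            shares[a + b] = freqs[a] ** 2           # homozygote: x squared
        else:
            shares[a + b] = 2 * freqs[a] * freqs[b]  # heterozygote: 2xy
    return shares


genotypes = multiallele_genotypes({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1})
print(len(genotypes))  # 10 classes: four homozygotes and six heterozygotes
print(genotypes)
```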

If 1% of people have a disease, what are the allele frequencies?

With q² = 0.01, q = 0.1, p = 0.9 and 2pq = 0.18. About 18% of the population are carriers, far more than most people would guess from a 1% disease rate.
