The Coupon Collector Problem
Expected number of draws to collect all n types. The answer is n · H_n ≈ n ln n.
also known as: expected draws to collect all types · cereal box problem
You draw uniformly at random (with replacement) from n distinct coupon types until you have seen all n. What is the expected number of draws?
ANSWER
n · H_n = n · (1 + 1/2 + 1/3 + ... + 1/n) ≈ n ln n.
CANONICAL SOLUTION
Break the process into n stages, where stage k is the wait for the k-th distinct coupon to appear (after k-1 have already been collected). At stage k, the probability that a single draw produces a new coupon is (n - k + 1)/n. Therefore, the number of draws at stage k is geometric with parameter p = (n - k + 1)/n, giving:
Summing:
For large n, this is approximately , where is the Euler-Mascheroni constant.
AI ALTERNATIVE FRAMING
Think of it as a race against diminishing returns. Early on, every draw is likely to be a new coupon (high variety). As your collection grows, you're increasingly wasting draws on duplicates. The geometric expectation at each stage captures this: the first coupon takes 1 draw (certain), the second averages n/(n-1) draws, and the last takes n draws on average (only a 1/n chance of being new). The harmonic sum emerges naturally — it's the sum of the reciprocals of the stage success probabilities.
On QuantPrep, every practice problem gets an AI-generated alternative explanation like this — tailored to the canonical solution, streaming live.
COMMON VARIATIONS
What is the variance of the coupon collector time?
Since stages are independent geometrics, $\text{Var}[\text{total}] = \sum_{k=1}^{n} \frac{k-1}{n} \cdot \left(\frac{n}{n-k+1}\right)^2$. For large n, this is approximately $\frac{\pi^2}{6} n^2$, dominated by the final (rare-coupon) stages.
Each coupon type has a different probability pᵢ (not uniform). What is the expected time to collect all?
No closed form in general; use inclusion-exclusion: $\mathbb{E}[T] = \int_0^\infty (1 - \prod_i (1 - e^{-p_i t})) dt$.
Expected draws to collect any k specific types, where k < n.
Analogous stage decomposition: $n \sum_{j=n-k+1}^n 1/j$, which is smaller than the full collector time by the missing tail of the harmonic sum.
FIRMS THAT ASK THIS
TECHNIQUE
The single most common technique in quant interviews. Every problem reduces to it eventually.
Not ready to sign up? Get 5 hand-picked quant interview problems every week. No spam, unsubscribe any time.
TRY A PROBLEM NOW · NO SIGNUP
In the classical secretary problem, as n → ∞, what fraction should you reject outright before accepting the next best-so-far candidate?
FAQ
Because by the time you've collected n-1 types, only 1 in n draws gives you the missing last type. That's a geometric wait with expected value n. Most of the total time is spent on the last few coupons.
The decomposition into n independent geometric stages is a direct application of linearity. You don't need the stages to be independent to use linearity (though they are here); the point is that regardless of correlation.
Software QA (bug-finding when each test catches a random bug), cache eviction analysis (how long until all cache lines have been accessed), and biological diversity estimation (how many species have been sampled after n observations) all map onto variants of the coupon collector problem.
RELATED PROBLEMS
Practice variants of this problem
400+ problems on QuantPrep, including variants of the coupon collector problem and every technique it draws on.
Start practicing free