QuantPrep
QUANT INTERVIEW PROBLEM·expected-value·difficulty 3

The Coupon Collector Problem

Expected number of draws to collect all n types. The answer is n · H_n ≈ n ln n.

also known as: expected draws to collect all types · cereal box problem

PROBLEM

You draw uniformly at random (with replacement) from n distinct coupon types until you have seen all n. What is the expected number of draws?

ANSWER

n · H_n = n · (1 + 1/2 + 1/3 + ... + 1/n) ≈ n ln n.

CANONICAL SOLUTION

Break the process into n stages, where stage k is the wait for the k-th distinct coupon to appear (after k-1 have already been collected). At stage k, the probability that a single draw produces a new coupon is (n - k + 1)/n. Therefore, the number of draws at stage k is geometric with parameter p = (n - k + 1)/n, giving:

E[stage k]=nnk+1\mathbb{E}[\text{stage } k] = \frac{n}{n - k + 1}

Summing:

E[total]=k=1nnnk+1=nj=1n1j=nHn\mathbb{E}[\text{total}] = \sum_{k=1}^{n} \frac{n}{n - k + 1} = n \sum_{j=1}^{n} \frac{1}{j} = n \cdot H_n

For large n, this is approximately nlnn+γnn \ln n + \gamma n, where γ0.577\gamma ≈ 0.577 is the Euler-Mascheroni constant.

AI ALTERNATIVE FRAMING

Think of it as a race against diminishing returns. Early on, every draw is likely to be a new coupon (high variety). As your collection grows, you're increasingly wasting draws on duplicates. The geometric expectation at each stage captures this: the first coupon takes 1 draw (certain), the second averages n/(n-1) draws, and the last takes n draws on average (only a 1/n chance of being new). The harmonic sum HnH_n emerges naturally — it's the sum of the reciprocals of the stage success probabilities.

On QuantPrep, every practice problem gets an AI-generated alternative explanation like this — tailored to the canonical solution, streaming live.

COMMON VARIATIONS

Variance

What is the variance of the coupon collector time?

Since stages are independent geometrics, $\text{Var}[\text{total}] = \sum_{k=1}^{n} \frac{k-1}{n} \cdot \left(\frac{n}{n-k+1}\right)^2$. For large n, this is approximately $\frac{\pi^2}{6} n^2$, dominated by the final (rare-coupon) stages.

Weighted coupons

Each coupon type has a different probability pᵢ (not uniform). What is the expected time to collect all?

No closed form in general; use inclusion-exclusion: $\mathbb{E}[T] = \int_0^\infty (1 - \prod_i (1 - e^{-p_i t})) dt$.

Collecting k out of n

Expected draws to collect any k specific types, where k < n.

Analogous stage decomposition: $n \sum_{j=n-k+1}^n 1/j$, which is smaller than the full collector time by the missing tail of the harmonic sum.

FIRMS THAT ASK THIS

TECHNIQUE

Expected Value

The single most common technique in quant interviews. Every problem reduces to it eventually.

FREE QUANT BRAINTEASERS WEEKLY

Not ready to sign up? Get 5 hand-picked quant interview problems every week. No spam, unsubscribe any time.

TRY A PROBLEM NOW · NO SIGNUP

#042 · optimal_stoppingdifficulty 4

In the classical secretary problem, as n → ∞, what fraction should you reject outright before accepting the next best-so-far candidate?

LIVE DEMO · NO SIGNUP

FAQ

Why is the final stage so slow?

Because by the time you've collected n-1 types, only 1 in n draws gives you the missing last type. That's a geometric wait with expected value n. Most of the total time is spent on the last few coupons.

How does this connect to linearity of expectation?

The decomposition into n independent geometric stages is a direct application of linearity. You don't need the stages to be independent to use linearity (though they are here); the point is that E[sum of stages]=sum of E[stage]\mathbb{E}[\text{sum of stages}] = \text{sum of } \mathbb{E}[\text{stage}] regardless of correlation.

What's the real-world version of this?

Software QA (bug-finding when each test catches a random bug), cache eviction analysis (how long until all cache lines have been accessed), and biological diversity estimation (how many species have been sampled after n observations) all map onto variants of the coupon collector problem.

RELATED PROBLEMS

Practice variants of this problem

400+ problems on QuantPrep, including variants of the coupon collector problem and every technique it draws on.

Start practicing free