The Metrics Interview
'What's the right metric for X?' or 'Diagnose this metric anomaly.' This interview tests data fluency under pressure.
Metrics interviews are where most candidates lose points without realizing it. The questions test whether you can think structurally about cause and effect, hypothesize like a scientist, and slice data effectively.
Metrics interviews split into two types: (1) success metric definition ('how would you measure success of X?') and (2) diagnosis ('metric Y dropped 10%, what's happening?'). Both have a structured approach: clarify scope, frame the metric tree, generate hypotheses, prioritize investigation, propose a fix.
Type 1: Success metric definition
Prompt: "How would you measure the success of [feature]?"
Structure (10-15 min):
- Clarify scope. Is this measuring user value, business value, or both? Time horizon?
- Goal hierarchy. Identify the company goal → product goal → feature goal.
- Primary metric. One number that captures success. Tie to the goal.
- Guardrail metrics. 2-3 metrics that mustn't degrade.
- Leading + lagging. Identify the leading indicator you can move now.
- Counter-metric. What metric would tell you the primary is being gamed?
- Cadence. Daily, weekly, quarterly?
Example: 'Measure success of Instagram Stories.'
- Primary: Stories posted per active user per week (engagement + value)
- Guardrails: feed engagement (mustn't cannibalize), session length, NPS
- Leading: stories viewed (precedes posting)
- Counter: average watch time per story. If posting grows only through bots or spam, low-quality stories will pile up without any growth in time on platform.
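The primary/guardrail/counter structure can be made concrete as a launch readout. This is a minimal Python sketch, assuming every metric is "higher is better" and using a hypothetical 2% guardrail tolerance; the class, function, and metric names are invented for illustration, and the counter-metric is average watch time per story, as in the Q2 answer.

```python
from dataclasses import dataclass


@dataclass
class MetricReading:
    name: str
    baseline: float
    current: float

    @property
    def pct_change(self) -> float:
        # Relative change vs. baseline: -0.05 means a 5% drop.
        return (self.current - self.baseline) / self.baseline


def evaluate_launch(primary, guardrails, counter, guardrail_tolerance=0.02):
    """Return (verdict, reasons) for a feature readout.

    Hypothetical rules for this sketch:
    - every metric is "higher is better";
    - a guardrail degrades if it falls by more than guardrail_tolerance;
    - a primary gain alongside a falling counter-metric suggests gaming.
    """
    reasons = []
    if primary.pct_change <= 0:
        reasons.append(f"primary {primary.name} did not improve")
    for g in guardrails:
        if g.pct_change < -guardrail_tolerance:
            reasons.append(f"guardrail {g.name} fell {-g.pct_change:.1%}")
    if primary.pct_change > 0 and counter.pct_change < 0:
        reasons.append(f"counter-metric {counter.name} fell while primary rose")
    return ("healthy win" if not reasons else "investigate"), reasons


# Illustrative numbers only (not real Instagram data).
verdict, reasons = evaluate_launch(
    primary=MetricReading("stories_posted_per_wau", 2.0, 2.3),
    guardrails=[
        MetricReading("feed_engagement", 100.0, 99.0),
        MetricReading("session_length_min", 30.0, 30.5),
    ],
    counter=MetricReading("avg_watch_time_per_story_s", 6.0, 5.0),
)
print(verdict, reasons)
```

The verdict here is "investigate": posting rose 15% and no guardrail fell past the tolerance, but average watch time per story dropped, which is the low-quality-content pattern the counter-metric is there to catch.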
Type 2: Diagnosis
Prompt: "DAU dropped 10% last week. What happened?"
Structure (15-20 min):
- Clarify. Is it sudden or gradual? One product or all? Is the internal data accurate?
- Hypothesis tree. External (market, competitor, news, season) vs Internal (product change, bug, marketing change, pricing).
- Prioritize hypotheses by likelihood. Recent product release? Most likely. Competitor launch? Possible. Internal data bug? Always rule it out first.
- Investigation order. Start with cheapest, highest-likelihood. "Step 1: confirm data isn't broken. Step 2: check recent releases. Step 3: slice by platform and geography."
- Slicing. Geography, platform, user segment, channel, cohort. Patterns emerge from slices.
- Propose fix. Based on the most likely cause.
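One way to make "cheapest, highest-likelihood first" explicit is to score each hypothesis and sort. This is a sketch, not a standard method: the likelihoods and hour estimates are subjective, hypothetical inputs, and ranking by likelihood per hour of effort is an assumption of this example. Data-quality hypotheses are pinned to the top, matching the rule to rule them out first.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    kind: str          # "data", "internal", or "external"
    likelihood: float  # subjective prior between 0 and 1 (a judgment call)
    cost_hours: float  # rough effort to confirm or rule out


def investigation_order(hypotheses):
    """Data-quality checks first, then the best likelihood-per-hour ratio."""
    def priority(h):
        # False sorts before True, so kind == "data" comes first.
        return (h.kind != "data", -h.likelihood / h.cost_hours)
    return sorted(hypotheses, key=priority)


# Hypothetical scores for a DAU drop.
plan = investigation_order([
    Hypothesis("competitor launch", "external", 0.2, 4.0),
    Hypothesis("Monday app release", "internal", 0.5, 1.0),
    Hypothesis("event pipeline broken", "data", 0.15, 1.0),
    Hypothesis("holiday in a major market", "external", 0.1, 0.5),
])
for step, h in enumerate(plan, start=1):
    print(step, h.name)
```

The pipeline check runs first despite its low prior, the release check comes next, and the competitor launch comes last because it is expensive to confirm relative to how likely it is.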
The AARM framework (used by some PMs)
- Audit the metric (clear definition, instrumentation OK?)
- Activities (what user behaviors drive it)
- Refinement (what could improve it)
- Measurement (what we'd track to know)
What separates A from B
- A: Hypothesis-driven. Lists 5+ specific hypotheses and prioritizes the investigation.
  B: Investigates at random. Jumps straight to 'I'd look at the data.'
- A: Considers data quality first. Rules out an instrumentation bug.
  B: Assumes the data is right.
- A: Slices methodically by geography, platform, segment, and cohort.
  B: Looks only at the aggregate.
- A: Names guardrails for the success metric. Knows the primary alone can be gamed.
  B: Offers one metric, no guardrails.
Common prompts to practice
- How would you measure success of Google Search?
- DAU is up 15%, revenue is flat. What's happening?
- Activation rate dropped 5%. Diagnose.
- How would you measure success of Stories on Instagram?
- Checkout conversion rate fell 8% on Tuesday. Walk me through your diagnosis.
- Define a North Star metric for Spotify.
Practice 20+ prompts before interviewing.
Watch-outs
- Don't pick one metric without guardrails. Always use a small set in concert: a primary, 2-3 guardrails, and a counter-metric.
- Don't skip data quality. Always rule it out first in diagnosis.
- Don't go too deep into SQL. You're a PM, not a data engineer.
- Show structure. Even if the interviewer interrupts, the structure earns points.
Key frameworks
Three-metric framework for success measurement: one primary, 2-3 guardrails, 1 counter-metric.
Split hypotheses into external (market) and internal (product). Investigate internal first.
Real-world examples
Both Meta and Google use metric-diagnosis questions to test analytical structure. Candidates who methodically slice data and prioritize hypotheses by likelihood tend to outperform those who jump to conclusions.
Interview questions (2)
Q1. DAU on our app dropped 8% last Tuesday. Walk me through how you'd diagnose it.
Six-step diagnostic.
Step 1: Clarify. Sudden drop on Tuesday only or gradual? Just our app or industry-wide? Confirmed by both internal analytics and any 3rd-party tool we use?
Step 2: Rule out a data issue. Check whether our event pipeline had any incident on Monday or Tuesday. Compare against an independent metric (server logs of user authentication). If our DAU dropped but auth attempts stayed flat, the DAU metric is likely broken, not the user behavior.
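Step 2's cross-check can be written as a small heuristic. It assumes daily authentication attempts from server logs are available as an independent count; the function name and the 3% "roughly flat" tolerance are hypothetical choices for this sketch.

```python
def pct_change(before: float, after: float) -> float:
    return (after - before) / before


def suspect_instrumentation(dau_before, dau_after, auth_before, auth_after,
                            tolerance=0.03):
    """True when DAU fell noticeably but an independent signal stayed flat.

    auth_* are daily authentication attempts from server logs; tolerance is
    a hypothetical threshold for "roughly flat".
    """
    dau_delta = pct_change(dau_before, dau_after)
    auth_delta = pct_change(auth_before, auth_after)
    return dau_delta < -tolerance and abs(auth_delta) <= tolerance


# DAU -8%, auth attempts about -0.7%: the metric, not user behavior, looks broken.
print(suspect_instrumentation(1_000_000, 920_000, 1_500_000, 1_490_000))  # True
# Both fell about 8%: the drop looks real, so move on to the hypothesis tree.
print(suspect_instrumentation(1_000_000, 920_000, 1_500_000, 1_380_000))  # False
```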
Step 3: Hypothesis tree.
- Internal: recent app release (check Monday deploys), marketing campaign cut, paywall change, bug, push notification issue.
- External: holiday in major market, news event distracting users, competitor launch, app store ranking change.
Step 4: Slice.
- By platform (iOS vs Android; Android often has more variability)
- By geography (was it US only? specific country?)
- By acquisition channel (organic vs paid)
- By cohort (new vs returning users)
The pattern in the slice usually reveals the cause. Drop only on iOS new users? Probably an onboarding bug in the Monday release.
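Slicing works because the total change is the sum of the per-slice changes, so each slice's share of the drop can be computed directly. The sketch below uses made-up DAU counts keyed by (platform, cohort) for a baseline day and the drop day; the function and the numbers are illustrative assumptions.

```python
def slice_contributions(before: dict, after: dict) -> list:
    """Return (slice, absolute change, share of total change), biggest drop first."""
    total_change = sum(after.values()) - sum(before.values())
    rows = []
    for key in set(before) | set(after):
        change = after.get(key, 0) - before.get(key, 0)
        share = change / total_change if total_change else 0.0
        rows.append((key, change, share))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical DAU by (platform, cohort): baseline day vs. the drop day.
before = {("iOS", "new"): 100_000, ("iOS", "returning"): 400_000,
          ("Android", "new"): 120_000, ("Android", "returning"): 380_000}
after = {("iOS", "new"): 30_000, ("iOS", "returning"): 395_000,
         ("Android", "new"): 118_000, ("Android", "returning"): 377_000}

for key, change, share in slice_contributions(before, after):
    print(key, change, f"{share:.1%}")
```

In this example iOS new users account for 87.5% of an 8% overall drop, which points at the onboarding path in the Monday release. When some slices grow while others shrink, shares can exceed 100% or turn negative, so read them alongside the absolute changes.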
Step 5: Validate hypothesis. If iOS new-user activation dropped, look at the funnel. Where exactly are they dropping out? Match to a code change.
Step 6: Propose fix. Roll back the offending release. Investigate the root cause. Add monitoring to catch similar regressions before they hit production.
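Monitoring for this kind of regression can start as simply as comparing each day with the same weekday in previous weeks, which absorbs the usual weekly seasonality. This is one possible approach, not a prescribed one: the function, the four-week history, and the 3-standard-deviation threshold are all assumptions for illustration.

```python
from statistics import mean, stdev


def same_weekday_alert(today: float, prior_same_weekdays: list,
                       z_threshold: float = 3.0) -> bool:
    """Flag a drop when today's value sits more than z_threshold standard
    deviations below the mean of the same weekday in earlier weeks."""
    if len(prior_same_weekdays) < 2:
        return False  # stdev needs at least two points
    mu = mean(prior_same_weekdays)
    sigma = stdev(prior_same_weekdays)
    if sigma == 0:
        return today < mu  # flat history: any drop is unusual
    return (today - mu) / sigma < -z_threshold


# Hypothetical DAU for the previous four Tuesdays.
history = [1_000_000, 1_010_000, 995_000, 1_005_000]
print(same_weekday_alert(920_000, history))    # True: a drop of about 8% stands out
print(same_weekday_alert(1_000_000, history))  # False: within normal variation
```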
If the slice doesn't reveal an internal cause, expand to external causes: news cycle, weather, holiday calendar. Sometimes the answer is genuinely 'a major news event distracted users for a day.'
Q2. How would you measure success for Instagram Stories?
Goal hierarchy. Company goal: time spent in Instagram. Product goal: keep users engaged daily. Feature goal: provide a casual, low-stakes way to share daily life.
Primary metric. Stories posted per weekly active user. Captures the core 'share casually' behavior.
Guardrails.
- Feed engagement (Stories must not cannibalize feed time)
- Session length (overall app usage should grow, not shrink)
- Story consumption rate (posts without views are worthless)
Leading indicator. Stories viewed per weekly active user (precedes posting: users have to see Stories regularly before they start posting).
Counter-metric. Average watch time per Story (catches the case where users post lots of low-quality stories that no one watches).
Cadence. Weekly review of primary; daily monitoring of guardrails to catch regressions.
The non-obvious move: I'd also track Stories posted per Story-creator (excluding lurkers). The aggregate primary metric can grow because the creator base grew, even if creators are posting less. Slicing by creator vs viewer cohort gives a more honest picture.
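The creator-vs-viewer point follows from an identity: stories per WAU = (creators / WAU) × (stories per creator). A minimal sketch with hypothetical weekly counts shows the primary rising 20% while stories per creator falls 20%; the function name and numbers are illustrative only.

```python
def decompose(stories: int, creators: int, wau: int) -> dict:
    """Stories per WAU = (creators / WAU) * (stories / creator)."""
    return {
        "stories_per_wau": stories / wau,
        "creator_share": creators / wau,
        "stories_per_creator": stories / creators,
    }


# Hypothetical weeks with the same WAU but a wider creator base.
week1 = decompose(stories=1_000_000, creators=200_000, wau=1_000_000)
week2 = decompose(stories=1_200_000, creators=300_000, wau=1_000_000)
print(week1)  # stories_per_wau 1.0, creator_share 0.2, stories_per_creator 5.0
print(week2)  # stories_per_wau 1.2, creator_share 0.3, stories_per_creator 4.0
```

Reporting the two factors next to the primary makes it visible whether growth came from more people creating or from each creator posting more.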
For long-term success, track Stories' contribution to overall time in app and to retention of weekly active users. If Stories engagement is up but app retention is flat, we're cannibalizing, not growing.