Structured Interviews: Why Schmidt & Hunter Still Matter in 2026

10 min read · Intrvio Team

Schmidt & Hunter (1998) gave us the most-cited validity table in industrial-organizational psychology: a structured interview correlates with job performance at r = .51, an unstructured interview at .38.[1] Sackett, Zhang, Berry, & Lievens (2022) re-ran the meta-analysis with a more conservative range-restriction correction and produced revised estimates roughly .10–.20 lower — but structured interviews emerged as the top-ranked single procedure even in the corrected ranking.[2][4]

The research is unambiguous. The practice gap is huge. This piece walks through what the numbers actually say, why unstructured interviews still dominate practice anyway, and what an AI interviewer changes about the operational equation.

The replication-crisis context, briefly

Most of psychology has spent the last decade re-running its core findings under stricter rules: pre-registered hypotheses, larger samples, more transparent analysis pipelines. Many cherished effects have shrunk or disappeared. So when an industrial-organizational psychologist points at a 1998 paper and says “structured interviews predict performance better than unstructured ones,” it is fair for an engineering leader to ask whether the result has held up.

The short answer is yes — with calibration. The Schmidt & Hunter (1998) numbers were already meta-analytic, summarising 85 years of research findings rather than relying on a single fragile study.[1][5] They have been re-examined twice in major published work since, and the rank order has held even when the absolute numbers have come down. Unlike the worst-hit areas of social psychology, hiring-validity research has weathered the audit.

Schmidt & Hunter (1998) — what they found

The original paper compiled correlations between selection procedures and supervisor performance ratings, corrected for measurement error and range restriction, across roughly 85 years of accumulated studies. The headline numbers from Table 1:[1]

Selection procedure                 Validity (r)
Work sample tests                   .54
General mental ability (GMA)        .51
Structured employment interview     .51
Job knowledge tests                 .48
Integrity tests                     .41
Unstructured employment interview   .38
Job experience (years)              .18
Years of education                  .10
Graphology                          .02

Two findings deserve emphasis. First, structured interviews tied with general mental ability tests at .51, dramatically outperforming unstructured interviews at .38 — a .13-point gap that translates into materially better hires across a large pipeline.[1] Second, the combination of GMA + structured interview reached a multivariate validity of .63, which is among the most predictive feasible hiring systems on record.
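
The .63 figure follows from the standard two-predictor multiple-correlation formula. A minimal sketch, assuming an interview–GMA intercorrelation of about .30 (an illustrative input; the exact figure used in the original analysis is not reproduced here):

    import math

    def multiple_r(r_y1, r_y2, r_12):
        """Multiple correlation of two predictors with a criterion."""
        r_sq = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
        return math.sqrt(r_sq)

    # GMA at .51 and structured interview at .51, assumed to correlate .30:
    print(round(multiple_r(0.51, 0.51, 0.30), 2))  # -> 0.63

The less the two predictors correlate with each other, the more incremental validity the second one adds; that is the whole argument for pairing an interview with a different kind of measure.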

Sackett, Zhang, Berry & Lievens (2022) — the correction

In November 2022, the Journal of Applied Psychology published a 28-page re-examination of the 1998 estimates. The authors’ central claim: prior meta-analyses systematically over-corrected for range restriction (the statistical artefact where you only observe post-selection data, so variance is compressed and observed correlations shrink; correcting for this scales estimates back up, and over-correcting inflates them).[2][4]
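
To see why the size of the correction matters so much, here is the textbook Thorndike Case II correction for direct range restriction. The u values below are illustrative, not figures from either paper:

    import math

    def correct_range_restriction(r_obs, u):
        """Correct an observed correlation for direct range restriction.
        u = SD(applicant pool) / SD(hired sample), >= 1."""
        return (r_obs * u) / math.sqrt(1 + r_obs**2 * (u**2 - 1))

    # The same observed r under two assumptions about how much selection
    # compressed the variance:
    print(round(correct_range_restriction(0.30, 1.3), 2))  # modest:     0.38
    print(round(correct_range_restriction(0.30, 2.0), 2))  # aggressive: 0.53

The corrected estimate is driven as much by the assumed u as by the data, which is exactly the over-correction Sackett and colleagues argue happened.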

They proposed a more conservative correction and reported revised validity estimates. The key results, in their Table 3:

  • Most procedures dropped by .10 to .20 points.
  • Structured interviews emerged as the top-ranked single selection procedure in the revised rankings.
  • GMA dropped more sharply than interviews because it had relied more heavily on the most aggressive range-restriction correction.
  • The validity-vs-diversity tradeoff was made explicit: structured interviews show smaller mean Black–White subgroup differences than cognitive ability tests, making them attractive on both predictive and adverse-impact grounds.[2]

The takeaway is calibration, not invalidation. If you used to quote .51 on slides, the defensible 2026 number is closer to .40–.44. Structured interviews still beat unstructured by a meaningful margin; the relative ordering is intact.

Why unstructured interviews still dominate practice

Despite three decades of research, most hiring still happens through unstructured panel conversations. There are three reasons, and only one of them is irrational.

  1. Confidence asymmetry. Hiring managers feel highly competent at “reading” candidates and rate their own judgment highly. The research literature consistently shows interviewer self-assessment is uncorrelated with actual predictive validity. This is the irrational reason; awareness alone does not fix it.
  2. Decentralization. Once a company has dozens of people running interviews, enforcing rubric discipline at scale is genuinely hard. People skip the script when they think they are building rapport. Structure decays unless someone owns enforcement.
  3. Increment masking. The .13-point validity gain only shows up across a large pipeline; for any individual hire the outcome looks like a coin flip either way, so feedback never accumulates for the panel. The simulation sketch after this list makes the effect concrete.
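
A quick way to see increment masking is to simulate it. The sketch below is illustrative rather than drawn from either paper: it hires the top 10% of a large applicant pool by interview score and compares the mean true performance of hires under the two validities.

    import numpy as np

    rng = np.random.default_rng(0)

    def mean_hire_quality(r, n=100_000, top_frac=0.10):
        """Hire the top fraction by interview score; return the mean true
        performance (in SD units) of the people hired."""
        x = rng.standard_normal(n)                               # interview score
        y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)   # performance
        cutoff = np.quantile(x, 1 - top_frac)
        return y[x >= cutoff].mean()

    print(mean_hire_quality(0.38))  # ~0.67 SD above the applicant mean
    print(mean_hire_quality(0.51))  # ~0.90 SD above the applicant mean

In aggregate that is a large difference in hire quality; for any single candidate the two outcome distributions overlap almost completely, which is why panels never feel the gap.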

A counterintuitive result from Schmidt & Zimmerman (2004): three to four independent unstructured interviews can match a single structured interview’s validity, simply because aggregation averages out individual interviewer bias.[3] In other words, even a panel of loosely run interviews can reach the structured-interview floor. But the cost is high (three or four hour-long interviews instead of one forty-five-minute session) and the candidate experience is worse. Structure is the cheaper path.
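
The aggregation effect can be approximated with a standard composite-validity formula: averaging k parallel ratings raises validity as long as the raters do not agree perfectly with each other. A minimal sketch; the .45 inter-interviewer agreement below is an illustrative assumption, not a figure from the paper:

    import math

    def pooled_validity(r_single, k, rho):
        """Validity of the average of k interviewers' ratings, where each
        rating has validity r_single and interviewers' ratings intercorrelate
        at rho (a Spearman-Brown-style composite)."""
        return r_single * math.sqrt(k) / math.sqrt(1 + (k - 1) * rho)

    for k in (1, 2, 3, 4):
        print(k, round(pooled_validity(0.38, k, 0.45), 2))
    # 1 -> 0.38, 4 -> ~0.5: roughly the structured-interview level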

Behavioral vs situational — head-to-head

Within structured interviews there are two main flavours. Behavioral interviews ask about past behaviour: “Tell me about a time you had to handle a customer escalation that required a system change.” Situational interviews pose hypotheticals: “If a customer demanded a refund that violated our policy and threatened to go to social media, what would you do?”

The meta-analytic verdict: both work, with overlapping confidence intervals. Situational interviews edge ahead very slightly for entry-level roles where the candidate has limited past behaviour to draw on; behavioral interviews edge ahead very slightly for experienced hires. The lever that matters is whether the interview is structured at all, not which structure flavour you pick. Picking either is fine; mixing the two within the same interview is also fine if every candidate sees the same mix.

What AI interviewers actually change

The research case for structure has been settled for decades. The missing piece has always been operational: how do you actually run structured interviews at scale, with consistency across hundreds of panelists, without the rubric decaying into “close enough” ratings? AI interviewers solve a specific failure mode.

When the AI is asking the questions, the script does not drift. Every candidate hears the same opener, the same probes, the same follow-up logic. When the AI is scoring, the rubric does not soften into vibes — every answer maps to anchored behavioural exemplars (Behaviorally Anchored Rating Scales / BARS) with the same definitions across candidates. Cross-candidate comparisons become meaningful because the measurement instrument has been the same instrument the whole time.
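
To make “anchored behavioural exemplars” concrete, here is a minimal sketch of a single BARS dimension. The dimension name and anchor texts are hypothetical, not taken from any published scale:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Anchor:
        level: int      # 1 (poor) to 4 (excellent)
        exemplar: str   # a concrete behavioural description of this level

    # Hypothetical "ownership" dimension with one anchor per level.
    OWNERSHIP = [
        Anchor(1, "Describes the problem but names no action they took"),
        Anchor(2, "Acted only after the task was explicitly assigned"),
        Anchor(3, "Identified the issue and drove a fix within their team"),
        Anchor(4, "Identified the issue, aligned other teams, and closed the loop"),
    ]

    # Level 3 means the same thing for every candidate because the
    # anchors never change between interviews.
    print(OWNERSHIP[2].exemplar)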

That does not by itself raise validity above what a well-run human structured interview achieves. The gain is reliability and scale: you eliminate the “I’ll deviate from the script because I have a hunch” failure that is endemic in human panels. Combined with a downstream technical screen and a final human conversation, you reach the GMA-plus-structured-interview combination that both the 1998 and 2022 meta-analyses rank at the top.

Common pitfalls when switching from unstructured to structured

  1. Writing too many questions. A 60-minute interview should have 5–8 substantive questions plus probes, not 15. More questions reduce depth per question and compress scoring variance.
  2. Skipping the rubric design. The questions are the visible part; the scoring rubric is the load-bearing part. Without anchored examples per level, two interviewers will score the same answer differently and the structure collapses.
  3. Allowing “general impression” ratings. A composite “overall fit” column re-introduces the unstructured failure mode. The composite must be a transparent function of the rubric scores; see the sketch after this list.
  4. Letting interviewers pick their own questions. If the question bank is “ask 3 of these 12,” you have a pseudo-structured interview where every candidate gets a different instrument. That collapses the comparison.
  5. Not training panel members on the rubric. Calibration sessions — three panelists rating the same recorded answer and discussing the difference — are the cheapest way to tighten inter-rater agreement.
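
On pitfall 3, “transparent function” can be as simple as a fixed weighted sum declared before interviewing begins. A minimal sketch with hypothetical dimensions and weights:

    def composite(scores: dict[str, int], weights: dict[str, float]) -> float:
        """Overall score as a fixed, auditable function of per-dimension
        rubric levels -- no free-floating 'general impression' input."""
        assert scores.keys() == weights.keys()
        return sum(weights[d] * scores[d] for d in scores)

    # Weights fixed up front; every candidate is scored the same way.
    weights = {"ownership": 0.4, "communication": 0.3, "technical_depth": 0.3}
    print(composite({"ownership": 3, "communication": 4, "technical_depth": 2},
                    weights))  # -> 3.0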

Sources

  1. Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274. https://onthewards.org/wp-content/uploads/2016/09/The_Validity_and_Utility_of_Selection_Methods_in_Personnel_Psychology_-_Schmidt_KeyLIME.pdf
  2. Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2022). Revisiting meta-analytic estimates of validity in personnel selection: Addressing systematic overcorrection for restriction of range. Journal of Applied Psychology, 107(11), 2040–2068. https://gwern.net/doc/statistics/meta-analysis/2021-sackett.pdf
  3. Schmidt, F. L., & Zimmerman, R. D. (2004). A counterintuitive hypothesis about employment interview validity and some supporting evidence. Journal of Applied Psychology. https://pubmed.ncbi.nlm.nih.gov/15161412/
  4. Sackett, Zhang, Berry, & Lievens (2022), full PDF mirror at SMU Knowledge. https://ink.library.smu.edu.sg/lkcsb_research/6894/
  5. Schmidt & Hunter (1998), APA PsycNet record (canonical citation). https://psycnet.apa.org/record/1998-10661-006

Intrvio platform

Run structured interviews at scale, with the rubric enforced.

GAIA asks the same questions in the same order with the same probes, and scores against an anchored rubric every time.