The format spectrum
Setting aside chat-style screeners, hiring conversations live on a spectrum from highest-touch to lowest:
- In-person panel. Highest signal, highest cost, highest time burden, geographic constraint.
- Live video (Zoom panel). Synchronous human interview. Captures verbal and most visual content. Now the dominant mid-funnel format for tech roles.
- Phone screen / live voice. Synchronous, voice-only. Lower bandwidth, lower bias surface, lower drop-off than async.
- Voice AI interview. Asynchronous in the sense that no human is on the other end, but the conversation is real-time and adaptive. The AI asks, listens, probes, scores.
- One-way (asynchronous) video. Candidate records answers to fixed prompts; recordings are reviewed later by humans or scored by AI.
- AI video assessment. Same as one-way video, but with automated scoring, sometimes including facial analysis (now largely retreating; see below).
Drop-off rates by format
The most rigorous recent number comes from a March 2026 field experiment by Avery, Ip, Leibbrandt, & Vecci. Over 3,000 real applicants were randomized across asynchronous audio, asynchronous video, live online interview, and a control group. The headline result: asynchronous interviews reduced application continuation by more than 50%, including among the most qualified applicants, and the decline was largest for women.[3]
The mechanism was not technical friction. A complementary vignette experiment showed the deterrence was driven by perceptions about the competitiveness and fairness of the recruitment process — applicants interpreted async as a signal that the employer cared less about them. That reading aligns with qualitative work showing candidates perceive async video interviews as “impersonal and mechanical” due to the lack of real-time interaction.[4]
Voice AI sits in an interesting place on this spectrum. It is asynchronous in scheduling — the candidate runs the interview when they want — but the conversation itself is real-time. The early data suggests it does not produce the same drop-off cliff that one-way video does, because the felt experience is closer to a live phone screen than to a video confessional.
What signals each format captures
| Signal | Voice AI | One-way video | Live video |
|---|---|---|---|
| Verbal content (what they said) | Yes | Yes | Yes |
| Paralinguistic (pace, fluency) | Yes | Yes | Yes |
| Real-time adaptation (probes) | Yes | No | Yes |
| Visual cues (face, dress, room) | No | Yes | Yes |
| Re-record / impression management | No | Often allowed | No |
The visual channel looks like an asset — until you ask whether the marginal predictive validity from visual cues is large enough to justify the demographic correlation it carries.
Where bias enters: the visual channel
Asynchronous video puts the candidate’s home environment on camera. Tilburg University researchers studied this specifically, coding attire, room tidiness, technical issues, and visible background elements across mock and high-stakes async interviews. They found that completion decisions varied with stakes, that recording-quality issues were rare but modestly biasing, and that standardised evaluation reduced sex-based bias but not other interviewee-characteristic bias.[5]
The conceptual framing comes from Davis (2022), whose model of async video design identifies the candidate’s pre-interview decisions — choice of location, lighting, attire — as causally upstream of evaluator bias.[6] The mechanism is not exotic: humans are visual animals, and asking us to ignore a candidate’s background while we listen to their answer is asking against cognitive grain.
Voice AI does not eliminate bias — accent, fluency, and prosody remain — but it removes the visual surface entirely. The home-environment problem disappears because the home environment is not in the input.
The HireVue facial-analysis episode
The most consequential public moment in this debate happened on 12 January 2021. HireVue, then the dominant async video assessment vendor, announced that it would stop using visual analysis in its pre-hire algorithms. The announcement was the resolution of a fourteen-month controversy that began with a November 2019 EPIC complaint to the FTC arguing that HireVue’s use of opaque facial-analysis algorithms constituted unfair and deceptive trade practices.[1]
HireVue’s own framing in their public statement is worth quoting: their internal research had concluded that “for the significant majority of jobs and industries, visual analysis has far less correlation to job performance than other elements of our algorithmic assessment” and that NLP advances meant the marginal predictive lift from non-verbal data was negligible.[2]
Read carefully, this is a vendor-led admission that the visual channel was carrying mostly noise — and the noise was demographically correlated. It is the cleanest data point in the literature for the claim that voice-only is not just defensible on fairness grounds; it is also defensible on signal grounds.
Where voice AI sits on the fairness frontier
A useful frame: every selection method occupies a point in the two-dimensional space of (predictive validity) × (adverse impact). Cognitive ability tests score high on validity but produce large mean subgroup differences along racial lines. Unstructured interviews score low on validity and produce smaller measured subgroup differences (mostly because the noise dominates the signal). Structured interviews occupy a defensible point on the frontier: high validity, lower subgroup differences. Voice AI inherits the structured interview's position because the underlying measurement instrument is a structured-interview rubric — minus the visual surface that drags one-way video off the frontier.
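The frontier frame can be made concrete with a toy sketch. The numbers below are illustrative placeholders chosen to mirror the qualitative ordering described above, not estimates from the literature; a method is "on the frontier" when no other method is at least as valid with no more adverse impact.

```python
# Toy sketch of the (predictive validity) x (adverse impact) frontier.
# All numbers are illustrative placeholders, not published estimates.
methods = {
    "cognitive ability test": (0.65, 0.70),  # high validity, large subgroup differences
    "unstructured interview": (0.20, 0.30),  # low validity, noise dominates signal
    "structured interview":   (0.55, 0.25),  # high validity, lower subgroup differences
    "one-way video + visual": (0.50, 0.45),  # visual channel adds demographic correlation
}

def on_frontier(name: str) -> bool:
    """True if no other method dominates: at least as valid AND no more
    adverse impact, with a strict improvement on at least one axis."""
    v, a = methods[name]
    return not any(
        v2 >= v and a2 <= a and (v2 > v or a2 < a)
        for other, (v2, a2) in methods.items()
        if other != name
    )

frontier = sorted(m for m in methods if on_frontier(m))
```

With these placeholder values, the one-way-video point is dominated by the structured interview on both axes, which is the article's claim in miniature.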
Candidate experience: the hidden cost of one-way video
The empirical research on async video makes a consistent argument that is hard to ignore. Candidates describe the experience as impersonal and mechanical; the lack of real-time interaction eliminates the relationship-building that the same candidate gets on a phone screen.[4] The experimental work by Avery and colleagues found that drop-off was driven by perceptions of fairness — the candidate is implicitly told “you are not worth a real conversation,” and signals back by leaving.[3]
This effect compounds at the top of the funnel for sought-after candidates. A senior software engineer with three competing offers will not film themselves answering a one-way prompt — they will ghost the recording and take the company that gave them a phone call. The drop-off rate is not random; it is correlated with candidate seniority and market alternatives, which means the format selectively removes your most desirable candidates from the funnel. That is a self-defeating shape for a hiring tool.
Voice AI does not fully solve this — talking to an AI is still not the same felt experience as a human interviewer — but the conversational element is real-time, adaptive, and bidirectional. The candidate can clarify, the AI can probe, and the rhythm resembles a phone screen. Pilot data on agentic voice interviews consistently shows higher completion rates than async video, though the rigorous published comparison study is still in preprint stage.
A practical framework for choosing format per role
A useful decision tree, in priority order:
- Does the job require physical presence on camera? (Brand ambassador, on-camera media, customer-facing retail.) If yes, video is part of the job — async video is defensible. If no, proceed.
- Is timezone synchrony achievable? If yes, prefer live (voice or video) for the highest-stakes step. Async should be reserved for early funnel.
- What is the funnel volume? If you screen 5,000 candidates for 50 hires, you cannot afford live for screening; the question is voice AI vs one-way video for the screen step. Voice AI wins on drop-off, fairness, and candidate experience; one-way video wins only on review-flexibility.
- Compliance jurisdiction? Under EU AI Act Annex III, both formats are high-risk hiring AI; the format does not change the regulatory burden, but it changes the demographic-disparity risk you are managing.
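The decision tree above can be sketched as a small function. The branch labels and return values are this article's heuristics encoded for illustration, not a standard taxonomy.

```python
def choose_screen_format(on_camera_job: bool,
                         timezone_synchrony: bool,
                         high_volume: bool) -> str:
    """Hypothetical encoding of the per-role decision tree; the ordering
    of checks follows the article's priority order."""
    if on_camera_job:
        # Video is part of the job itself, so async video is defensible.
        return "one-way video"
    if timezone_synchrony and not high_volume:
        # Reserve the highest-stakes step for a live human conversation.
        return "live voice or video"
    # High-volume screen: voice AI wins on drop-off, fairness, and
    # candidate experience; one-way video wins only on review flexibility.
    return "voice AI"

# e.g. screening 5,000 applicants for 50 hires, no on-camera requirement:
choose_screen_format(on_camera_job=False, timezone_synchrony=False, high_volume=True)
```

Note that the compliance question falls outside the function: under EU AI Act Annex III both async formats carry the same high-risk classification, so jurisdiction changes the risk being managed, not the branch taken.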
Sources
- [1] EPIC — HireVue, facing FTC complaint from EPIC, halts use of facial recognition (12 January 2021). https://epic.org/hirevue-facing-ftc-complaint-from-epic-halts-use-of-facial-recognition/
- [2] HireVue blog — Industry leadership: new audit results, decision on visual analysis (12 January 2021). https://www.hirevue.com/blog/hiring/industry-leadership-new-audit-results-and-decision-on-visual-analysis
- [3] Avery, Ip, Leibbrandt, & Vecci (2026). A Brave New World of Hiring: A Natural Field Experiment on How Asynchronous Interviews and AI Assessment Reshape Recruitment. Working paper. https://ideas.repec.org/p/exe/wpaper/2602.html
- [4] Opportunities and challenges of asynchronous video interviews — PLOS ONE, qualitative HR study. https://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0325932
- [5] Tilburg University — Assessing biasing factors in asynchronous video interviews: applicant completion decisions, video background, and evaluation format. https://research.tilburguniversity.edu/en/publications/assessing-biasing-factors-in-asynchronous-video-interviews-applic/
- [6] Davis, F. D. (2022). Into the void: A conceptual model and research agenda for the design and use of asynchronous video interviews. Human Resource Management Review. https://www.sciencedirect.com/science/article/abs/pii/S1053482220300620
