What is a voice AI interviewer?

A voice AI interviewer is a software system built to run a natural spoken conversation: a candidate speaks, the system listens, takes its turn, and asks a follow-up question toward a defined objective. Unlike one-way video tools, which simply record candidates monologuing into prompts, a voice AI interviewer actually talks back. What separates it from chatbot prep tools is the modality: voice instead of text, real-time spoken turn-taking instead of streamed text replies, and the pressure of a real interview rather than a typed thread.

An important caveat: a voice AI interviewer is not a decision maker. It is an evidence-collection instrument: it turns a candidate's answers into structured evidence against a rubric and hands that evidence to a human reviewer. The hiring decision always belongs to a human.
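To make the "evidence, not decisions" split concrete, here is a minimal sketch of what an evidence packet handed to a reviewer could look like. The class names, the 1-5 score scale, and the `record_decision` method are hypothetical illustrations, not GAIA's actual schema; the point is that the AI fills in per-criterion scores with supporting quotes, while the decision field can only be set alongside a named human reviewer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CriterionEvidence:
    """One rubric criterion, scored by the AI interviewer with supporting quotes."""
    criterion: str
    score: int  # hypothetical 1-5 scale
    quotes: list[str] = field(default_factory=list)  # transcript excerpts backing the score


@dataclass
class EvidencePacket:
    """What the AI hands to a human reviewer. It never carries an AI-made decision."""
    candidate_id: str
    criteria: list[CriterionEvidence]
    decision: Optional[str] = None    # left empty by the AI; set only by a human
    decided_by: Optional[str] = None  # the human reviewer who made the call

    def record_decision(self, reviewer: str, decision: str) -> None:
        # A decision is only accepted together with a named human reviewer.
        if not reviewer:
            raise ValueError("a named human reviewer is required")
        self.decision = decision
        self.decided_by = reviewer
```

In use, the interviewer builds the packet with `decision` left as `None`, and only the review step calls `record_decision`.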

How GAIA works under the hood

GAIA is built on the standard real-time voice-agent architecture, with three primary components: speech-to-text (Whisper-grade models) to turn audio into a transcript, an LLM to select the next question and evaluate answers, and ElevenLabs Text to Speech to produce a human-like spoken response. On top of that, we added an orchestration layer for turn-taking and barge-in detection: logic that predicts when a candidate has finished speaking, waits through mid-thought silences, and gracefully stops its own speech mid-sentence when the candidate barges in.
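The turn-taking layer can be illustrated with a small state machine driven by per-frame voice-activity flags. This is a simplified sketch, not GAIA's implementation: it assumes a voice activity detector (VAD) emitting one speech/no-speech flag every 20 ms, and it uses a fixed silence timer as a stand-in for the predictive end-of-turn logic described above. The class name `TurnTaker` and both thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

FRAME_MS = 20  # hypothetical VAD frame length in milliseconds


@dataclass
class TurnTaker:
    """End-of-turn and barge-in detector fed with one VAD flag per frame.

    Thresholds are illustrative assumptions, not tuned production values.
    """
    end_silence_ms: int = 700   # silence required before the turn counts as finished
    barge_in_ms: int = 200      # sustained candidate speech needed to interrupt the agent
    agent_speaking: bool = False
    _heard_speech: bool = field(default=False, init=False)
    _silence_ms: int = field(default=0, init=False)
    _overlap_ms: int = field(default=0, init=False)

    def on_frame(self, is_speech: bool) -> Optional[str]:
        """Feed one VAD frame; returns "barge_in", "end_of_turn", or None."""
        if self.agent_speaking:
            # Count consecutive candidate speech while the agent is talking;
            # a short noise burst resets to zero and does not interrupt.
            self._overlap_ms = self._overlap_ms + FRAME_MS if is_speech else 0
            if self._overlap_ms >= self.barge_in_ms:
                self.agent_speaking = False  # caller should stop TTS playback now
                self._overlap_ms = 0
                self._heard_speech = True    # the candidate now holds the turn
                self._silence_ms = 0
                return "barge_in"
            return None

        if is_speech:
            # Candidate is talking; any mid-thought pause has ended.
            self._heard_speech = True
            self._silence_ms = 0
            return None

        if self._heard_speech:
            self._silence_ms += FRAME_MS
            if self._silence_ms >= self.end_silence_ms:
                self._heard_speech = False
                self._silence_ms = 0
                return "end_of_turn"  # caller sends the transcript to the LLM
        return None
```

The caller sets `agent_speaking = True` when TTS playback starts. On `"end_of_turn"` it hands the transcript to the LLM; on `"barge_in"` it cuts playback and keeps listening. Pauses shorter than `end_silence_ms` are treated as mid-thought and never end the turn.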

This approach is not new. Apna, an India-based career platform, reports running over 1.5 million AI interviews and 7.5 million voice minutes on top of ElevenLabs, with an end-to-end response time of around 300 ms.[1] Bolna reports that 90% of paying customers default to ElevenLabs as their TTS provider and that candidates who stay on a call past 60 seconds finish the interview 95% of the time.[2] Maki People runs the same architecture for large employers such as TRG Wagamama, PwC, and H&M, and reports higher completion rates and stronger candidate signal.[3]
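Note that the 95% figure is conditional: it describes candidates who are still on the call after 60 seconds, not everyone who starts one. A minimal sketch of that metric, assuming hypothetical call records with a duration and a completion flag (the record type and function name are illustrative, not Bolna's method):

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    seconds_on_call: float
    completed: bool


def completion_rate_past(calls: list[CallRecord], threshold_s: float = 60.0) -> float:
    """Share of calls that completed, among calls lasting longer than threshold_s."""
    retained = [c for c in calls if c.seconds_on_call > threshold_s]
    if not retained:
        return float("nan")  # undefined when no call passes the threshold
    return sum(c.completed for c in retained) / len(retained)
```

With six made-up calls where two hang up within 45 seconds, the overall completion rate is 50%, while the past-60-seconds rate computed here is 75%. The two numbers answer different questions, so they should not be compared directly.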

Two things make GAIA distinct. First, our use case is narrow: we are a structured interviewer, not a general-purpose outbound caller. That lets us tightly optimize prompts, mode, and rubric scoring. Second, our evidence stack is built to map to the EU AI Act's deployer obligations: every transcript, every score, and every human review step is persisted.
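One simple way to persist every transcript, score, and review step is an append-only JSON Lines log, where each event is written once and never rewritten. The sketch below assumes a hypothetical event schema (`interview_id`, `kind`, `recorded_at`, `payload`) and a local file; it illustrates the persistence idea only and is not GAIA's actual storage layer.

```python
import json
import time
from pathlib import Path
from typing import Any

# Hypothetical event kinds mirroring the three things the text says are persisted.
EVENT_KINDS = {"transcript", "score", "human_review"}


def persist_event(log_path: Path, interview_id: str, kind: str,
                  payload: dict[str, Any]) -> dict[str, Any]:
    """Append one evidence event as a JSON line; earlier lines are never modified."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    event = {
        "interview_id": interview_id,
        "kind": kind,
        "recorded_at": time.time(),  # Unix timestamp of the write
        "payload": payload,
    }
    # Mode "a" only ever appends to the end of the file.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event


def load_events(log_path: Path, interview_id: str) -> list[dict[str, Any]]:
    """Read back every persisted event for one interview, in write order."""
    with log_path.open(encoding="utf-8") as f:
        return [e for e in map(json.loads, f) if e["interview_id"] == interview_id]
```

Keeping transcripts, scores, and human review steps in one ordered log means a reviewer or auditor can replay an interview's full evidence trail in sequence.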

Why voice beats one-way video

Many candidates dislike one-way video interviews: they feel impersonal, drop-off is high, and a timestamped recording does not convey the same signal as a real exchange. Voice AI interviewers do better because they reproduce, turn by turn, the flow of an actual conversation.

Signal	Voice AI	One-way video
Completion rate	High (~95% of candidates who stay past 60 s)[2]	Typically lower
Fairness signal	Same questions, same rubric, real follow-ups	Same questions, no follow-ups
Candidate sentiment	Warmer; feels like a real conversation[3]	Colder; candidates monologue into a recorder
Time to results	Immediately after the interview	Waits on recruiter review

When NOT to use voice AI interviews

To be clear, voice AI interviewers are not the right answer for every situation. In sensitive, regulated fields (clinical decisions, legal-process testimony, interviews involving children or vulnerable groups), do not use voice AI as the sole tool. Do not force candidates who prefer a written alternative into voice; under the EU AI Act, you must give them a human review path. For candidates with specific disabilities (e.g. significant hearing or speech impairment), an accommodated format run by a specialist yields richer evidence.

Our general rule of thumb: use voice AI for structured interviews, screening stages, and at-scale candidate signal; route high-stakes, sensitive cases to human panels.
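The rule of thumb can be written as a small routing function. This is a sketch under stated assumptions: the request fields, context labels, and route names are hypothetical, and a real deployment would define its own categories. The ordering reflects the text: accommodations first, then sensitive or high-stakes cases to human panels, then a written alternative for candidates who ask for one, and voice AI for everything else.

```python
from dataclasses import dataclass

# Hypothetical labels for the sensitive contexts named in the text.
SENSITIVE_CONTEXTS = {"clinical", "legal_testimony", "minors", "vulnerable_group"}


@dataclass
class InterviewRequest:
    context: str                        # e.g. "screening" or "clinical" (hypothetical labels)
    high_stakes: bool                   # e.g. a decisive final-round assessment
    prefers_written: bool = False       # candidate asked for a written format
    needs_accommodation: bool = False   # e.g. significant hearing or speech impairment


def route(req: InterviewRequest) -> str:
    """Rule-of-thumb router: voice AI only for structured, at-scale stages."""
    if req.needs_accommodation:
        return "accommodated_specialist"
    if req.context in SENSITIVE_CONTEXTS or req.high_stakes:
        return "human_panel"
    if req.prefers_written:
        return "written_alternative"
    return "voice_ai"
```

For example, a routine screening call routes to `"voice_ai"`, while the same call in a clinical context routes to `"human_panel"`.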

Get started

Try GAIA in the browser via the demo, or hop straight to the free candidate practice mode. Are you a hiring manager? Read our pricing and the EU AI Act compliance page.

Voice AI Interviewer: GAIA

Configure one interview. Run them all.