
Voice AI Interviewer: GAIA

A real conversation. Not async videos, not chatbots. GAIA listens to candidates in real time, takes turns, and produces evidence-backed scorecards.

What is a voice AI interviewer?

A voice AI interviewer is a software system built to run a natural spoken conversation: a candidate speaks, the system listens, takes its turn, and then asks a follow-up question against a defined objective. Unlike one-way video tools — which simply record candidates monologuing into prompts — a voice AI interviewer actually converses back. The difference from chatbot prep tools is the modality: voice instead of text, turn-taking instead of streaming, and the pressure of a real interview rather than a typed thread.

An important caveat: a voice AI interviewer is not a decision maker. It is an instrument for collecting evidence — turning a candidate's answers into structured evidence around a rubric and handing it to a human reviewer. The decision always belongs to a human.
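One way to picture "structured evidence around a rubric" is as a small data structure. The sketch below is illustrative only: the class names, the 1-5 score scale, and the `record_decision` helper are assumptions made for this example, not GAIA's actual schema. It shows evidence items tied to rubric criteria, with the decision field left empty until a named human fills it in.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    criterion: str  # rubric criterion this excerpt supports
    quote: str      # verbatim excerpt from the interview transcript
    score: int      # hypothetical 1-5 scale, assumed for illustration


@dataclass
class Scorecard:
    candidate_id: str
    evidence: list[Evidence] = field(default_factory=list)
    decision: Optional[str] = None    # stays None until a human decides
    decided_by: Optional[str] = None  # name or ID of the human reviewer

    def average_by_criterion(self) -> dict[str, float]:
        """Summarize the evidence per rubric criterion for the reviewer."""
        grouped: dict[str, list[int]] = {}
        for item in self.evidence:
            grouped.setdefault(item.criterion, []).append(item.score)
        return {name: sum(scores) / len(scores) for name, scores in grouped.items()}

    def record_decision(self, reviewer: str, decision: str) -> None:
        """Only a named human reviewer can attach a decision."""
        if not reviewer:
            raise ValueError("a human reviewer must be named")
        self.decision = decision
        self.decided_by = reviewer
```

The interviewer side only ever appends `Evidence`; the `decision` field is written through `record_decision`, which refuses to run without a reviewer.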

How GAIA works under the hood

GAIA is built on the standard real-time voice-agent architecture, which has three primary components: speech-to-text (Whisper-grade models) to turn audio into a transcript, an LLM to drive next-question selection and evaluation, and ElevenLabs Text to Speech to produce the human-like spoken response. On top of that, we added an orchestration layer for turn-taking and barge-in detection: logic that predicts when a candidate has finished speaking, waits through mid-thought silences instead of jumping in, and gracefully stops GAIA mid-sentence when the candidate barges in.
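The turn-taking and barge-in logic described above can be sketched as a small per-frame state machine. This is a simplified illustration, not GAIA's implementation: it swaps in fixed silence and speech thresholds where a production system would likely use a learned end-of-turn predictor, and the names `TurnTaker`, `Action`, and all threshold values are assumptions. The `candidate_is_speaking` flag is assumed to come from some voice-activity detector.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"            # keep listening; the candidate may still be thinking
    RESPOND = "respond"      # the candidate's turn looks finished
    STOP_SPEAKING = "stop"   # the candidate barged in while the agent was talking


@dataclass
class TurnTaker:
    # Hypothetical thresholds chosen for illustration only.
    end_of_turn_silence_ms: int = 900
    barge_in_speech_ms: int = 200
    frame_ms: int = 20

    # Running state, updated once per audio frame.
    silence_ms: int = 0
    speech_ms: int = 0
    heard_speech: bool = False

    def on_frame(self, candidate_is_speaking: bool, agent_is_speaking: bool) -> Action:
        if candidate_is_speaking:
            self.speech_ms += self.frame_ms
            self.silence_ms = 0
            self.heard_speech = True
            # Sustained candidate speech over the agent's audio counts as a barge-in.
            if agent_is_speaking and self.speech_ms >= self.barge_in_speech_ms:
                return Action.STOP_SPEAKING
            return Action.WAIT

        self.silence_ms += self.frame_ms
        self.speech_ms = 0
        # Short pauses are tolerated; only a long silence after speech ends the turn.
        if (self.heard_speech and not agent_is_speaking
                and self.silence_ms >= self.end_of_turn_silence_ms):
            self.heard_speech = False
            self.silence_ms = 0
            return Action.RESPOND
        return Action.WAIT
```

With these example values, a pause shorter than 900 ms keeps the agent quiet, and 200 ms of candidate speech during agent playback tells the caller to stop the text-to-speech output.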

This approach is not new. Apna, an India-based career platform, reports running over 1.5 million AI interviews and 7.5 million voice minutes on top of ElevenLabs, with an end-to-end response time of around 300 ms.[1] Bolna reports that 90% of its paying customers default to ElevenLabs as the TTS provider and that candidates who stay on a call past 60 seconds finish the interview 95% of the time.[2] Maki People runs the same architecture for large employers such as TRG Wagamama, PwC, and H&M and reports higher completion rates plus stronger candidate signal.[3]

Two things make GAIA distinct. First, our use case is narrowly focused: we are a structured interviewer, not a general-purpose outbound caller. That lets us tightly optimize prompts, mode, and rubric scoring. Second, the underlying evidence stack is built to map to EU AI Act deployer obligations: every transcript, every score, and every human review step is persisted.
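As a rough illustration of what "every transcript, every score, and every human review step is persisted" could look like, the sketch below appends one JSON line per event to an append-only log. The function name `append_event`, the three event kinds, and the field names are hypothetical choices for this example; the source does not describe GAIA's storage format.

```python
import io
import json
from datetime import datetime, timezone
from typing import TextIO

# Hypothetical event kinds mirroring the three things the text says are persisted.
ALLOWED_KINDS = {"transcript", "score", "human_review"}


def append_event(log: TextIO, interview_id: str, kind: str, payload: dict) -> dict:
    """Append one audit event as a JSON line and return the stored record."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    event = {
        "interview_id": interview_id,
        "kind": kind,
        "payload": payload,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    log.write(json.dumps(event, sort_keys=True) + "\n")
    return event


# Usage with an in-memory buffer; a real deployment would use durable storage.
audit_log = io.StringIO()
append_event(audit_log, "int-001", "transcript", {"turn": 1, "text": "Tell me about a recent project."})
append_event(audit_log, "int-001", "score", {"criterion": "communication", "score": 4})
append_event(audit_log, "int-001", "human_review", {"reviewer": "reviewer-7", "outcome": "advance"})
```

Writing events as they happen, rather than updating a single record in place, keeps the full sequence of automated and human steps available for later review.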

Why voice beats one-way video

Many candidates dislike one-way video interviews: the format feels impersonal, drop-off is high, and a timestamped recording does not convey the same signal as a real exchange. Voice AI interviewers do better because they replicate, beat by beat, the smoothness of an actual conversation.

Signal | Voice AI | One-way video
Completion rate | High (~95% of candidates who stay past 60 s)[2] | Typically lower
Fairness signal | Same questions, same rubric, real follow-ups | Same questions but no follow-up
Candidate sentiment | Warmer; feels like real conversation[3] | Cold; monologuing into a recorder
Time-to-results | Instant | Waits on recruiter review

When NOT to use voice AI interviews

We will be honest about this: voice AI interviewers are not the right answer for every situation. In sensitive, regulated fields (clinical decisions, legal-process testimony, interviews involving children or vulnerable groups), do not use voice AI as the sole tool. Do not force candidates who prefer a written alternative into voice; you must give them a human review path under the EU AI Act. For candidates with specific disabilities (e.g. significant hearing or speech impairment), an accommodated format run by a specialist is the more evidence-rich choice.

Our general rule of thumb: use voice AI for structured interviews, screening stages, and at-scale candidate signal; route high-stakes, sensitive cases to human panels.
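The guidance in this section can be written down as a simple routing rule. The sketch below is one possible encoding, assuming three boolean flags and four route names that are invented for this example; the ordering of the checks is also an assumption, since the text does not say which condition takes priority when several apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterviewRequest:
    # Hypothetical flags, assumed to be collected before scheduling.
    sensitive_domain: bool      # e.g. clinical, legal-process, vulnerable groups
    prefers_written: bool       # candidate asked for a written alternative
    needs_accommodation: bool   # e.g. significant hearing or speech impairment


def route(request: InterviewRequest) -> str:
    """Pick an interview format following the rule of thumb above."""
    if request.needs_accommodation:
        return "accommodated_specialist"
    if request.sensitive_domain:
        return "human_panel"
    if request.prefers_written:
        return "written_with_human_review"
    return "voice_ai"
```

Anything that does not trip one of the three flags falls through to the voice AI interview, which matches the default of using it for structured screening at scale.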

Get started

Try GAIA in the browser via the demo, or go straight to the free candidate practice mode. Are you a hiring manager? Read our pricing page and the EU AI Act compliance page.


References

  [1] ElevenLabs, "Apna scales 7.5 million AI interview minutes using ElevenLabs" (Nov 2025).
  [2] ElevenLabs, "Bolna powers recruitment voice agents with ElevenLabs" (Jul 2025).
  [3] ElevenLabs, "Maki People: Building the Future of AI-Driven Recruitment" (Feb 2026).
  [4] ElevenLabs, "Customer Stories".

Run interviews with voice AI

Configure one interview. Run them all.

Talk to GAIA for 5 minutes in demo mode. Then look at pricing.