Voice AI Interviewer

Quick definition

A voice AI interviewer is software that conducts hiring interviews through real-time spoken conversation. It chains speech-to-text (STT), a large language model (LLM), and text-to-speech (TTS) in a low-latency loop, and adds turn-taking and barge-in detection so the dialogue feels natural.
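
As a rough illustration of that loop, here is a minimal Python sketch. The three callables (transcribe, generate_reply, synthesize) are hypothetical stand-ins for whatever STT, LLM, and TTS services a platform actually uses, and play_with_barge_in is an assumed helper name, not part of any real product API. Barge-in is modeled in the simplest possible way: playback stops as soon as the candidate is detected speaking.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class InterviewLoop:
    # Hypothetical stand-ins for real STT, LLM, and TTS services.
    transcribe: Callable[[bytes], str]
    generate_reply: Callable[[List[Dict[str, str]]], str]
    synthesize: Callable[[str], bytes]
    history: List[Dict[str, str]] = field(default_factory=list)

    def handle_turn(self, candidate_audio: bytes) -> bytes:
        """Run one STT -> LLM -> TTS pass and return the audio to play back."""
        text = self.transcribe(candidate_audio)
        self.history.append({"role": "candidate", "text": text})
        reply = self.generate_reply(self.history)
        self.history.append({"role": "interviewer", "text": reply})
        return self.synthesize(reply)


def play_with_barge_in(
    chunks: Iterable[bytes],
    candidate_is_speaking: Callable[[], bool],
) -> int:
    """Play TTS audio chunk by chunk; stop early if the candidate barges in.

    Returns the number of chunks that were played. Actual audio output is
    omitted here, so "playing" a chunk only increments the counter.
    """
    played = 0
    for _chunk in chunks:
        if candidate_is_speaking():
            break  # candidate started talking: yield the floor immediately
        played += 1
    return played
```

In a real deployment each stage would typically stream partial results to the next rather than wait for a full utterance, which is where most of the latency savings come from. The sketch keeps the stages sequential for readability.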

How it works

The core loop is the same as in an agentic interview: STT → LLM → TTS. The voice version adds three layers. Turn-taking detection predicts in real time whether the candidate has finished speaking; without it, the system either interrupts the candidate or leaves long pauses. Echo cancellation prevents the system's own speaker output from being picked up again by the microphone. Accent robustness comes from STT models fine-tuned on multilingual and accented data. Together these layers make the interaction feel close to a phone screen rather than a typed question-and-answer flow.
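
To make the turn-taking layer concrete, here is a deliberately simplified Python sketch. It swaps the predictive detection described above for a plain silence-timeout heuristic: the turn is treated as finished once speech has been heard and is followed by a long enough run of quiet frames. The function name, the energy threshold, and the frame count are all assumptions chosen for illustration. With 20 ms audio frames, 25 frames would correspond to roughly half a second of silence.

```python
from typing import Iterable, Optional


def detect_end_of_turn(
    frame_energies: Iterable[float],
    speech_threshold: float = 0.1,    # assumed energy level that counts as speech
    silence_frames_needed: int = 25,  # assumed run of quiet frames that ends a turn
) -> Optional[int]:
    """Return the index of the frame at which the turn is judged finished.

    Returns None if the candidate never spoke or is still within a turn.
    Silence before the first speech frame is ignored, so the system does
    not "end" a turn that has not started.
    """
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= speech_threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_frames_needed:
                return i
    return None
```

The trade-off described in the paragraph above shows up directly in silence_frames_needed: set it too low and the system cuts in during a mid-sentence pause, set it too high and every answer is followed by dead air. A fixed timeout cannot tell a thinking pause from a finished answer, which is why production systems tend to use a learned end-of-turn predictor instead.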

Why it matters

Completion rates are higher than for typed or video formats, because talking involves less friction for most candidates than typing. Signal quality is close to that of a human phone screen, since pauses, hesitation, and natural disfluency all reach the assessment. Multilingual support is comparatively cheap: when the underlying STT and TTS models add a language, so does the platform, with no extra script authoring needed.

Try a voice interview with GAIA on your own role.

In a few minutes you can run the real candidate flow and see how the transcript is scored.