Utterance works with your existing voice stack. It handles when to act — not what to transcribe or say.
## With Whisper (Speech-to-Text)

```ts
import { Utterance } from "@utterance/core";
import { transcribe } from "./whisper";

const detector = new Utterance();
let audioBuffer: Float32Array[] = []; // raw frames from your capture pipeline

detector.on("speechStart", () => {
  audioBuffer = []; // start a fresh buffer for the new turn
});

detector.on("turnEnd", async (result) => {
  // Only transcribe confident turn ends; tune the threshold for your app.
  if (result.confidence > 0.8) {
    const text = await transcribe(audioBuffer);
    // Send to your LLM
  }
});

await detector.start();
```
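The snippet above never shows how `audioBuffer` gets filled: that is your own capture pipeline's job. Below is a minimal browser sketch using the Web Audio API. Note that `detector.processAudio` is an assumed name for the detector's audio-ingestion method, so check the actual API of your version:

```ts
// Minimal capture sketch (browser). ScriptProcessorNode is deprecated but
// keeps the example short; prefer an AudioWorklet in production.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const ctx = new AudioContext();
const source = ctx.createMediaStreamSource(stream);
const processor = ctx.createScriptProcessor(2048, 1, 1);

processor.onaudioprocess = (e) => {
  const frame = new Float32Array(e.inputBuffer.getChannelData(0));
  audioBuffer.push(frame);      // keep raw audio for Whisper
  detector.processAudio(frame); // assumed method: feed the detector, too
};

source.connect(processor);
processor.connect(ctx.destination);
```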
detector.on("turnEnd", async (result) => {
const transcript = await getTranscript();
const response = await openai.chat.completions.create({
model: "gpt-4",
messages: [{ role: "user", content: transcript }],
});
speak(response.choices[0].message.content);
});
detector.on("interrupt", () => {
stopSpeaking(); // Halt TTS immediately
});With Any TTS
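The `speak` and `stopSpeaking` helpers are yours to define. As one example, here is what they might look like on top of the browser's built-in Web Speech API:

```ts
// One possible speak/stopSpeaking pair, built on the Web Speech API.
function speak(text: string) {
  const utterance = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utterance);
}

function stopSpeaking() {
  speechSynthesis.cancel(); // drops queued and in-progress speech
}
```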
## With Any TTS

Utterance pairs naturally with any text-to-speech system. Use the `interrupt` event to halt playback the moment the user wants to speak:

```ts
detector.on("interrupt", () => {
  audioElement.pause();
  audioElement.currentTime = 0; // rewind so stale audio never resumes
});
```
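If your TTS streams audio over the network, cancel the in-flight request as well as local playback. Here is a sketch using `AbortController`; the endpoint and payload are placeholders, not a real provider API:

```ts
let controller: AbortController | null = null;

async function streamTts(text: string) {
  controller = new AbortController();
  // Placeholder endpoint and payload; substitute your TTS provider's API.
  const res = await fetch("https://tts.example.com/stream", {
    method: "POST",
    body: JSON.stringify({ text }),
    signal: controller.signal,
  });
  return res; // pipe the streamed audio into your player
}

detector.on("interrupt", () => {
  controller?.abort(); // stop the network stream, not just playback
});
```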