System case study / 2024
VAANI
Hindi-first fully offline voice assistant
active
Python / Whisper / Qwen2.5-3B / Piper TTS / XTTS v2 / openWakeWord
Offline AI / Voice Systems / Hindi / Edge Inference
Latency
<800ms
Wake to spoken response on CPU.
Network
0
No internet dependency.
Plugins
8
Layered extension architecture.
Motivation
Build a local voice assistant that keeps speech, reasoning, and synthesis offline while preserving practical latency on consumer CPU hardware.
Design Constraints
- Zero internet dependency.
- Hindi-first interaction loop.
- Consumer CPU latency target below one second.
- Modular plugin architecture without changing the core inference loop.
System Architecture
- openWakeWord detection triggers the pipeline.
- Whisper-small performs local ASR.
- Qwen2.5-3B-Instruct (Q4_K_M quantization) handles local reasoning with a 32K-token context window.
- Piper TTS and fine-tuned XTTS v2 produce speech output.
- Eight-layer plugin architecture isolates tools, memory, routing, and generation.
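The stage order above can be sketched as a small orchestration class. This is an illustrative sketch only, not the project's actual code: `VoicePipeline`, its field names, and the lambda stand-ins are hypothetical, and a real build would wire in openWakeWord, Whisper, the quantized Qwen model, and Piper or XTTS behind the same four callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the four stages; each stage is a plain callable."""
    detect_wake: Callable[[bytes], bool]   # wake-word detector stand-in
    transcribe: Callable[[bytes], str]     # ASR stand-in
    generate: Callable[[str], str]         # local LLM stand-in
    synthesize: Callable[[str], bytes]     # TTS stand-in

    def handle(self, audio: bytes) -> Optional[bytes]:
        # Nothing downstream runs unless the wake word fires.
        if not self.detect_wake(audio):
            return None
        text = self.transcribe(audio)
        reply = self.generate(text)
        return self.synthesize(reply)


# Toy stand-ins (assumptions for illustration, not real model calls).
pipeline = VoicePipeline(
    detect_wake=lambda audio: audio.startswith(b"WAKE"),
    transcribe=lambda audio: audio[4:].decode("utf-8"),
    generate=lambda text: f"उत्तर: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Keeping each stage behind a callable means a stage can be swapped or profiled without touching `handle`, which is the property the architecture list relies on.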
Performance Bottlenecks
- ASR and TTS latency under CPU-only constraints.
- Hindi corpus quality for voice persona fine-tuning.
- Context management with local quantized model memory.
- Tool plugin boundaries in an offline runtime.
Optimization Decisions
- Use quantized local model execution.
- Fine-tune XTTS v2 on AI4Bharat Hindi corpus.
- Keep plugin interfaces thin and deterministic.
- Optimize each stage independently before end-to-end latency tuning.
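One way to read "thin and deterministic" plugin interfaces is a single-method contract plus a registry that the core loop calls without knowing which plugins exist. The sketch below assumes that reading; `Plugin`, `PluginRegistry`, and `ClockPlugin` are hypothetical names, and the clock plugin takes a fixed time string so its output does not depend on the system clock.

```python
from typing import List, Optional, Protocol


class Plugin(Protocol):
    """Hypothetical thin contract: one method, text in, optional text out."""
    name: str

    def handle(self, utterance: str) -> Optional[str]: ...


class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: List[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        # New plugins are appended; the core loop itself is never edited.
        self._plugins.append(plugin)

    def dispatch(self, utterance: str) -> Optional[str]:
        # First plugin that returns a non-None answer wins (registration order).
        for plugin in self._plugins:
            result = plugin.handle(utterance)
            if result is not None:
                return result
        return None  # caller falls back to the LLM


class ClockPlugin:
    """Toy plugin; the time is injected so the output is reproducible."""
    name = "clock"

    def __init__(self, fixed_time: str) -> None:
        self._fixed_time = fixed_time

    def handle(self, utterance: str) -> Optional[str]:
        if "समय" in utterance:  # "time" in Hindi
            return f"अभी {self._fixed_time} बजे हैं"
        return None


registry = PluginRegistry()
registry.register(ClockPlugin("10:30"))
```

With this shape, `registry.dispatch("समय क्या है")` is answered by the clock plugin, while an utterance no plugin claims returns `None` and can be passed on to the language model.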
Benchmark Methodology
- Measured wake-to-response end-to-end latency.
- Profiled ASR, LLM, and TTS stages separately.
- Tested offline operation with no network dependency.
- Validated new plugin integration without modifying core runtime.
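Per-stage profiling of the kind listed above can be done with a small timing helper built on `time.perf_counter`. This is a minimal sketch under assumptions: `StageTimer` and the stage names are hypothetical, the `time.sleep` calls stand in for real ASR, LLM, and TTS work, and the 800 ms budget is taken from the latency figure reported in this case study.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named stage, in milliseconds."""

    def __init__(self) -> None:
        self.durations_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def total_ms(self) -> float:
        return sum(self.durations_ms.values())

    def within_budget(self, budget_ms: float) -> bool:
        return self.total_ms() <= budget_ms


timer = StageTimer()
for stage_name in ("asr", "llm", "tts"):
    with timer.stage(stage_name):
        time.sleep(0.005)  # placeholder for the real stage call

for stage_name, ms in timer.durations_ms.items():
    print(f"{stage_name}: {ms:.1f} ms")
print(f"total: {timer.total_ms():.1f} ms, within 800 ms: {timer.within_budget(800.0)}")
```

Summing the stage timings only approximates wake-to-response latency; audio capture, buffering, and playback start-up sit outside these timers and would need a separate end-to-end measurement.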
Results
- Achieved under 800 ms wake-to-response latency on a consumer CPU.
- Kept the entire voice pipeline offline.
- Supported modular plugin extension with stable core interfaces.
Lessons Learned
- Offline assistants are latency orchestration problems as much as model problems.
- Language-first UX changes tokenizer, ASR, TTS, and memory decisions.
- Local privacy constraints make deterministic system boundaries valuable.