Show HN: I built a <400ms latency voice agent that runs on a 4 GB VRAM GTX 1650
shubham-coder Saturday, February 07, 2026
I built a voice agent platform for my university's robotics lab, and it has already been cloned by 330+ people within 12 hours. I am a first-year CSE student, so I tried to figure out a way to actually run everything on my laptop. I am currently working on turning it fully into an edge AI voice assistant for robotics, with 100% private and local control of my lab's robotics projects.
The interesting features are:
1> I used JSON RAG with real-time embeddings, so that for a few specs and pieces of info we don't need to set up a whole pipeline.
I have already built a "Hierarchical Agentic RAG with Hybrid Search" (knowledge graph + vector search); you can view that on my profile.
I am actively trying to share as much of it as possible, but that project is tied to a huge set of files: 693k data points in pgvector + PostgreSQL. Have a look and you will get a better idea of it.
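A minimal sketch of the JSON RAG idea from point 1, not the code from the repo: a small JSON list of specs is embedded at query time and ranked by cosine similarity, with no separate indexing pipeline. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the records in `SPECS_JSON` are made up for illustration.

```python
import json
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real model would go here)."""
    tokens = [t.strip(".,:;!?\"'()") for t in text.lower().split()]
    return Counter(t for t in tokens if t)

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical spec file content, kept inline so the example is self-contained.
SPECS_JSON = """
[
  {"id": "arm", "text": "The robot arm has six joints and a payload of two kilograms"},
  {"id": "battery", "text": "The battery lasts four hours on a full charge"},
  {"id": "lab", "text": "The robotics lab is open from nine to five on weekdays"}
]
"""

def retrieve(query, records, top_k=1):
    """Embed the query and every record on the fly, then return the best matches."""
    q = embed(query)
    scored = [(cosine(q, embed(r["text"])), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

records = json.loads(SPECS_JSON)
best_score, best = retrieve("how long does the battery last", records)[0]
print(best["id"], best["text"])
```

Embedding every record per query only makes sense while the JSON file stays small; past a few thousand entries a precomputed index is the usual choice.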
2> I tried every sort of Whisper model: faster-whisper, turbo, anything you can think of, even with my own C++ engine. But the model architecture itself is hallucination-prone.
So I moved to Parakeet TDT with Silero VAD, rather than Parakeet RNN, for better speed and optimisation. The repo has further details.
3> I refined a dataset from Anthropic RLHF using spaCy and GLiNER and converted it into a training dataset for fine-tuning Llama 3.2 3B.
I can attach the dataset if you need it, or upload it to Hugging Face if you want to use it yourself.
4> I attached phonetic correctors to the outputs of both Parakeet and Llama so that TTS works better.
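One possible shape for the phonetic correctors in point 4, as a rough sketch rather than the repo's implementation: a lookup table applied with a whole-word, case-insensitive regex, once on the STT transcript and once on the LLM text before it reaches TTS. The table entries and the names `STT_FIXES`, `TTS_RESPELLINGS`, and `apply_corrections` are all hypothetical.

```python
import re

# Hypothetical tables; real entries would come from errors observed in use.
STT_FIXES = {"para keet": "Parakeet", "lama": "Llama"}
TTS_RESPELLINGS = {"GTX": "G T X", "VRAM": "vee ram"}

def apply_corrections(text, table):
    """Replace whole-word matches of each key with its correction."""
    if not table:
        return text
    # Longest keys first so multi-word entries win over shorter overlaps.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in table.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

transcript = apply_corrections("ask the lama about para keet", STT_FIXES)
speakable = apply_corrections("The GTX 1650 has 4 GB of VRAM", TTS_RESPELLINGS)
print(transcript)
print(speakable)
```

Keeping the two tables separate lets the STT side fix misheard names while the TTS side respells words the voice model pronounces badly.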
5> I used SetFit to route the queries, together with confidence-based semantic search, to keep responses as fast and accurate as possible.
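A rough sketch of how confidence-based routing like point 5 could work, under the assumption that the classifier returns a probability per route and that low-confidence queries fall back to semantic search. `toy_classify` is a keyword stand-in for a trained SetFit model, and the route names, handlers, and the 0.7 threshold are made up for illustration.

```python
def toy_classify(query):
    """Stand-in for a trained classifier: returns a probability per route."""
    q = query.lower()
    if "time" in q or "date" in q:
        return {"clock": 0.9, "specs": 0.1}
    if "payload" in q or "battery" in q:
        return {"clock": 0.05, "specs": 0.95}
    return {"clock": 0.5, "specs": 0.5}  # the classifier is unsure

def route(query, classify, handlers, fallback, threshold=0.7):
    """Dispatch to the top route if it is confident enough, else fall back."""
    probs = classify(query)
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence >= threshold and label in handlers:
        return label, handlers[label](query)
    return "fallback", fallback(query)

handlers = {
    "clock": lambda q: "clock handler",
    "specs": lambda q: "specs handler",
}

def semantic_search(query):
    # Placeholder for the slower semantic search path.
    return "semantic search result"

print(route("what is the battery life", toy_classify, handlers, semantic_search))
print(route("tell me a joke", toy_classify, handlers, semantic_search))
```

The threshold trades speed for accuracy: a higher value sends more queries down the slower search path.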
6> I am using sherpa-onnx and have queued the TTS, the STT, and everything else. As an experiment, I have also got Llama generating the response while Kokoro processes it in batches, with full sync working as well, and all of it on my laptop.
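To illustrate the queued LLM-to-TTS idea in point 6, here is a small sketch under stated assumptions: tokens from a streaming LLM are grouped into sentences, and each finished sentence goes onto a queue that a TTS worker thread drains in order. `fake_llm_stream` and the `<audio:...>` placeholder stand in for Llama and Kokoro; none of this uses the sherpa-onnx API.

```python
import queue
import threading

SENTENCE_END = (".", "!", "?")

def fake_llm_stream():
    """Stand-in for a streaming LLM that yields text pieces one at a time."""
    pieces = ["Hello", " there.", " The", " arm", " is", " ready.",
              " Shall", " I", " start?"]
    for piece in pieces:
        yield piece

def sentences_from_tokens(tokens):
    """Buffer streamed pieces and emit a sentence as soon as it is complete."""
    buf = ""
    for tok in tokens:
        buf += tok
        if buf.rstrip().endswith(SENTENCE_END):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush any trailing partial sentence

def tts_worker(jobs, spoken):
    """Consume sentences in FIFO order until the None sentinel arrives."""
    while True:
        sentence = jobs.get()
        if sentence is None:
            break
        spoken.append(f"<audio:{sentence}>")  # placeholder for real synthesis

jobs = queue.Queue()
spoken = []
worker = threading.Thread(target=tts_worker, args=(jobs, spoken))
worker.start()

for sentence in sentences_from_tokens(fake_llm_stream()):
    jobs.put(sentence)  # TTS can start while later sentences are still generating
jobs.put(None)
worker.join()
print(spoken)
```

Splitting on sentence boundaries means the first audio can start before the full reply exists, which is one common way to keep perceived latency low.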
7> Along with these, my frontend relies heavily on three.js and 3D view files, but I applied optimisations there so that everything works together perfectly on the laptop.
8> I also applied glued interaction to the LLM: I implemented a FIFO of the last 5 interactions, and I store them for future fine-tuning and for adding phonetic words.
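The FIFO in point 8 could look roughly like this; it is a sketch, not the repo's code. A `deque` with `maxlen=5` drops the oldest interaction automatically, and each turn can optionally be appended to a JSONL file for later fine-tuning. The class name `InteractionMemory`, the prompt format, and the log path parameter are assumptions.

```python
import json
from collections import deque

class InteractionMemory:
    """Keeps the last few user/assistant turns and optionally logs all of them."""

    def __init__(self, max_turns=5, log_path=None):
        self.turns = deque(maxlen=max_turns)  # oldest turn is evicted first
        self.log_path = log_path

    def add(self, user, assistant):
        turn = {"user": user, "assistant": assistant}
        self.turns.append(turn)
        if self.log_path:
            # Append-only JSONL keeps every turn, even those evicted from the FIFO.
            with open(self.log_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(turn) + "\n")

    def as_prompt(self):
        """Render the retained turns as context for the next LLM call."""
        return "\n".join(
            f"User: {t['user']}\nAssistant: {t['assistant']}" for t in self.turns
        )

memory = InteractionMemory(max_turns=5)
for i in range(7):
    memory.add(f"question {i}", f"answer {i}")

print(memory.as_prompt())
```

A fixed window keeps the prompt size bounded, which matters on a 4 GB card where context length directly costs memory.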
Please take a look and let me know if there is something new I should learn.
One kind note: as an enthusiast spending so much energy on these things, I took help from AI for the .md files and for expanding or explaining parts of the code, to make it easier for everyone to follow.