Show HN: Dicta.to – Local voice dictation for Mac with on-device AI
alamparelli Tuesday, February 24, 2026I built a macOS dictation app where everything runs on-device. Transcription, auto-correct, translation. No audio or text leaves your machine.
It ships with 4 transcription engines you can swap between: WhisperKit (99 languages), NVIDIA Parakeet TDT 0.6B (25 European languages, fastest of the bunch), Qwen3-ASR 0.6B (30 languages), and Apple Speech on macOS 26+. They all run through CoreML/Metal. Whisper is the most versatile, Parakeet wins on raw latency for European languages, Qwen3 does better with CJK. I went with a protocol-based architecture so you pick the engine that fits your use case instead of me pretending one model rules them all.
After transcription, there's an optional post-processing pipeline using Apple Intelligence (FoundationModels framework, macOS 26+, also fully on-device): auto-correct with filler word removal, tone rewriting, translation. The annoying part was FoundationModels cold start. First inference after idle takes 2-3s, which kills the experience. I worked around it by firing a throwaway mini-inference (`session.respond(to: "ok")`) in parallel while audio is still being transcribed, so the model is already warm when the text arrives. Hacky, but it shaved off the perceived latency.
Getting transcribed text into any arbitrary macOS app was honestly the hardest part. I use clipboard save/restore: read all NSPasteboard types (not just strings, also images, RTF, whatever the user had copied), write the transcribed text, simulate Cmd+V via CGEvent posted to `cghidEventTap`, then restore the original clipboard. Electron apps are slower to process paste events, so I detect them by checking if `Contents/Frameworks/Electron Framework.framework` exists in the app bundle and add extra delay. This whole approach requires Accessibility permissions, which means no sandbox, which means no App Store. I'm fine with that trade-off.
Built this solo in about 6 weeks. One-time purchase, no subscription.
I'm genuinely unsure about the multi-engine approach. Is letting users choose between Whisper/Parakeet/Qwen3 useful, or would most people prefer I just auto-select based on their language? Also curious if anyone has a cleaner approach to text injection on macOS. The clipboard hack works everywhere but it feels fragile and I don't love it.