Hazumi News | Show HN: Dicta.to – Local voice dictation for Mac with on-device AI

I built a macOS dictation app where everything runs on-device. Transcription, auto-correct, translation. No audio or text leaves your machine.

It ships with 4 transcription engines you can swap between: WhisperKit (99 languages), NVIDIA Parakeet TDT 0.6B (25 European languages, fastest of the bunch), Qwen3-ASR 0.6B (30 languages), and Apple Speech on macOS 26+. They all run through CoreML/Metal. Whisper is the most versatile, Parakeet wins on raw latency for European languages, Qwen3 does better with CJK. I went with a protocol-based architecture so you pick the engine that fits your use case instead of me pretending one model rules them all.

After transcription, there's an optional post-processing pipeline using Apple Intelligence (FoundationModels framework, macOS 26+, also fully on-device): auto-correct with filler word removal, tone rewriting, translation. The annoying part was FoundationModels cold start. First inference after idle takes 2-3s, which kills the experience. I worked around it by firing a throwaway mini-inference (`session.respond(to: "ok")`) in parallel while audio is still being transcribed, so the model is already warm when the text arrives. Hacky, but it shaved off the perceived latency.

Getting transcribed text into any arbitrary macOS app was honestly the hardest part. I use clipboard save/restore: read all NSPasteboard types (not just strings, also images, RTF, whatever the user had copied), write the transcribed text, simulate Cmd+V via CGEvent posted to `cghidEventTap`, then restore the original clipboard. Electron apps are slower to process paste events, so I detect them by checking if `Contents/Frameworks/Electron Framework.framework` exists in the app bundle and add extra delay. This whole approach requires Accessibility permissions, which means no sandbox, which means no App Store. I'm fine with that trade-off.

Built this solo in about 6 weeks. One-time purchase, no subscription.

I'm genuinely unsure about the multi-engine approach. Is letting users choose between Whisper/Parakeet/Qwen3 useful, or would most people prefer I just auto-select based on their language? Also curious if anyone has a cleaner approach to text injection on macOS. The clipboard hack works everywhere but it feels fragile and I don't love it.

Summary

Dicta.to is a platform that allows users to create and share short-form audio content, providing an easy-to-use interface for recording, editing, and publishing audio snippets. The platform aims to make audio sharing more accessible and interactive for both creators and listeners.

Story

Show HN: Dicta.to – Local voice dictation for Mac with on-device AI