Show HN: I trained a 9M speech model to fix my Mandarin tones
simedw Saturday, January 31, 2026
Built this because tones are killing my spoken Mandarin and I can't reliably hear my own mistakes.
It's a 9M-parameter Conformer-CTC model trained on ~300h of speech (AISHELL + Primewords), quantized to INT8 (11 MB), and it runs 100% in-browser via ONNX Runtime Web.
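As a rough sanity check on those numbers, the arithmetic below compares the weight storage at FP32 and INT8. It assumes exactly 9,000,000 parameters and 1 MB = 10^6 bytes, neither of which the post states. The post also does not say what makes up the gap between the raw INT8 weights and the reported 11 MB.

```python
# Back-of-the-envelope size estimate. PARAMS is an assumed round figure.
PARAMS = 9_000_000
BYTES_FP32 = 4  # 32-bit float weights
BYTES_INT8 = 1  # 8-bit integer weights

fp32_mb = PARAMS * BYTES_FP32 / 1e6  # weights alone at full precision
int8_mb = PARAMS * BYTES_INT8 / 1e6  # weights alone after INT8 quantization
gap_mb = 11 - int8_mb                # part of the reported 11 MB not explained by raw weights

print(f"FP32: {fp32_mb:.0f} MB, INT8: {int8_mb:.0f} MB, unexplained: {gap_mb:.0f} MB")
```

The remaining couple of megabytes would plausibly be quantization scales, any tensors left unquantized, and graph metadata, but that is a guess.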
Grades per-syllable pronunciation + tones with Viterbi forced alignment.
Try it here: https://simedw.com/projects/ear/
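For readers curious how the grading step might work, here is a minimal sketch of Viterbi forced alignment over CTC outputs in plain Python. It is not the code behind the tool: the function names (`ctc_forced_align`, `token_segments`), the toy token ids, and the geometric-mean confidence score are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")

def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the best CTC state index per frame for a known token sequence.

    log_probs: list of T frames, each a list of per-token log-probabilities.
    tokens:    the expected token ids (hypothetical ids, e.g. syllable+tone).
    States follow the usual CTC layout: blank, tok1, blank, tok2, ..., blank.
    """
    if not tokens:
        raise ValueError("need at least one target token")
    T = len(log_probs)
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)

    dp = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    dp[0][0] = log_probs[0][ext[0]]  # start in the leading blank...
    dp[0][1] = log_probs[0][ext[1]]  # ...or directly in the first token

    for t in range(1, T):
        for s in range(S):
            cands = [s]                      # stay in the same state
            if s >= 1:
                cands.append(s - 1)          # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)          # skip the blank between two different tokens
            best = max(cands, key=lambda p: dp[t - 1][p])
            dp[t][s] = dp[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the last token or the trailing blank.
    s = S - 1 if dp[T - 1][S - 1] >= dp[T - 1][S - 2] else S - 2
    if dp[T - 1][s] == NEG_INF:
        raise ValueError("too few frames for this token sequence")
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    return path

def token_segments(log_probs, tokens, path):
    """Per token: (token, first_frame, last_frame, confidence in 0..1)."""
    segs = []
    for i, tok in enumerate(tokens):
        frames = [t for t, s in enumerate(path) if s == 2 * i + 1]
        mean_lp = sum(log_probs[t][tok] for t in frames) / len(frames)
        segs.append((tok, frames[0], frames[-1], math.exp(mean_lp)))
    return segs

# Toy example: vocabulary {0: blank, 1, 2}, six frames, expected tokens [1, 2].
probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]
tokens = [1, 2]
path = ctc_forced_align(log_probs, tokens)
segs = token_segments(log_probs, tokens, path)
for tok, start, end, conf in segs:
    print(f"token {tok}: frames {start}-{end}, confidence {conf:.2f}")
```

In a setup like this, a low confidence on a syllable's frames would flag it as mispronounced. How the real tool turns alignments into pronunciation and tone grades is not described in the post.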
Summary
The post presents a browser-based tool for practicing Mandarin tones. A 9M Conformer model trained with Connectionist Temporal Classification (CTC) on roughly 300 hours of speech grades each syllable's pronunciation and tone using Viterbi forced alignment, which could be useful for language learning.
simedw.com