Story

Show HN: I trained a 9M speech model to fix my Mandarin tones

simedw Saturday, January 31, 2026

Built this because tones are killing my spoken Mandarin and I can't reliably hear my own mistakes.

It's a 9M-parameter Conformer-CTC model trained on ~300 hours of speech (AISHELL + Primewords), quantized to INT8 (11 MB), and it runs 100% in-browser via ONNX Runtime Web.
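For a rough sense of why INT8 matters for an in-browser download, here is a back-of-envelope sketch in Python. It assumes "9M" means about 9 million weights and uses plain symmetric per-tensor quantization. The post doesn't say which quantization scheme was used, so `model_size_mb`, `quantize_int8` and `dequantize` are illustrative names, not the project's code.

```python
def model_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in MB (10**6 bytes), ignoring graph metadata."""
    return num_params * bytes_per_param / 1e6


def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed scheme): w ~= scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into the INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate float weights."""
    return [v * scale for v in q]
```

With these assumptions, `model_size_mb(9_000_000, 4)` gives 36.0 for FP32 weights and `model_size_mb(9_000_000, 1)` gives 9.0 for INT8. That is in the same ballpark as the stated 11 MB, with the remainder plausibly being non-quantized tensors and file overhead.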

Grades per-syllable pronunciation + tones with Viterbi forced alignment.
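A minimal sketch of what Viterbi forced alignment over CTC outputs can look like, in pure Python. It assumes the model emits per-frame log-probabilities over a vocabulary where index 0 is the CTC blank and every other index is one syllable+tone token. That vocabulary layout, the function names, and the mean-log-probability score are all assumptions for illustration, since the post doesn't describe the actual scoring.

```python
NEG_INF = float("-inf")


def ctc_forced_align(log_probs, tokens, blank=0):
    """For each frame, return the index into `tokens` it aligns to (None = blank)."""
    T = len(log_probs)
    ext = [blank]                      # blank, tok1, blank, tok2, ..., blank
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [s]                # stay in the same state
            if s >= 1:
                cands.append(s - 1)    # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)    # skip the blank between two different tokens
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best
    # A valid path ends on the final blank or on the final token.
    s = S - 1 if S == 1 or score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align all tokens")
    path = [None] * T
    for t in range(T - 1, -1, -1):
        path[t] = (s - 1) // 2 if s % 2 == 1 else None
        s = back[t][s]
    return path


def syllable_scores(log_probs, tokens, path):
    """Mean log-probability of each token over the frames aligned to it."""
    sums = [0.0] * len(tokens)
    counts = [0] * len(tokens)
    for t, idx in enumerate(path):
        if idx is not None:
            sums[idx] += log_probs[t][tokens[idx]]
            counts[idx] += 1
    return [total / n for total, n in zip(sums, counts)]
```

The alignment forces the expected syllables onto the audio, so a low score on one syllable means the model found little evidence for that syllable+tone in its frames. How that number would map to a pass/fail grade is left open here.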

Try it here: https://simedw.com/projects/ear/

Summary
The post presents a roughly 9M-parameter Conformer model trained with Connectionist Temporal Classification (CTC) on about 300 hours of Mandarin speech (AISHELL + Primewords) to grade a learner's pronunciation and tones per syllable. It uses Viterbi forced alignment, is quantized to INT8 (11 MB), and runs entirely in the browser via ONNX Runtime Web.
129 42