Show HN: ngrep – grep plus word embeddings (Rust)
xnan Saturday, March 14, 2026
I got curious about a simple question: regular expressions are purely syntactic, but what happens if you add just a little bit of semantics?
To answer, I ended up building ngrep: a grep-like tool that extends regular expressions with a new operator ~(token) that matches a word by meaning using word2vec-style embeddings (FastText, GloVe, Wikipedia2Vec).
A simple demo: "~(big)+ \b~(animal;0.35)+\b" over Moby-Dick finds many of the ways the text refers to a large animal, surfacing "great whale", "enormous creature", "huge elephant" and so on. Pipe it through sort | uniq -c and the winner is, unsurprisingly, "great whale" :)
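To give a feel for what matching "by meaning" could look like, here is a minimal sketch, not ngrep's actual code. It assumes a word matches ~(token;threshold) when the cosine similarity between the two embeddings is at least the threshold. The helper names (cosine, semantic_match, toy_embeddings) and the 3-dimensional vectors are made up for illustration; real FastText or GloVe vectors have hundreds of dimensions.

```rust
use std::collections::HashMap;

/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical helper: does `word` satisfy `~(token;threshold)`?
/// Words missing from the embedding table never match (an assumption).
fn semantic_match(
    emb: &HashMap<&str, Vec<f32>>,
    token: &str,
    word: &str,
    threshold: f32,
) -> bool {
    match (emb.get(token), emb.get(word)) {
        (Some(t), Some(w)) => cosine(t, w) >= threshold,
        _ => false,
    }
}

/// Toy, hand-written vectors standing in for a real embedding model.
fn toy_embeddings() -> HashMap<&'static str, Vec<f32>> {
    HashMap::from([
        ("big", vec![1.0, 0.0, 0.0]),
        ("great", vec![0.9, 0.1, 0.0]),
        ("animal", vec![0.0, 1.0, 0.0]),
        ("whale", vec![0.1, 0.9, 0.2]),
        ("ship", vec![0.0, 0.1, 1.0]),
    ])
}

fn main() {
    let emb = toy_embeddings();
    for w in ["great", "whale", "ship"] {
        println!(
            "{w}: ~(big)={} ~(animal;0.35)={}",
            semantic_match(&emb, "big", w, 0.35),
            semantic_match(&emb, "animal", w, 0.35)
        );
    }
}
```

With these toy vectors, "great" lands close to "big" and "whale" close to "animal", while "ship" matches neither, which is the behaviour the demo pattern relies on.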
Built in Rust on top of the awesome fancy-regex, and ~() composes with all standard operators (negative lookahead, quantifiers, etc.). Currently a PoC with many missing optimizations (e.g., no caching, no compilation to standard regex), obviously without the guarantees of plain regex and subject to the limits of w2v-style embeddings... but I thought it was worth sharing!
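As a rough illustration of the "compilation to standard regex" idea mentioned above, one possible approach (an assumption, not something ngrep does today) is to expand ~(token;threshold) ahead of time into a plain alternation over a fixed vocabulary. The function expand_to_alternation and the toy vocabulary are hypothetical, and a real version would also need to escape regex metacharacters in the vocabulary words.

```rust
use std::collections::BTreeMap;

/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical: turn `~(token;threshold)` into `(?:w1|w2|...)`, listing
/// every vocabulary word whose similarity to `token` meets the threshold.
/// Returns None when `token` itself has no embedding.
fn expand_to_alternation(
    vocab: &BTreeMap<&str, Vec<f32>>,
    token: &str,
    threshold: f32,
) -> Option<String> {
    let target = vocab.get(token)?;
    // BTreeMap iterates in key order, so the output is deterministic.
    let words: Vec<&str> = vocab
        .iter()
        .filter(|(_, v)| cosine(target, v) >= threshold)
        .map(|(w, _)| *w)
        .collect();
    Some(format!("(?:{})", words.join("|")))
}

/// Toy, hand-written 2-dimensional vectors for illustration only.
fn toy_vocab() -> BTreeMap<&'static str, Vec<f32>> {
    BTreeMap::from([
        ("big", vec![1.0, 0.0]),
        ("great", vec![0.9, 0.1]),
        ("huge", vec![0.8, 0.2]),
        ("ship", vec![0.0, 1.0]),
    ])
}

fn main() {
    let vocab = toy_vocab();
    if let Some(pattern) = expand_to_alternation(&vocab, "big", 0.35) {
        println!("{pattern}");
    }
}
```

The trade-off in this sketch is that the expansion only covers words in the chosen vocabulary, and with a real embedding model the alternation could get very large.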