
Show HN: ngrep – grep plus word embeddings (Rust)

Posted by xnan on Saturday, March 14, 2026

I got curious about a simple question: regular expressions are purely syntactic, but what happens if you add just a little bit of semantics?

To find out, I built ngrep: a grep-like tool that extends regular expressions with a new operator, ~(token), which matches a word by meaning using word2vec-style embeddings (FastText, GloVe, Wikipedia2Vec).
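The post doesn't spell out how ~() decides that two words are close in meaning. A common approach with word2vec-style embeddings is cosine similarity against a cutoff, so the Rust sketch below assumes exactly that: a word matches ~(token) when it is in the vocabulary and its cosine similarity to token is at least a threshold. The in-memory embedding table, the lowercasing, and the names `semantic_match` and `find_matches` are hypothetical, and the three-dimensional toy vectors stand in for real FastText or GloVe vectors.

```rust
use std::collections::HashMap;

/// Cosine similarity between two embedding vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical `~(token)` predicate: true if `word` is in the vocabulary and its
/// cosine similarity to `token` is at least `threshold`. Out-of-vocabulary words
/// never match. Lowercasing the input word is an assumption of this sketch.
fn semantic_match(emb: &HashMap<String, Vec<f32>>, token: &str, word: &str, threshold: f32) -> bool {
    let word = word.to_lowercase();
    match (emb.get(token), emb.get(word.as_str())) {
        (Some(t), Some(w)) => cosine(t, w) >= threshold,
        _ => false,
    }
}

/// Split text on non-alphanumeric characters and keep the words that match `token`.
fn find_matches<'a>(text: &'a str, emb: &HashMap<String, Vec<f32>>, token: &str, threshold: f32) -> Vec<&'a str> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .filter(|w| semantic_match(emb, token, w, threshold))
        .collect()
}

/// Toy 3-dimensional "embeddings" (made up for illustration, not real vectors).
fn toy_embeddings() -> HashMap<String, Vec<f32>> {
    let data: [(&str, Vec<f32>); 5] = [
        ("big", vec![1.0, 0.1, 0.0]),
        ("great", vec![0.9, 0.2, 0.0]),
        ("huge", vec![0.95, 0.0, 0.1]),
        ("small", vec![-0.8, 0.1, 0.0]),
        ("whale", vec![0.0, 0.1, 1.0]),
    ];
    data.into_iter().map(|(w, v)| (w.to_string(), v)).collect()
}

fn main() {
    let emb = toy_embeddings();
    let text = "A great whale and a small fish met a huge creature.";
    println!("{:?}", find_matches(text, &emb, "big", 0.5)); // prints ["great", "huge"]
}
```

Note that "creature" never matches here simply because it is missing from the toy vocabulary, which mirrors one of the limits of word2vec-style embeddings the post alludes to.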

A simple demo: running "~(big)+ \b~(animal;0.35)+\b" over Moby-Dick surfaces many of the ways the text refers to a large animal, such as "great whale", "enormous creature", and "huge elephant". Pipe the output through sort | uniq -c and the winner is, unsurprisingly, "great whale" :)
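The demo pattern shows that ~() accepts an optional second argument after a semicolon, as in ~(animal;0.35). Reading that number as a similarity threshold is an assumption, not something the post states. Under that assumption, a hypothetical parser for a single operator might look like this; `SemanticOp` and `parse_semantic_op` are illustrative names, and `None` stands for whatever default the real tool uses.

```rust
/// Hypothetical parsed form of a `~(token)` or `~(token;threshold)` operator.
#[derive(Debug, PartialEq)]
struct SemanticOp {
    token: String,
    /// Assumed to be a similarity cutoff; `None` means "use the tool's default".
    threshold: Option<f32>,
}

/// Parse one already-isolated operator such as `~(animal;0.35)`.
fn parse_semantic_op(s: &str) -> Result<SemanticOp, String> {
    // Strip the `~(` prefix and the closing `)`.
    let inner = s
        .strip_prefix("~(")
        .and_then(|rest| rest.strip_suffix(')'))
        .ok_or_else(|| format!("not a ~() operator: {s:?}"))?;
    // An optional `;threshold` follows the token.
    let (token, threshold) = match inner.split_once(';') {
        Some((tok, th)) => {
            let value = th
                .trim()
                .parse::<f32>()
                .map_err(|e| format!("invalid threshold {th:?}: {e}"))?;
            (tok.trim(), Some(value))
        }
        None => (inner.trim(), None),
    };
    if token.is_empty() {
        return Err("empty token in ~()".to_string());
    }
    Ok(SemanticOp { token: token.to_string(), threshold })
}

fn main() {
    for op in ["~(big)", "~(animal;0.35)", "~(animal;high)"] {
        println!("{op} -> {:?}", parse_semantic_op(op));
    }
}
```

A real tool would have to recognize ~( ... ) while parsing the whole pattern, alongside the ordinary regex syntax; this sketch only handles one operator that has already been cut out of it.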

It's built in Rust on top of the awesome fancy-regex crate, and ~() composes with all the standard operators (negative lookahead, quantifiers, etc.). It's currently a PoC with many missing optimizations (e.g., no caching and no compilation to standard regex), and it obviously comes without the guarantees of plain regex and is subject to the limits of word2vec-style embeddings... but I thought it was worth sharing!
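Two of the missing optimizations mentioned above, caching and compilation to standard regex, fit together naturally: for a fixed token and threshold, the set of matching words can be computed once from the vocabulary and emitted as a plain alternation that any regex engine can run. The sketch below assumes cosine similarity with a threshold, an in-memory `HashMap` of embeddings, and hypothetical names (`Compiler`, `compile`, `escape`); it is not how ngrep currently works.

```rust
use std::collections::HashMap;

/// Cosine similarity between two embedding vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Conservatively escape the classic regex metacharacters in a vocabulary word.
fn escape(word: &str) -> String {
    let mut out = String::with_capacity(word.len());
    for c in word.chars() {
        if r"\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Hypothetical compiler from `~(token;threshold)` to a plain-regex alternation,
/// caching each expansion by (token, threshold bits).
struct Compiler<'a> {
    emb: &'a HashMap<String, Vec<f32>>,
    cache: HashMap<(String, u32), String>,
}

impl<'a> Compiler<'a> {
    fn new(emb: &'a HashMap<String, Vec<f32>>) -> Self {
        Self { emb, cache: HashMap::new() }
    }

    /// Returns `None` if the token is out of vocabulary or nothing passes the threshold.
    fn compile(&mut self, token: &str, threshold: f32) -> Option<String> {
        let key = (token.to_string(), threshold.to_bits());
        if let Some(pattern) = self.cache.get(&key) {
            return Some(pattern.clone());
        }
        let target = self.emb.get(token)?;
        // Full vocabulary scan: done once per distinct (token, threshold).
        let mut words: Vec<&str> = self
            .emb
            .iter()
            .filter(|(_, v)| cosine(target, v.as_slice()) >= threshold)
            .map(|(w, _)| w.as_str())
            .collect();
        if words.is_empty() {
            return None;
        }
        words.sort_unstable(); // HashMap order is random; sort for a stable pattern
        let alts: Vec<String> = words.iter().map(|w| escape(w)).collect();
        let pattern = format!("(?:{})", alts.join("|"));
        self.cache.insert(key, pattern.clone());
        Some(pattern)
    }
}

/// Toy 3-dimensional "embeddings" (made up for illustration, not real vectors).
fn toy_embeddings() -> HashMap<String, Vec<f32>> {
    let data: [(&str, Vec<f32>); 5] = [
        ("big", vec![1.0, 0.1, 0.0]),
        ("great", vec![0.9, 0.2, 0.0]),
        ("huge", vec![0.95, 0.0, 0.1]),
        ("small", vec![-0.8, 0.1, 0.0]),
        ("whale", vec![0.0, 0.1, 1.0]),
    ];
    data.into_iter().map(|(w, v)| (w.to_string(), v)).collect()
}

fn main() {
    let emb = toy_embeddings();
    let mut compiler = Compiler::new(&emb);
    let alt = compiler.compile("big", 0.5).expect("token in vocabulary");
    // The expansion drops into an ordinary pattern, here wrapped in word boundaries.
    println!("{}", format!(r"\b{alt}\b")); // prints \b(?:big|great|huge)\b
}
```

The trade-off is a full scan of the vocabulary for each distinct operator at compile time (real embedding vocabularies can hold millions of words), in exchange for ordinary, lookup-free matching afterwards. The expansion also freezes the vocabulary, so spellings that are missing from the embedding file can never match.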

Summary
ngrep is a grep-like command-line tool written in Rust that extends regular expressions with a ~(token) operator, which matches words by meaning using word2vec-style embeddings such as FastText, GloVe, and Wikipedia2Vec. Built on fancy-regex, the operator composes with standard regex features like quantifiers and negative lookahead. The project is currently a proof of concept.