Show HN: Splintr – Rust BPE tokenizer, 12x faster than tiktoken for batches
Hi HN,
I built Splintr, a BPE tokenizer in Rust (with Python bindings), because I found existing Python-based tokenizers were bottlenecking my data processing pipelines.
While OpenAI's tiktoken is the gold standard for correctness, I found I could get significantly better throughput on modern multi-core CPUs by rethinking how parallelism is applied.
Splintr achieves ~111 MB/s batch throughput (vs ~9 MB/s for tiktoken).
The Design Choice: "Sequential by Default"
One of the most interesting findings during development was that naive parallelism actually hurts performance for typical LLM inputs: thread-pool overhead is significant for texts under 1 MB.
I implemented a hybrid strategy (a rough sketch follows the list):
Single Text (encode): Purely sequential. It’s 3-4x faster than tiktoken, mostly from using pcre2 with JIT for the split regex instead of a standard (non-JIT) regex engine.
Batch Processing (encode_batch): Parallelizes across texts using Rayon, rather than within a text. This saturates all cores without the overhead of splitting small strings.
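To make the split concrete, here is a minimal, hypothetical sketch of the dispatch. It is not Splintr's actual code: the stub `encode` stands in for the real pcre2 + BPE path, and only the per-text Rayon parallelism is the point.

    use rayon::prelude::*;

    // Stand-in for the real sequential encode (pcre2-JIT pre-tokenization
    // plus BPE merges). The stub just maps bytes to ids so the sketch runs.
    fn encode(text: &str) -> Vec<u32> {
        text.bytes().map(u32::from).collect()
    }

    // Batch path: one Rayon task per text. Every core stays busy, and no
    // individual string is ever split, so small inputs pay no pool overhead.
    fn encode_batch(texts: &[String]) -> Vec<Vec<u32>> {
        texts.par_iter().map(|t| encode(t)).collect()
    }

    fn main() {
        let texts = vec!["Hello".to_string(), "world".to_string()];
        println!("{} sequences encoded", encode_batch(&texts).len());
    }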
Other Features:
Safety: Strict UTF-8 compliance, including a streaming decoder that correctly buffers incomplete multi-byte characters (sketched after this list).
Compatibility: Drop-in support for cl100k_base (GPT-4), o200k_base (GPT-4o), and llama3 vocabularies.
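For the curious, here is a rough sketch of the buffering idea behind the streaming decoder. The type and method names are made up for illustration (not Splintr's API), and a full decoder would also handle genuinely invalid byte sequences rather than only truncated ones.

    use std::str;

    // Illustrative only: holds bytes from a previous chunk that ended
    // in the middle of a multi-byte UTF-8 character.
    struct StreamDecoder {
        pending: Vec<u8>,
    }

    impl StreamDecoder {
        fn new() -> Self {
            StreamDecoder { pending: Vec::new() }
        }

        // Feed one chunk of decoded token bytes; return the complete
        // characters and hold any trailing partial sequence for later.
        fn push(&mut self, chunk: &[u8]) -> String {
            self.pending.extend_from_slice(chunk);
            match str::from_utf8(&self.pending) {
                Ok(s) => {
                    let out = s.to_owned();
                    self.pending.clear();
                    out
                }
                Err(e) => {
                    // Everything before valid_up_to() is well-formed; the
                    // tail (e.g. the first byte of a 3-byte character) is
                    // kept until the next chunk arrives. A real decoder
                    // would also check e.error_len() to tell a truncated
                    // sequence apart from genuinely invalid bytes.
                    let valid = e.valid_up_to();
                    let out = str::from_utf8(&self.pending[..valid])
                        .unwrap()
                        .to_owned();
                    self.pending.drain(..valid);
                    out
                }
            }
        }
    }

    fn main() {
        let mut dec = StreamDecoder::new();
        // "é" is the two bytes 0xC3 0xA9; split it across chunks.
        let mut s = dec.push(&[b'h', b'i', 0xC3]); // yields "hi"
        s.push_str(&dec.push(&[0xA9]));            // completes the "é"
        assert_eq!(s, "hié");
    }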
The repo is written in Rust with PyO3 bindings. I’d love feedback on the implementation or other potential optimization tricks for BPE.
Thanks!