Show HN: Andrej Karpathy's microgpt.py to C99 microgpt.c – 4,600x faster

Ajay__soni Tuesday, February 17, 2026

Andrej Karpathy showed us the GPT algorithm. I wanted to see the hardware limit.

The punchline: I made training roughly 4,600x faster in pure C, with zero dependencies, by writing code the compiler can auto-vectorise with SIMD!

Andrej recently released microgpt.py - a brilliant, atomic look at the core of a GPT. As a low-latency developer, I couldn't resist seeing how fast it could go when you get closer to the metal.

So just for funzies, I spent a few hours building microgpt-c, a zero-dependency, pure C99 implementation featuring:

- 4,600x faster training vs. the Python reference (tested on a MacBook Pro M2 Max; ~2,300x on Windows).

- SIMD auto-vectorisation for high-speed matrix operations.

- INT8 quantisation, reducing weight storage by ~8x. Training is slightly slower, but the storage reduction is significant.

- Zero Dependencies - just pure logic.

The amalgamation image below is just for fun (and to show off the density!), but the GitHub repo contains the fully commented, structured code for anyone who wants to play with on-device AI.

I have started building something more useful on top of it, a simple static analyser for C code; I will do a follow-up post.

Everything else is just efficiency... but efficiency is where the magic happens

Summary
MicroGPT-C is a lightweight, customizable, and efficient implementation of the GPT language model in C. It aims to provide a high-performance and resource-efficient solution for deploying large language models on a variety of platforms, including embedded systems and edge devices.