Subject-based weight routing for LLMs (27 days before DeepSeek Engram)

AutoJanitor Wednesday, January 21, 2026

I run LLM inference on an IBM POWER8 S824 with 576GB RAM – a $700 eBay server from 2014. In December 2025, I built "RAM Coffers" – banking model weights by subject domain with hot caching and resonance routing.

On January 12, 2026, DeepSeek published "Engram" (arXiv:2601.07372), describing the same core idea: route queries to cached weight banks based on subject matter.

The concepts are similar, and the timestamps show mine came first. YouTube video from December 17, 2025: https://youtu.be/T_o39s7r0iE

The terminal in that video shows "RAM Coffers: ON | L2/L3 Resident: ON" – 26 days before their paper.

Core shared concept: query comes in → classify subject → route to the relevant weight bank → a hot cache keeps it fast
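That pipeline can be sketched in a few lines. This is an illustrative mock, not the RAM Coffers implementation: the subject keywords, the keyword-overlap classifier, and the `max_hot` LRU cache size are all assumptions standing in for real weight shards and residency logic.

```python
# Sketch of the shared routing idea: classify a query's subject, load (or
# reuse) the matching weight bank, and keep recently used banks hot.
# Bank names and the keyword classifier are illustrative assumptions.
from collections import OrderedDict

SUBJECT_KEYWORDS = {
    "math": {"integral", "prime", "matrix"},
    "code": {"python", "compile", "function"},
    "history": {"empire", "war", "century"},
}

class WeightBankRouter:
    def __init__(self, max_hot=2):
        self.max_hot = max_hot    # how many banks stay resident at once
        self.hot = OrderedDict()  # LRU cache of loaded banks

    def classify(self, query):
        words = set(query.lower().split())
        # pick the subject with the most keyword overlap; fall back to "general"
        best = max(SUBJECT_KEYWORDS, key=lambda s: len(words & SUBJECT_KEYWORDS[s]))
        return best if words & SUBJECT_KEYWORDS[best] else "general"

    def _load_bank(self, subject):
        # stand-in for mapping a weight shard into RAM
        return f"weights:{subject}"

    def route(self, query):
        subject = self.classify(query)
        if subject in self.hot:
            self.hot.move_to_end(subject)     # cache hit: keep it hot
        else:
            if len(self.hot) >= self.max_hot:
                self.hot.popitem(last=False)  # evict the coldest bank
            self.hot[subject] = self._load_bank(subject)
        return subject, self.hot[subject]

router = WeightBankRouter()
print(router.route("solve this integral with a matrix"))  # ('math', 'weights:math')
```

The key property is that a run of same-subject queries never reloads weights: only a subject change can trigger an eviction and load.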
                                                                                                                                                           
What I added beyond the core:
• NUMA topology – weights placed on specific memory nodes. Engram doesn't address hardware topology.
• Neuromorphic mapping – brain regions mapped to NUMA nodes
• Tetranary confidence – 4-state routing logic
• Vec_perm collapse – single-cycle attention permutes on POWER8
• PowerLISP – LLMs that actually remember
• L2/L3 prefetch – 147 t/s vs 17 t/s stock (~8.6×)
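The post names "tetranary confidence" as 4-state routing logic without spelling out the states, so the following is a hedged sketch of what a 4-state router could look like: the state names, the score inputs, and the 0.5/0.1 thresholds are all my assumptions, not the actual RAM Coffers logic.

```python
# Illustrative 4-state ("tetranary") routing decision over per-subject
# match scores. States and thresholds are assumptions for illustration.
from enum import Enum

class Confidence(Enum):
    STRONG = 3     # one clear winner: route directly to that bank
    WEAK = 2       # a winner, but weak: route and keep a fallback bank warm
    AMBIGUOUS = 1  # several close matches: blend or consult multiple banks
    NONE = 0       # no match at all: use the general bank only

def tetranary_confidence(scores):
    """Map per-subject match scores (0..1) to one of four routing states."""
    ranked = sorted(scores.values(), reverse=True)
    top = ranked[0] if ranked else 0.0
    second = ranked[1] if len(ranked) > 1 else 0.0
    if top == 0.0:
        return Confidence.NONE
    if second > 0.0 and top - second < 0.1:
        return Confidence.AMBIGUOUS        # top two subjects too close to call
    return Confidence.STRONG if top >= 0.5 else Confidence.WEAK

print(tetranary_confidence({"math": 0.9, "code": 0.1}))  # Confidence.STRONG
```

The point of four states rather than a binary hit/miss is that the two middle states let the router hedge: prefetch a second bank instead of committing to a single one.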
                                                                                                                                                           
DOIs:
• RAM Coffers (Dec 16): doi.org/10.6084/m9.figshare.31093429
• Neuromorphic: doi.org/10.5281/zenodo.18321905
• PowerLISP: doi.org/10.5281/zenodo.18322052

GitHub: github.com/Scottcjn/ram-coffers

Read on Hacker News