
Show HN: 500-cycle runtime test for long-horizon LLM coherence

teugent Sunday, January 25, 2026

We ran a 500-cycle benchmark to test long-horizon reasoning stability in large language models — not just output quality, but whether a model can maintain coherent identity and logic across hundreds of recursive reasoning steps.

This is part of our SIGMA Runtime project — a cognitive control layer that runs on top of any LLM and tracks drift, coherence, and identity persistence in real time.

---

Why we did this

Most LLM evals measure short reasoning spans — 1-10 turns. But when a model is asked to sustain a line of reasoning over hundreds of steps, subtle feedback effects appear:

- Semantic drift: meaning slowly shifts as text compounds.
- Crystallization: the model locks into repeating its own phrasing or style.
- Identity loss: the “speaker” loses internal consistency.

We wanted to see whether it’s possible to prevent these effects at runtime, without retraining or prompt resets.
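
To make these failure modes concrete, here is one way they could be measured from the outside. This is a minimal sketch, not SIGMA code: `embed` stands in for any sentence-embedding function that returns a vector.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def drift_signals(outputs: list[str], embed) -> tuple[list[float], list[float]]:
    """Two simple long-horizon signals over a run of model outputs.

    drift[i]: distance from the step-0 anchor; a slow monotonic rise
              suggests semantic drift.
    step[i]:  distance between consecutive steps; values collapsing
              toward zero suggest crystallization (self-repetition).
    """
    vecs = [embed(o) for o in outputs]  # embed: str -> np.ndarray
    drift = [1.0 - cosine(vecs[0], v) for v in vecs[1:]]
    step = [1.0 - cosine(u, v) for u, v in zip(vecs, vecs[1:])]
    return drift, step
```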

---

What’s new here

We replaced the older ACE anti-crystallization layer with a new system called AEP (Adaptive Entropy Protocol) — a real-time regulator that injects controlled entropy into model outputs.

AEP tracks three internal metrics:

- TI: Terminological Isometry (consistency of key concepts)
- SDC: Semantic Drift Coefficient (meaning variation rate)
- L/N: Logic-to-Noise ratio (logical density vs surface variation)

When the model becomes too stable (repetition, rigid phrasing), AEP adds micro-perturbations to restore variation. When it drifts too far, it dampens entropy back into equilibrium.
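
A minimal sketch of that regulation loop, assuming sampling temperature is the entropy knob; the `AEPState` fields, thresholds, and step size are illustrative placeholders, not the actual AEP internals.

```python
from dataclasses import dataclass

@dataclass
class AEPState:
    ti: float   # Terminological Isometry: 1.0 = key terms fully consistent
    sdc: float  # Semantic Drift Coefficient: 0.0 = no drift
    ln: float   # Logic-to-Noise ratio: higher = denser reasoning

def regulate(state: AEPState, temperature: float,
             sdc_max: float = 0.35, ti_max: float = 0.9,
             step: float = 0.05) -> float:
    """One control tick: push entropy back toward equilibrium."""
    if state.sdc > sdc_max:
        # Drifted too far: dampen entropy to pull outputs back on course.
        return max(0.1, temperature - step)
    if state.ti > ti_max and state.ln < 1.0:
        # Too stable (rigid phrasing, repetition): inject a micro-perturbation.
        return min(1.5, temperature + step)
    return temperature
```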

---

How we tested it

- 500 reasoning cycles per model (OpenAI GPT-5.2 & Gemini-3-Flash Preview)
- Every 50th cycle is a Rib Point that compresses and verifies the preceding 49 steps (see the sketch after this list)
- Continuous telemetry from the runtime (coherence, drift, entropy)
- Identity: the same synthetic agent (“LEO”, AI architect/cognitive scientist)
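
The harness shape, roughly; `run_cycle`, `compress`, and `verify` are injected placeholders here, not the real SIGMA API.

```python
def run_benchmark(run_cycle, compress, verify,
                  n_cycles: int = 500, rib_every: int = 50):
    """Run n_cycles reasoning steps, checkpointing at each Rib Point."""
    history, checkpoints = [], []
    for i in range(1, n_cycles + 1):
        history.append(run_cycle(history))  # one recursive reasoning step
        if i % rib_every == 0:
            # Rib Point: compress and verify the steps since the previous
            # checkpoint, then carry only the checkpoint forward.
            window = history[-(rib_every - 1):]
            cp = compress(window)
            assert verify(cp, window), f"Rib Point {i} failed verification"
            checkpoints.append(cp)
    return history, checkpoints
```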

---

What happened

Both models completed all 500 cycles without identity loss or semantic collapse. Entropy modulation increased lexical variety while keeping reasoning trajectories coherent.

When truncations occurred (in Gemini API responses), the runtime reconstructed the missing context from prior compression checkpoints.
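
The recovery path might look roughly like this; the function and variable names are hypothetical, not the runtime's API.

```python
def rebuild_context(checkpoints: list[str], intact_steps: list[str]) -> str:
    """Reconstruct working context after a truncated API response.

    Instead of retrying from scratch, anchor on the most recent verified
    compression checkpoint and replay only the intact steps after it.
    """
    anchor = checkpoints[-1] if checkpoints else ""
    return "\n\n".join(filter(None, [anchor, *intact_steps]))
```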

---

Visual results

Drift & coherence evolution (500 cycles):
GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_D_summary_dashboar...
Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_D_summary_dashboard...

AEP metric dynamics (TI, SDC, L/N):
GPT-5.2: https://files.sigmastratum.net/Leo_OpenAI_E_metrics_timeline...
Gemini-3-Flash: https://files.sigmastratum.net/Leo_Gogle_E_metrics_timeline....

---

Takeaway

- Entropy can be regulated, not just randomized.
- LLMs can maintain self-consistent reasoning over hundreds of cycles when given runtime feedback.
- Structural stability (coherence, terminology, logic) doesn’t require retraining, only a dynamic control layer.

---

Report (DOI): https://doi.org/10.5281/zenodo.18271591
Code & appendix: https://github.com/sigmastratum/documentation

---

We’d love technical feedback on:

- Runtime-level coherence control
- Measuring “identity persistence”
- Long-horizon reasoning tests (100+ turns)
