Beginning January 2026, all ACM publications will be made open access
Classical statues were not painted horribly
The article examines the popular belief that ancient Greek and Roman statues were originally painted in bright colors, contrary to their current white appearance. It discusses the historical evidence and ongoing debates surrounding the polychrome nature of classical sculptures.
Your job is to deliver code you have proven to work
The article argues that a software developer's job is to deliver code they have proven to work, not code that is merely assumed to work.
Virtualizing Nvidia HGX B200 GPUs with Open Source
This article discusses the virtualization of NVIDIA HGX-B200 GPUs using open-source software. It covers the technical details of the virtualization process, including the use of PCI passthrough and GPU sharing, and explores the benefits and challenges of this approach for high-performance computing applications.
Launch HN: Pulse (YC S24) – Production-grade unstructured document extraction
Hi HN, we’re Sid and Ritvik, co-founders of Pulse. Pulse is a document extraction system that creates LLM-ready text. We built Pulse after realizing that modern vision language models are very good at producing plausible text, which is exactly what makes them risky for OCR and data ingestion at scale.
When we started working on document extraction, we assumed the same thing many teams do today: foundation models were improving quickly, multimodal systems appeared to read documents well, and for small or clean inputs that assumption often held. The limitations showed up once we began processing real documents in volume. Long PDFs, dense tables, mixed layouts, low-fidelity scans, and financial or operational data exposed errors that were subtle, hard to detect, and expensive to correct. Outputs often looked reasonable while containing small but meaningful mistakes, especially in tables and numeric fields.
A lot of our work since then has been applied research. We run controlled evaluations on complex documents, fine-tune vision models, and build labeled datasets where ground truth actually matters. There have been many nights where our team stayed up hand-annotating pages, drawing bounding boxes around tables, labeling charts point by point, or debating whether a number was unreadable or simply poorly scanned. That process shaped our intuition far more than benchmarks alone.
One thing became clear quickly. The core challenge was not extraction itself, but confidence. Vision language models embed document images into high-dimensional representations optimized for semantic understanding rather than precise transcription. That process is inherently lossy. When uncertainty appears, models tend to resolve it using learned priors instead of surfacing ambiguity. This behavior can be helpful in consumer settings. In production pipelines, it creates verification problems that do not scale well.
Pulse grew out of trying to address this gap through system design rather than prompting alone. Instead of treating document understanding as a single generative step, the system separates layout analysis from language modeling. Documents are normalized into structured representations that preserve hierarchy and tables before schema mapping occurs. Extraction is constrained by schemas defined ahead of time, and extracted values are tied back to source locations so uncertainty can be inspected rather than guessed away. In practice, this results in a hybrid approach that combines traditional computer vision techniques, layout models, and vision language models, because no single approach handled these cases reliably on its own.
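As a rough illustration only (this is not Pulse's API or implementation), the idea of schema-constrained extraction with values tied back to source locations could be sketched as follows. Every name here (SourceLocation, ExtractedField, SCHEMA, validate, the 0.9 threshold) is a hypothetical assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical pointer back to the page region a value was read from."""
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: Optional[str]               # raw text as read from the document
    confidence: float                  # assumed to be in [0.0, 1.0]
    source: Optional[SourceLocation]   # None means the value cannot be traced


# Hypothetical schema defined ahead of time: field name -> parser for its type.
SCHEMA = {
    "tenant_name": str,
    "monthly_rent": float,
}


def validate(fields, schema=SCHEMA, min_confidence=0.9):
    """Split fields into accepted values and items flagged for review.

    Nothing is silently repaired: anything outside the schema, untraceable,
    low-confidence, or unparseable is surfaced with a reason.
    """
    accepted, review = {}, []
    for f in fields:
        if f.name not in schema:
            review.append((f, "field not in schema"))
        elif f.value is None or f.source is None:
            review.append((f, "no value or no source location"))
        elif f.confidence < min_confidence:
            review.append((f, "low confidence"))
        else:
            try:
                accepted[f.name] = schema[f.name](f.value)
            except ValueError:
                review.append((f, "value does not parse as schema type"))
    return accepted, review


fields = [
    ExtractedField("tenant_name", "Acme LLC", 0.98,
                   SourceLocation(1, (72, 100, 200, 112))),
    # A plausible-looking but misread number: "S" instead of "5".
    ExtractedField("monthly_rent", "1,2S0", 0.95,
                   SourceLocation(1, (300, 100, 350, 112))),
]
accepted, review = validate(fields)
```

In this sketch the misread rent value ends up in the review list together with its page and bounding box, so a person can inspect the source region instead of trusting a guessed number.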
We are intentionally sharing a few documents that reflect the types of inputs that motivated this work. These are representative of cases where we saw generic OCR or VLM-based pipelines struggle.
Here is a financial 10K: https://platform.runpulse.com/dashboard/examples/example1
Here is a newspaper: https://platform.runpulse.com/dashboard/examples/example2
Here is a rent roll: https://platform.runpulse.com/dashboard/examples/example3
Pulse is not perfect, particularly on highly degraded scans or uncommon handwriting, and there is still room for improvement. The goal is not to eliminate errors entirely, but to make them visible, auditable, and easier to reason about.
Pulse is available via usage-based access to the API and platform. You can try it here and access the API docs here.
Demo link here: https://video.runpulse.com/video/pulse-platform-walkthrough-...
We’re interested in hearing how others here evaluate correctness for document extraction, which failure modes you have seen in practice, and what signals you rely on to decide whether an output can be trusted. We will be around to answer questions and are happy to run additional documents if people want to share examples.
Are Apple gift cards safe to redeem?
The article discusses the potential security risks of redeeming Apple gift cards, highlighting concerns about fraudulent activity and the need for consumers to exercise caution when using such cards.
Jonathan Blow has spent the past decade designing 1,400 puzzles for you
The article discusses the work of game developer Jonathan Blow, who has spent the past decade designing over 1,400 puzzles for his games. It explores his meticulous approach to puzzle design and his quest to create unique and challenging experiences for players.
Military Standard on Software Control Levels
The article discusses the MIL-STD-882E, a military standard for system safety, and its application to software development. It highlights the key elements of the standard, including hazard analysis, risk assessment, and mitigation strategies, and how they can be integrated into the software development lifecycle.
Using TypeScript to Obtain One of the Rarest License Plates
The article describes how the author used TypeScript to obtain one of the rarest license plates.
Dogalog: A realtime Prolog-based livecoding music environment
Dogalog is a realtime livecoding music environment based on Prolog.
Creating apps like Signal could be 'hostile activity' claims UK watchdog
The article discusses a claim by a UK watchdog that creating apps like Signal could be considered 'hostile activity'. This raises concerns about the potential implications for developers and users of such applications.
Please Just Try Htmx
RCE via ND6 Router Advertisements in FreeBSD
The article discusses a security vulnerability in the FreeBSD rtsold(8) service, which could allow a remote attacker to execute arbitrary code. The vulnerability has been addressed in the latest version of FreeBSD, and users are advised to update their systems to mitigate the risk.
Slowness is a virtue
The article explores the concept of 'slowness as a virtue' and how embracing a slower pace can lead to more meaningful and fulfilling experiences in our fast-paced world. It encourages readers to be mindful, savor the moment, and find joy in the simple pleasures of everyday life.
Microscopic robots that sense, think, act, and compute
The article describes microscopic robots that can sense, think, act, and compute.
Hightouch (YC S19) Is Hiring
Hightouch, a data activation platform, is hiring across multiple departments including engineering, sales, and customer success. The company is looking for talented individuals to join its growing team and contribute to the development and success of its innovative data solutions.
Egyptian Hieroglyphs: Lesson 1
The article introduces Egyptian hieroglyphs, the writing system used in ancient Egypt, and provides an overview of the different types of hieroglyphs, including phonograms, ideograms, and determinatives, as well as the basic principles of how the writing system works.
I got hacked: My Hetzner server started mining Monero
The article describes the author's experience of discovering that their server had been compromised and was being used to mine Monero, a cryptocurrency, without their knowledge or consent. It provides insights into the author's investigation and actions taken to address the security breach.
What is an elliptic curve? (2019)
The article provides an overview of elliptic curves, which are mathematical structures with applications in cryptography and other areas. It explains the basic properties of elliptic curves and how they are used in the context of elliptic curve cryptography.
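To make the group structure concrete, here is a minimal sketch of point addition on a small textbook curve, y^2 = x^3 + 2x + 2 over the integers mod 17. The curve, the base point (5, 1), and the function names are assumptions chosen for illustration and are not taken from the article. Real cryptographic curves use far larger primes and hardened implementations.

```python
# Toy curve y^2 = x^3 + A*x + B over the integers mod P_MOD.
# Parameters are illustrative only. Requires Python 3.8+ for pow(x, -1, m).
A, B, P_MOD = 2, 2, 17
G = (5, 1)  # a point on the curve: 1^2 == 5^3 + 2*5 + 2 (mod 17)


def on_curve(pt):
    """Check the curve equation; None stands for the point at infinity."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def ec_add(p1, p2):
    """Add two points using the chord-and-tangent rule."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 is the inverse of p1 (covers doubling with y == 0)
    if p1 == p2:
        # Tangent slope for doubling.
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        # Chord slope through two distinct points.
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mul(k, pt):
    """Compute k * pt by double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result


double_g = ec_add(G, G)      # (6, 3)
triple_g = scalar_mul(3, G)  # (10, 6)
```

Elliptic curve cryptography relies on scalar_mul being cheap to compute while recovering k from k * G is believed to be hard on suitably chosen large curves.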
Show HN: A local-first memory store for LLM agents (SQLite)
OpenMemory is an open-source, local-first memory store for LLM agents, built on SQLite.
After ruining a treasured water resource, Iran is drying up
The article examines the water crisis in Iran, exploring how unsustainable water management practices, including the overbuilding of dams and the decline of traditional irrigation systems, have led to severe drought and depletion of groundwater resources, posing significant challenges for the country's environment and population.
systemd v259 Released
The article announces the release of systemd version 259, detailing the key improvements and changes, including updates to the network stack, service management, and security features.
Heart and Kidney Diseases and Type 2 Diabetes May Be One Ailment
The article suggests that heart disease, kidney disease, and type 2 diabetes may be interconnected and potentially treatable as a single condition. It discusses research indicating that these three conditions may stem from a common underlying cause, which could lead to new treatment approaches targeting the root of the problem.
From profiling to kernel patch: the journey to an eBPF performance fix
The article discusses the process of identifying and resolving a performance issue in a software system using eBPF (extended Berkeley Packet Filter) technology. It describes the steps taken, from profiling the system to developing and applying a kernel patch, to address the problem and improve the system's performance.
Most parked domains now serving malicious content
The article discusses the growing trend of parked domains serving malicious content, with researchers finding that over 80% of parked domains are now being used for malicious purposes such as hosting phishing sites, malware, and other harmful content.
The Big City; Save the Flophouses (1996)
The article discusses the importance of preserving flophouses, which provide affordable housing for low-income individuals in New York City, and the challenges they face due to gentrification and development pressures in the city.
AI helps ship faster but it produces 1.7× more bugs
The report compares AI-generated code with human-written code, finding that AI helps teams ship faster but produces 1.7× more bugs.
It's all about momentum
This article explores the concept of momentum in personal growth and achievement, highlighting the importance of maintaining consistent effort and building positive habits over time to drive progress and success.
Working quickly is more important than it seems (2015)
The essay argues that working quickly matters more than it seems.
Online Textbook for Braid groups and knots and tangles
An online textbook covering braid groups, knots, and tangles.