Top stories

pabs3 about 4 hours ago

Motorola GrapheneOS devices will be bootloader unlockable/relockable

The post announces that Motorola devices running GrapheneOS will allow the bootloader to be unlocked and relocked. GrapheneOS is a mobile operating system designed to provide stronger security and privacy than mainstream mobile operating systems.

grapheneos.social
225 54
todsacerdoti about 2 hours ago

California's Digital Age Assurance Act, and FOSS

runxiyu.org
39 4
abetusk about 3 hours ago

Graphics Programming Resources

develop--gpvm-website.netlify.app
29 3
E-Reverance about 2 hours ago

Speculative Speculative Decoding (SSD)

The paper builds on speculative decoding, a method for speeding up inference in large language models in which a cheaper draft model proposes several tokens that the larger target model then verifies.
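Speculative decoding, the technique the paper's title builds on, can be sketched in its basic greedy form. This is a minimal illustration under stated assumptions and is not the paper's SSD method. `target` and `draft` are hypothetical toy models that map a token prefix to the next token. Acceptance uses exact greedy matching rather than the probabilistic acceptance rule used with sampling.

```python
from typing import Callable, List

# A "model" is assumed to be a deterministic function from a token prefix
# to the next token (greedy decoding over a toy vocabulary).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: generate n_new tokens one at a time with a single model."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: returns exactly what greedy_decode(target)
    would, conceptually using one batched target pass per round of k drafts."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target scores every prefix out + proposal[:i], i = 0..k.
        #    A real system does this as one batched forward pass.
        verified = [target(out + proposal[:i]) for i in range(k + 1)]
        # 3. Accept drafted tokens while they match the target's choice.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        # 4. Keep the accepted tokens plus one target token: the correction at
        #    the first mismatch, or a bonus token if every draft was accepted.
        out.extend(proposal[:n_ok] + [verified[n_ok]])
    return out[:goal]


# Toy models over the digits 0-9 (hypothetical, for illustration only).
def target(prefix: List[int]) -> int:
    return (3 * prefix[-1] + 1) % 10


def draft(prefix: List[int]) -> int:
    guess = target(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 10  # wrong every 3rd position


print(speculative_decode(target, draft, [1], 12) == greedy_decode(target, [1], 12))  # True
```

Every kept token is one the target would have chosen, so the output matches plain greedy decoding. The speedup comes from verifying k drafted tokens in a single pass whenever the draft model guesses well.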

arxiv.org
13 0
1659447091 about 4 hours ago

TikTok will not introduce end-to-end encryption, saying it makes users less safe

TikTok says it will not introduce end-to-end encryption, arguing that doing so would make its users less safe.

bbc.com
41 21
rs545837 about 3 hours ago

Weave – A language aware merge algorithm based on entities

Weave is an open-source merge algorithm that is aware of the programming language being merged. It reconciles changes based on the entities in the source code rather than on raw lines of text.
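The title describes merging based on language entities rather than lines. As a rough illustration only, here is a minimal sketch of an entity-level three-way merge. It assumes each version of a file has already been parsed into a mapping from entity name to source text. The names `merge_entities` and `Entities` are hypothetical and are not Weave's actual API or algorithm.

```python
from typing import Dict, List, Optional, Tuple

# Assumed input: each version of a file already parsed into a mapping from
# entity name (e.g. a function name) to that entity's source text.
Entities = Dict[str, str]


def merge_entities(base: Entities, ours: Entities,
                   theirs: Entities) -> Tuple[Entities, List[str]]:
    """Three-way merge at entity granularity.

    Returns the merged entities and the names that still conflict. An entity
    missing from a version is treated as deleted (None) in that version.
    """
    merged: Entities = {}
    conflicts: List[str] = []
    for name in sorted(base.keys() | ours.keys() | theirs.keys()):
        b: Optional[str] = base.get(name)
        o: Optional[str] = ours.get(name)
        t: Optional[str] = theirs.get(name)
        if o == t:        # both sides agree (including both deleting it)
            result = o
        elif o == b:      # only "theirs" touched this entity
            result = t
        elif t == b:      # only "ours" touched this entity
            result = o
        else:             # both changed the same entity in different ways
            conflicts.append(name)
            result = o    # keep "ours" as a placeholder for manual resolution
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"add": "def add(a, b): return a + b",
        "sub": "def sub(a, b): return a - b"}
ours = {**base, "add": "def add(a: int, b: int) -> int: return a + b"}
theirs = {**base, "sub": "def sub(a, b):\n    return a - b",
          "mul": "def mul(a, b): return a * b"}
merged, conflicts = merge_entities(base, ours, theirs)
print(sorted(merged), conflicts)  # ['add', 'mul', 'sub'] []
```

Edits are compared per entity, so changes to two different functions never conflict here. A line-based merge, by contrast, can report a conflict when the edited lines happen to be adjacent. This sketch ignores entity order within the file, which a real tool would need to preserve.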

github.com
44 18
scrlk about 15 hours ago

MacBook Pro with M5 Pro and M5 Max

Apple introduces the new MacBook Pro with the powerful M5 Pro and M5 Max chips, providing enhanced performance and efficiency for users.

apple.com
719 720
1659447091 about 4 hours ago

The largest acidic geyser has been putting on quite a show

The Echinus Geyser in Yellowstone National Park has resumed eruptions after a period of dormancy, according to the U.S. Geological Survey. The geyser, which is one of the park's most predictable geothermal features, has been observed to be actively erupting and drawing the interest of visitors and scientists.

usgs.gov
31 1
fragmede about 3 hours ago

Mac external displays for designers and developers, part 2

The article provides a comprehensive guide on how to use external displays with a Mac, covering topics such as supported resolutions, color depth, and performance considerations when connecting multiple displays.

bjango.com
19 9
fs123 about 18 hours ago

Claude's Cycles [pdf]

www-cs-faculty.stanford.edu
548 227
SerCe about 2 hours ago

Nobody Gets Promoted for Simplicity

The article argues that companies often reward complexity over simplicity, leading to convoluted solutions and processes. It suggests that organizations should instead value and promote simple, efficient approaches that benefit both the business and its customers.

terriblesoftware.org
11 1
spacemarine1 about 8 hours ago

Voxile: A ray-traced game made in its own engine and programming language

The post, from indie studio Voxray Games, covers Voxile, a ray-traced game built in the studio's own engine and programming language. It also describes updates to the game's gameplay, graphics, and content.

elbowgreasegames.substack.com
135 32
vquemener 3 days ago

Mount Mayhem at Netflix: Scaling Containers on Modern CPUs

The article discusses Netflix's efforts to scale container workloads on modern CPUs, including the challenges faced and the solutions implemented. It highlights the company's use of CPU pinning, NUMA awareness, and other techniques to optimize container performance and achieve high resource utilization.

netflixtechblog.com
18 6
giancarlostoro 3 days ago

Textadept

Textadept is a lightweight and customizable text editor designed for programmers, featuring cross-platform compatibility, rapid scripting capabilities, and a variety of plugins and themes to enhance productivity and workflow.

orbitalquark.github.io
93 19
chmaynard 3 days ago

You can use newline characters in URLs

The article discusses how newline characters can be used in URLs, which can have unexpected consequences for website owners and users. It explores the potential implications and provides recommendations for handling such URLs.
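One concrete reason this works: the WHATWG URL Standard, which browsers implement, tells the parser to do two things before parsing. First, it trims leading and trailing "C0 control or space" characters. Second, it removes every "ASCII tab or newline" from the input. Below is a minimal sketch of just that preprocessing step. The `preprocess_url` helper is hypothetical and is not a full URL parser, and the linked post may focus on other aspects.

```python
# Constants following the WHATWG URL Standard's definitions:
# "C0 control or space" is U+0000..U+0020; "ASCII tab or newline" is
# U+0009, U+000A and U+000D.
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))
ASCII_TAB_OR_NEWLINE = "\t\n\r"


def preprocess_url(raw: str) -> str:
    """Hypothetical helper mirroring the parser's first two input steps."""
    # Step 1: trim leading and trailing C0 controls and spaces.
    trimmed = raw.strip(C0_CONTROL_OR_SPACE)
    # Step 2: delete every tab or newline, wherever it appears.
    return "".join(ch for ch in trimmed if ch not in ASCII_TAB_OR_NEWLINE)


print(preprocess_url("  https://exa\nmple.com/pa\r\nth?q=1\t2  "))
# https://example.com/path?q=12
```

A URL split across lines therefore resolves to the same address as its single-line form. Tools that skip this step and compare raw strings can disagree with the browser about what the URL is.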

lemire.me
37 18
vanburen about 10 hours ago

Intel's make-or-break 18A process node debuts for data center with 288-core Xeon

Intel's 18A process node debuts in a data center-focused 288-core Xeon processor, featuring 12 channels of DDR5-8000 memory, Foveros Direct 3D packaging technology, and a multi-chip design aimed at high-performance computing workloads.
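For scale, the 12 channels of DDR5-8000 quoted above imply a theoretical peak memory bandwidth. The back-of-the-envelope figure below rests on three assumptions: a 64-bit (8-byte) data path per channel, ECC bits excluded, and all 12 channels counted together. Sustained bandwidth in practice is lower.

```latex
B_{\text{peak}} = 12\ \text{channels} \times \left(8000 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}}\right) \times 8\ \tfrac{\text{bytes}}{\text{transfer}} = 768 \times 10^{9}\ \tfrac{\text{bytes}}{\text{s}} \approx 768\ \text{GB/s}
```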

tomshardware.com
261 216
todsacerdoti about 13 hours ago

When AI writes the software, who verifies it?

The article asks how software written by AI systems should be verified. It discusses how AI could automate code writing and change the software development process, and the concerns this raises.

leodemoura.github.io
178 165
meetpateltech about 11 hours ago

GPT‑5.3 Instant

OpenAI announces the launch of GPT‑5.3 Instant, a new language model, and describes its capabilities in natural language processing, content generation, and task completion.

openai.com
313 244
evakhoury about 10 hours ago

An Interactive Intro to CRDTs (2023)

The article provides an interactive introduction to Conflict-free Replicated Data Types (CRDTs), which are a class of data structures used for building distributed, real-time applications. It explains the concept of CRDTs, their advantages over traditional approaches, and how they can be used to build collaborative systems.
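A grow-only counter (G-Counter) is one of the simplest state-based CRDTs and shows the core idea in a few lines. Each replica updates only its own entry, and merging takes the per-replica maximum. This is a generic textbook example, not necessarily one the article uses, and the `GCounter` class is a hypothetical sketch.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-replica max is commutative, associative and idempotent, so replicas
        # converge no matter the order or how often states are exchanged.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment(3)   # concurrent updates on two replicas, no coordination
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)       # merging again changes nothing
print(a.value(), b.value())  # 5 5
```

The merge never needs to know which update happened first, which is what makes this structure safe for collaborative, offline-capable systems.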

jakelazaroff.com
105 21
dmckinno about 6 hours ago

Vibe coding for PMs

The article discusses 'vibe coding' from a product manager's perspective: building software by describing what you want to an AI model and largely accepting the code it generates. The author shares their personal take on the benefits and challenges of bringing vibe coding into product management.

ddmckinnon.com
37 31
Jamessfks123 3 days ago

A pretty looking web for a quantum mechanics tool

The project provides a web interface for MACE, a machine-learning model for atomistic simulation that is trained to reproduce quantum-mechanical calculations.

github.com
3 0
atarus about 15 hours ago

Launch HN: Cekura (YC F24) – Testing and monitoring for voice and chat AI agents

Hey HN - we're Tarush, Sidhant, and Shashij from Cekura (https://www.cekura.ai). We've been running voice agent simulation for 1.5 years, and recently extended the same infrastructure to chat. Teams use Cekura to simulate real user conversations, stress-test prompts and LLM behavior, and catch regressions before they hit production.

The core problem: you can't manually QA an AI agent. When you ship a new prompt, swap a model, or add a tool, how do you know the agent still behaves correctly across the thousands of ways users might interact with it? Most teams resort to manual spot-checking (doesn't scale), waiting for users to complain (too late), or brittle scripted tests.

Our answer is simulation: synthetic users interact with your agent the way real users do, and LLM-based judges evaluate whether it responded correctly - across the full conversational arc, not just single turns. Three things make this actually work:

Scenario generation + real conversation import - Our scenario generation agent bootstraps your test suite from a description of your agent. But real users find paths no generator anticipates, so we also ingest your production conversations and automatically extract test cases from them. Your coverage evolves as your users do.

Mock tool platform - Agents call tools. Running simulations against real APIs is slow and flaky. Our mock tool platform lets you define tool schemas, behavior, and return values so simulations exercise tool selection and decision-making without touching production systems.

Deterministic, structured test cases - LLMs are stochastic. A CI test that passes "most of the time" is useless. Rather than free-form prompts, our evaluators are defined as structured conditional action trees: explicit conditions that trigger specific responses, with support for fixed messages when word-for-word precision matters. This means the synthetic user behaves consistently across runs - same branching logic, same inputs - so a failure is a real regression, not noise.
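The structured conditional action trees described above can be illustrated with a minimal sketch. It assumes conditions are plain keyword checks on the agent's last message. `Node`, `SyntheticUser`, `make_tree`, and the sample data are hypothetical and are not Cekura's actual format or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    condition: str                 # keyword expected in the agent's message
    response: str                  # fixed message the synthetic user replies with
    children: List["Node"] = field(default_factory=list)


class SyntheticUser:
    """Walks a conditional action tree; identical agent messages always
    produce identical replies, so reruns are reproducible."""

    def __init__(self, roots: List[Node], fallback: str) -> None:
        self.frontier = roots
        self.fallback = fallback

    def reply(self, agent_message: str) -> str:
        text = agent_message.lower()
        for node in self.frontier:           # first matching branch wins
            if node.condition.lower() in text:
                self.frontier = node.children
                return node.response
        return self.fallback                 # agent went off-script


def make_tree() -> List[Node]:
    return [Node("name", "Jane Doe", [
        Node("date of birth", "1990-04-12", [
            Node("phone", "555-0100")])])]


agent_turns = ["What's your name?", "And your date of birth?", "Last, your phone number?"]
user = SyntheticUser(make_tree(), fallback="Sorry, what do you need?")
print([user.reply(m) for m in agent_turns])
# ['Jane Doe', '1990-04-12', '555-0100']
```

If the agent skips the date-of-birth question, the synthetic user falls back to the same off-script reply on every run. A failure therefore reproduces as a real regression rather than as sampling noise.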

Cekura also monitors your live agent traffic. The obvious alternative here is a tracing platform like Langfuse or LangSmith - and they're great tools for debugging individual LLM calls. But conversational agents have a different failure mode: the bug isn't in any single turn, it's in how turns relate to each other. Take a verification flow that requires name, date of birth, and phone number before proceeding - if the agent skips asking for DOB and moves on anyway, every individual turn looks fine in isolation. The failure only becomes visible when you evaluate the full session as a unit. Cekura is built around this from the ground up. Where tracing platforms evaluate turn by turn, Cekura evaluates the full session. Imagine a banking agent where the user fails verification in step 1, but the agent hallucinates and proceeds anyway. A turn-based evaluator sees step 3 (address confirmation) and marks it green - the right question was asked. Cekura's judge sees the full transcript and flags the session as failed because verification never succeeded.
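The banking example above can be reduced to a toy comparison between turn-level and session-level judging. This is a hypothetical sketch: the transcript format is made up, and simple rule-based checks stand in for the LLM judges the post describes. It is not Cekura's implementation.

```python
from typing import Dict, List

# Hypothetical transcript format: one record per agent turn, naming the flow
# step it performed and, for verification, whether the user passed.
Turn = Dict[str, object]
VERIFY = "verify_identity"


def turn_level_ok(turn: Turn, allowed_steps: List[str]) -> bool:
    """Judges one turn in isolation: was this a legitimate step to take?"""
    return turn["step"] in allowed_steps


def session_level_ok(transcript: List[Turn]) -> bool:
    """Judges the whole session: no step may follow a failed or missing verification."""
    verified = False
    for turn in transcript:
        if turn["step"] == VERIFY:
            verified = bool(turn.get("passed", False))
        elif not verified:
            return False  # the agent moved on without a successful verification
    return True


allowed = [VERIFY, "confirm_address"]
failed_session = [{"step": VERIFY, "passed": False}, {"step": "confirm_address"}]
print(all(turn_level_ok(t, allowed) for t in failed_session))  # True
print(session_level_ok(failed_session))                        # False
```

Each turn passes on its own, yet the session fails, because the check carries state (`verified`) across turns.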

Try us out at https://www.cekura.ai - 7-day free trial, no credit card required. Paid plans from $30/month.

We also put together a product video if you'd like to see it in action: https://www.youtube.com/watch?v=n8FFKv1-nMw. The first minute dives into quick onboarding - and if you want to jump straight to the results, skip to 8:40.

Curious what the HN community is doing - how are you testing behavioral regressions in your agents? What failure modes have hurt you most? Happy to dig in below!

76 19
sb057 about 9 hours ago

We've freed Cookie's Bustle from copyright hell

gamehistory.org
107 14
PaulHoule about 6 hours ago

130k Lines of Formal Topology: Simple and Cheap Autoformalization for Everyone?

The paper reports on autoformalizing about 130,000 lines of topology into formal, machine-checkable mathematics. It asks whether a simple, low-cost approach to autoformalization can be made available to everyone.

arxiv.org
20 8
flail about 15 hours ago

Don't become an engineering manager

The article cautions against becoming an engineering manager too soon, emphasizing the need for substantial technical experience and leadership skills before making the transition. It highlights the challenges of the role and the importance of understanding one's motivations and priorities before taking on managerial responsibilities.

newsletter.manager.dev
327 238
Gagarin1917 about 2 hours ago

LLMs can unmask pseudonymous users at scale with surprising accuracy

Large language models can be used to unmask pseudonymous online users with surprising accuracy, posing a significant threat to privacy and anonymity. Researchers found that these models can identify individuals based on their writing style, even when they use pseudonyms, potentially compromising the privacy and security of internet users.
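For comparison only, matching authors by writing style long predates LLMs. A classical baseline compares character n-gram profiles with cosine similarity. The sketch below uses that older technique, not the LLM-based method the article covers. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Counts overlapping character n-grams, a simple style fingerprint."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_author(query: str, known: Dict[str, str]) -> str:
    """Returns the known author whose sample text is closest to the query."""
    q = char_ngrams(query)
    return max(known, key=lambda author: cosine(q, char_ngrams(known[author])))


known = {
    "alice": "I reckon the weather is lovely today, I reckon we should go out.",
    "bob": "LOL that game was sick!!! cant wait 4 the next one!!!",
}
print(most_similar_author("I reckon the match is lovely, I reckon we should watch.", known))
# alice
```

Habitual phrasing ("I reckon") and punctuation leave measurable traces even across different topics, which is why pseudonymity alone offers limited protection.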

arstechnica.com
18 1
wrxd about 6 hours ago

Lenovo’s new ThinkPads score 10/10 for repairability

The article discusses Lenovo's new ThinkPad laptops, which have earned a perfect 10 repairability score from iFixit. The laptops are designed with modularity and easy access to components, making them highly repairable and allowing for extended lifespans.

ifixit.com
317 150
matt_d 3 days ago

TorchLean: Formalizing Neural Networks in Lean

TorchLean is a project for formalizing neural networks in the Lean theorem prover.

leandojo.org
79 11
pcdavid about 15 hours ago

Physics Girl: Super-Kamiokande – Imaging the sun by detecting neutrinos [video]

youtube.com
451 74
Curiositry 3 days ago

What’s in a name? (2014)

This article discusses the significance and meaning behind personal names, exploring how they can reflect a person's identity, culture, and family history. It highlights the impact a name can have on an individual's sense of self and how it is perceived by others.

sailsandcommas.com
15 6