Top stories

bobsterlobster about 1 hour ago

Microsoft forced me to switch to Linux

A first-person account of how the author's frustrations with Microsoft and Windows led them to switch to Linux.

himthe.dev
134 97
brk about 1 hour ago

Airfoil (2024)

This article explains the principles of airfoil design, focusing on the physical mechanisms that generate lift. It provides a detailed, technical analysis of the pressure distribution and flow characteristics around airfoils, helping to illustrate the fundamental concepts behind aircraft aerodynamics.

ciechanow.ski
102 14
yuppiepuppie about 5 hours ago

Show HN: The HN Arcade

I love seeing all the small games that people build and post to this site.

I don't want to forget any, so I have built a directory/arcade for the games here that I maintain.

Feel free to check it out, add your game if it's missing, and let me know what you think. Thanks!

andrewgy8.github.io
169 54
zdw 4 days ago

Package Management Is a Wicked Problem

The article explores the inherent challenges of package management in software development, highlighting its complex and ever-evolving nature, which makes it a 'wicked problem' without a single, optimal solution. It discusses the trade-offs and competing priorities that developers and organizations face when managing dependencies and packages.

nesbitt.io
29 11
tonyww about 14 hours ago

A verification layer for browser agents: Amazon case study

A common approach to automating Amazon shopping or similarly complex websites is to reach for large cloud models (often vision-capable). I wanted to test the opposite: can a ~3B-parameter local LLM complete the flow using only structural page data (DOM) plus deterministic assertions?

This post summarizes four runs of the same task (search → first product → add to cart → checkout on Amazon). The key comparison is Demo 0 (cloud baseline) vs Demo 3 (local autonomy); Demos 1–2 are intermediate controls.

More technical detail (architecture, code excerpts, additional log snippets):

https://www.sentienceapi.com/blog/verification-layer-amazon-...

Demo 0 vs Demo 3:

Demo 0 (cloud, GLM‑4.6 + structured snapshots)
- success: 1/1 run
- tokens: 19,956 (~43% reduction vs ~35k estimate)
- time: ~60,000ms
- cost: cloud API (varies)
- vision: not required

Demo 3 (local, DeepSeek R1 planner + Qwen ~3B executor)
- success: 7/7 steps (re-run)
- tokens: 11,114
- time: 405,740ms
- cost: $0.00 incremental (local inference)
- vision: not required

Latency note: the local stack is slower end-to-end here largely because inference runs on local hardware (Mac Studio with M4); the cloud baseline benefits from hosted inference, but has per-token API cost.
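The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch built from the numbers quoted in the post: the 35,000-token baseline is the post's own estimate, the 60,000ms figure is approximate, and the variable names are illustrative.

```python
# Numbers quoted in the post; the 35,000-token baseline is the post's estimate.
baseline_estimate = 35_000
demo0_tokens = 19_956
demo3_tokens = 11_114
demo0_ms = 60_000      # approximate, per the post
demo3_ms = 405_740


def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


demo0_vs_estimate = reduction(baseline_estimate, demo0_tokens)
demo3_vs_demo0 = reduction(demo0_tokens, demo3_tokens)
slowdown = demo3_ms / demo0_ms

print(f"Demo 0 vs estimate: {demo0_vs_estimate:.0%} fewer tokens")
print(f"Demo 3 vs Demo 0:   {demo3_vs_demo0:.0%} fewer tokens")
print(f"Demo 3 wall time:   {slowdown:.1f}x Demo 0")
```

On these inputs the local run uses roughly 44% fewer tokens than the cloud run while taking about 6.8 times as long, which matches the trade-off described in the latency note.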

Architecture

This worked because we changed the control plane and added a verification loop.

1) Constrain what the model sees (DOM pruning). We don’t feed the entire DOM or screenshots. We collect raw elements, then run a WASM pass to produce a compact “semantic snapshot” (roles/text/geometry) and prune the rest (often on the order of ~95% of nodes).
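A minimal sketch of what such a pruning pass might look like, written in plain Python rather than WASM. The `RawElement` fields, the `INTERACTIVE_ROLES` set, and the `semantic_snapshot` function are all assumptions made for illustration; the post does not show its actual filter rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RawElement:
    """Hypothetical raw element record collected from the page."""
    id: int
    tag: str
    role: str | None      # e.g. "button", "textbox", "link"
    text: str
    x: float
    y: float
    width: float
    height: float
    visible: bool


# Assumed set of roles worth keeping; the real pass may use different rules.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def semantic_snapshot(elements: list[RawElement], max_text: int = 80) -> list[dict]:
    """Keep visible, non-empty, interactive elements and drop everything else."""
    kept = []
    for el in elements:
        if not el.visible or el.width <= 0 or el.height <= 0:
            continue
        if el.role not in INTERACTIVE_ROLES:
            continue
        kept.append({
            "id": el.id,
            "role": el.role,
            "text": el.text.strip()[:max_text],
            "box": (el.x, el.y, el.width, el.height),
        })
    return kept


raw = [
    RawElement(1, "div", None, "", 0, 0, 1280, 720, True),
    RawElement(2, "input", "textbox", " Search ", 100, 10, 400, 32, True),
    RawElement(3, "a", "link", "Hidden promo", 0, 0, 0, 0, False),
]
snapshot = semantic_snapshot(raw)
print(snapshot)
```

Only the visible textbox survives here. The model then chooses among a handful of role/text/geometry records instead of the full DOM.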

2) Split reasoning from acting (planner vs executor).

Planner (reasoning): DeepSeek R1 (local) generates step intent + what must be true afterward.

Executor (action): Qwen ~3B (local) selects concrete DOM actions like CLICK(id) / TYPE(text).

3) Gate every step with Jest‑style verification. After each action, we assert state changes (URL changed, element exists/doesn’t exist, modal/drawer appeared). If a required assertion fails, the step fails with artifacts and bounded retries.

Minimal shape:

    ok = await runtime.check(
        exists("role=textbox"),
        label="search_box_visible",
        required=True,
    ).eventually(timeout_s=10.0, poll_s=0.25, max_snapshot_attempts=3)
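To make the "eventually" idea concrete, here is a standalone sketch of a polling assertion. It is not the SDK's implementation: the `eventually` helper below is a hypothetical free function that borrows only the `timeout_s` and `poll_s` parameter names from the snippet above, and the fake predicate stands in for a real page snapshot.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def eventually(
    predicate: Callable[[], Awaitable[bool]],
    timeout_s: float = 10.0,
    poll_s: float = 0.25,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if await predicate():
            return True
        # Give up if another sleep would take us past the deadline.
        if time.monotonic() + poll_s > deadline:
            return False
        await asyncio.sleep(poll_s)


async def demo() -> bool:
    # Stand-in for a page whose search box appears on the third check.
    calls = {"n": 0}

    async def search_box_visible() -> bool:
        calls["n"] += 1
        return calls["n"] >= 3

    return await eventually(search_box_visible, timeout_s=2.0, poll_s=0.01)


result = asyncio.run(demo())
print(result)
```

Polling with a deadline tolerates slow page updates without letting a step hang forever, which is what lets a failed assertion turn into an explicit step failure.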

What changed between “agents that look smart” and agents that work

Two examples from the logs:

Deterministic override to enforce “first result” intent: “Executor decision … [override] first_product_link -> CLICK(1022)”

Drawer handling that verifies and forces the correct branch: “result: PASS | add_to_cart_verified_after_drawer”

The important point is that these are not post‑hoc analytics. They are inline gates: the system either proves it made progress or it stops and recovers.
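The "prove progress or stop and recover" behavior can be sketched as a small gate around each action. Everything here is illustrative: `run_gated_step`, `StepResult`, and the toy cart state are hypothetical names, not part of the system described in the post.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    label: str
    passed: bool
    attempts: int
    artifacts: list[str] = field(default_factory=list)


def run_gated_step(
    label: str,
    action: Callable[[], None],
    assertion: Callable[[], bool],
    max_retries: int = 2,
) -> StepResult:
    """Run `action`, then require `assertion`; retry a bounded number of times."""
    artifacts: list[str] = []
    for attempt in range(1, max_retries + 2):
        action()
        if assertion():
            return StepResult(label, True, attempt, artifacts)
        # Record what went wrong so a failed step leaves evidence behind.
        artifacts.append(f"attempt {attempt}: assertion failed for {label}")
    return StepResult(label, False, max_retries + 1, artifacts)


# Toy state standing in for a browser page: the cart updates on the second click.
state = {"clicks": 0, "in_cart": False}


def click_add_to_cart() -> None:
    state["clicks"] += 1
    state["in_cart"] = state["clicks"] >= 2


outcome = run_gated_step(
    "add_to_cart_verified",
    action=click_add_to_cart,
    assertion=lambda: state["in_cart"],
)
print(outcome)
```

The first attempt fails its assertion and is logged as an artifact, and the second passes. A step that never passes returns `passed=False` after the retry budget is spent instead of letting the agent continue on a false assumption.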

Takeaway

If you’re trying to make browser agents reliable, the highest‑leverage move isn’t a bigger model. It’s constraining the state space and making success/failure explicit with per-step assertions.

Reliability in agents comes from verification (assertions on structured snapshots), not just scaling model size.

sentienceapi.com
7 2
saysjonathan 5 days ago

Show HN: Dwm.tmux – a dwm-inspired window manager for tmux

Hey, HN! With all the recent agentic workflows being primarily terminal- and tmux-based, I wanted to share a little project I created about a decade ago.

I've continued to use this as my primary terminal "window manager" and wanted to share in case others might find it useful.

I would love to hear about others' terminal-based workflows and any other tools you may use with similar functionality.

github.com
38 7
someguy101010 2 days ago

Show HN: Cua-Bench – a benchmark for AI agents in GUI environments

Hey HN, we're excited to share Cua-Bench ( https://github.com/trycua/cua ), an open-source framework for evaluating and training computer-use agents across different environments.

Computer-use agents show massive performance variance across different UIs—an agent with 90% success on Windows 11 might drop to 9% on Windows XP for the same task. The problem is OS themes, browser versions, and UI variations that existing benchmarks don't capture.

The existing benchmarks (OSWorld, Windows Agent Arena, AndroidWorld) were great but operated in silos—different harnesses, different formats, no standardized way to test the same agent across platforms. More importantly, they were evaluation-only. We needed environments that could generate training data and run RL loops, not just measure performance. Cua-Bench takes a different approach: it's a unified framework that standardizes environments across platforms and supports the full agent development lifecycle—benchmark, train, deploy.

With Cua-Bench, you can:

- Evaluate agents across multiple benchmarks with one CLI (native tasks + OSWorld + Windows Agent Arena adapters)

- Test the same agent on different OS variations (Windows 11/XP/Vista, macOS themes, Linux, Android via QEMU)

- Generate new tasks from natural language prompts

- Create simulated environments for RL training (shell apps like Spotify, Slack with programmatic rewards)

- Run oracle validations to verify environments before agent evaluation

- Monitor agent runs in real-time with traces and screenshots

All of this works on macOS, Linux, Windows, and Android, and is self-hostable.

To get started:

Install cua-bench:

% pip install cua-bench

Run a basic evaluation:

% cb run dataset datasets/cua-bench-basic --agent demo

Open the monitoring dashboard:

% cb run watch <run_id>

For parallelized evaluations across multiple workers:

% cb run dataset datasets/cua-bench-basic --agent your-agent --max-parallel 8

Want to test across different OS variations? Just specify the environment:

% cb run task slack_message --agent your-agent --env windows_xp

% cb run task slack_message --agent your-agent --env macos_sonoma

Generate new tasks from prompts:

% cb task generate "book a flight on kayak.com"

Validate environments with oracle implementations:

% cb run dataset datasets/cua-bench-basic --oracle

The simulated environments are particularly useful for RL training—they're HTML/JS apps that render across 10+ OS themes with programmatic reward verification. No need to spin up actual VMs for training loops.
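As a rough illustration of programmatic reward verification, the sketch below scores a task from the simulated app's final state rather than from pixels, so the same check applies under any theme. The `SlackSimState` class and `reward_message_sent` function are hypothetical and are not Cua-Bench APIs; only the theme names are taken from the `--env` examples above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SlackSimState:
    """Hypothetical state exposed by a simulated chat app."""
    theme: str
    messages: dict[str, list[str]] = field(default_factory=dict)


def reward_message_sent(state: SlackSimState, channel: str, text: str) -> float:
    """Return 1.0 if `text` was posted to `channel`, else 0.0."""
    return 1.0 if text in state.messages.get(channel, []) else 0.0


THEMES = ["windows_xp", "macos_sonoma"]
rewards = {}
for theme in THEMES:
    state = SlackSimState(theme=theme)
    # Stand-in for an agent that successfully posts the message.
    state.messages.setdefault("#general", []).append("hello team")
    rewards[theme] = reward_message_sent(state, "#general", "hello team")

print(rewards)
```

Because the reward reads application state directly, a change in theme alters what the agent sees but not how success is judged.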

We're seeing teams use Cua-Bench for:

- Training computer-use models on mobile and desktop environments

- Generating large-scale training datasets (working with labs on millions of screenshots across OS variations)

- RL fine-tuning with shell app simulators

- Systematic evaluation across OS themes and browser versions

- Building task registries (collaborating with Snorkel AI on task design and data curation, similar to their Terminal-Bench work)

Cua-Bench is 100% open-source under the MIT license. We're actively developing it as part of Cua (https://github.com/trycua/cua), our Computer Use Agent SDK, and we'd love your feedback, bug reports, or feature ideas.

GitHub: https://github.com/trycua/cua

Docs: https://cua.ai/docs/cuabench

Technical Report: https://cuabench.ai

We'll be here to answer any technical questions and look forward to your comments!

github.com
7 0
ubj about 10 hours ago

Rust at Scale: An Added Layer of Security for WhatsApp

The article discusses how Meta's WhatsApp team has adopted the Rust programming language to improve the security and performance of their messaging app at scale. It highlights the benefits of Rust, such as its memory safety guarantees and concurrency support, which have helped the team address key challenges in building a secure, scalable, and reliable messaging platform.

engineering.fb.com
154 42
coloneltcb 5 days ago

There's only one Woz, but we can all learn from him

Steve Wozniak, co-founder of Apple, is honored with the Tech Interactive Humanitarian Award for his lifelong dedication to technology and its potential to improve people's lives. The article highlights Wozniak's philanthropic efforts, his commitment to accessibility and education, and his continuing influence as a tech pioneer and visionary.

fastcompany.com
208 94
ogandreakiro 1 day ago

Show HN: Build Web Automations via Demonstration

Hey HN,

We’ve been building browser agents for a while. In production, we kept converging on the same pattern: deterministic scripts for the happy path, agents only for edge cases. So we built Demonstrate Mode.

The idea is simple: You perform your workflow once in a remote browser. Notte records the interactions and generates deterministic automation code.

How it works:

- Record clicks, inputs, navigations in a cloud browser
- Compile them into deterministic code (no LLM at runtime)
- Run and deploy on managed browser infrastructure

Closest analog is Playwright codegen but:

- Infrastructure is handled (remote browsers, proxies, auth state)
- Code runs in a deployable runtime with logs, retries, and optional agent fallback
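The record-then-compile idea can be sketched as a function that turns a list of recorded events into a script. This is not Notte's format or output: the `Event` record and `compile_events` function are hypothetical, and the generated text targets Playwright's Python sync API only because the post names Playwright codegen as the closest analog.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical recorded interaction."""
    kind: str          # "goto", "click", or "fill"
    target: str        # URL for goto, selector otherwise
    value: str = ""    # text for fill


def compile_events(events: list[Event]) -> str:
    """Turn a recorded event list into a deterministic Playwright-style script."""
    lines = [
        "from playwright.sync_api import sync_playwright",
        "",
        "with sync_playwright() as p:",
        "    browser = p.chromium.launch()",
        "    page = browser.new_page()",
    ]
    for ev in events:
        if ev.kind == "goto":
            lines.append(f"    page.goto({ev.target!r})")
        elif ev.kind == "click":
            lines.append(f"    page.click({ev.target!r})")
        elif ev.kind == "fill":
            lines.append(f"    page.fill({ev.target!r}, {ev.value!r})")
        else:
            raise ValueError(f"unsupported event kind: {ev.kind}")
    lines.append("    browser.close()")
    return "\n".join(lines)


recording = [
    Event("goto", "https://example.com/login"),
    Event("fill", "#email", "user@example.com"),
    Event("click", "button[type=submit]"),
]
script = compile_events(recording)
print(script)
```

The output is ordinary code with no model call at runtime, so it can be versioned and reviewed like any other script. An agent would only be needed when a step fails.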

Agents are great for prototyping and dynamic steps, but for production we usually want versioned code and predictable cost/behavior. Happy to dive into implementation details in the comments.

Demo: https://www.loom.com/share/f83cb83ecd5e48188dd9741724cde49a

-- Andrea & Lucas, Notte Founders

notte.cc
9 1
meetpateltech about 22 hours ago

Prism

OpenAI introduces Prism, an AI-assisted workspace for scientific writing and collaboration, built around LaTeX documents.

openai.com
712 464
bigwheels 2 days ago

A few random notes from Claude coding quite a bit last few weeks

https://xcancel.com/karpathy/status/2015883857489522876

xcancel.com
776 639
asontha about 4 hours ago

Kyber (YC W23) Is Hiring a Staff Engineer

Kyber (YC W23) is seeking a Staff Engineer and Tech Lead to join its team. The role involves leading technical projects, mentoring junior engineers, and driving the development of Kyber's core products and infrastructure.

ycombinator.com
1 0
hcs about 7 hours ago

Virtual Boy on TV with Intelligent Systems Video Boy

The article looks at the Video Boy, a device from Intelligent Systems that shows Nintendo Virtual Boy output on a TV.

hcs64.com
54 4
gurjeet 6 days ago

SVG Path Editor

The article provides an interactive SVG path editor that allows users to create, edit, and export SVG paths with a simple and intuitive interface. It offers a range of tools and features to assist in the design and manipulation of vector graphics.

yqnn.github.io
167 21
bookofjoe 1 day ago

430k-year-old well-preserved wooden tools are the oldest ever found

https://archive.ph/mHlUT

https://apnews.com/article/oldest-wooden-tools-marathousa-1-...

https://archaeologymag.com/2026/01/430000-year-old-wooden-to...

archaeologymag.com
458 237
peter_d_sherman 5 days ago

Golden Ratio using an equilateral triangle inscribed in a circle

This article demonstrates how to graphically derive the golden ratio using an equilateral triangle inscribed in a circle. It provides a step-by-step visual guide to constructing the golden ratio based on the properties of the equilateral triangle.
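One well-known construction of this kind is George Odom's: join the midpoints A and B of two sides of the inscribed triangle and extend the segment to meet the circle at C; then AB/BC equals the golden ratio. Whether the linked article uses exactly this construction is an assumption. The short check below verifies the ratio numerically on the unit circle.

```python
import math

# Equilateral triangle inscribed in the unit circle.
P = (0.0, 1.0)
Q = (-math.sqrt(3) / 2, -0.5)
R = (math.sqrt(3) / 2, -0.5)


def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


A = midpoint(P, Q)   # midpoint of side PQ
B = midpoint(P, R)   # midpoint of side PR

# Segment AB is horizontal (y = 1/4); extend it past B to the circle at C.
C = (math.sqrt(1 - B[1] ** 2), B[1])

ratio = math.dist(A, B) / math.dist(B, C)
phi = (1 + math.sqrt(5)) / 2
print(ratio, phi)
```

Algebraically, AB = √3/2 and BC = (√15 − √3)/4, so AB/BC = 2/(√5 − 1) = (1 + √5)/2.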

geometrycode.com
128 34
jonbaer 5 days ago

Pandas 3.0

The article discusses the upcoming release of Pandas 3.0, highlighting major changes and improvements, including new data structures, performance enhancements, and better support for handling missing data and datetime operations.

pandas.pydata.org
170 54
albertsikkema 5 days ago

Show HN: Extracting React apps from Figma Make's undocumented binary format

The article describes reverse-engineering Figma Make's undocumented binary file format in order to extract the React apps stored inside it.

albertsikkema.com
5 4
oldguy101 3 days ago

I Made a MIT Licensed Mecrisp-Stellaris Language Server

The article provides a detailed overview of the Mecrisp-Stellaris Forth system, including its features, hardware support, and integration with the Language Server Protocol (LSP) for improved development experiences.

mecrisp-stellaris-folkdoc.sourceforge.io
14 2
mooreds 3 days ago

Thirty Years of the Square Kilometre Array

The article discusses the Square Kilometre Array (SKA), the world's largest radio telescope project, which has been in development for over 30 years. It highlights the project's major achievements, including advancements in radio astronomy, technological innovations, and international collaboration in building this ambitious scientific endeavor.

physicsworld.com
43 12
justaboutanyone 4 days ago

Rust’s Standard Library on the GPU

This article explores the use of the Rust standard library on GPUs, highlighting the potential performance benefits and discussing the challenges involved in porting Rust code to run on graphics processing units.

vectorware.com
227 45
prakhar897 1 day ago

Doing the thing is doing the thing

This article explores the concept of 'doing the thing' as a fundamental principle of software design, emphasizing the importance of focusing on the core functionality and user needs rather than getting bogged down in unnecessary complexity or over-engineering.

softwaredesign.ing
483 160
hornedhob about 21 hours ago

Lennart Poettering, Christian Brauner founded a new company

Amutable is a new company whose founders include Lennart Poettering and Christian Brauner. The linked website introduces the company and its mission.

amutable.com
336 517
ingve about 8 hours ago

Make.ts

The article discusses the author's experience with building a TypeScript-based build tool called 'make-ts' and the lessons they learned in the process. It highlights the importance of simplicity, focusing on core functionality, and learning from the community when developing new tools.

matklad.github.io
142 82
ecto about 19 hours ago

Parametric CAD in Rust

The article presents vcad, a parametric CAD project written in Rust in which 3D models are defined in code.

campedersen.com
194 148
trenning 1 day ago

Amazon closing its Fresh and Go stores

https://www.cnn.com/2026/01/27/food/amazon-fresh-go-closures

https://www.wsj.com/business/retail/amazon-to-shut-down-all-...

finance.yahoo.com
273 492
pantalaimon 1 day ago

Xfwl4 – The Roadmap for a Xfce Wayland Compositor

The article lays out the roadmap for xfwl4, a planned Wayland compositor for the Xfce desktop environment.

alexxcons.github.io
341 270
publicmatt about 23 hours ago

AI2: Open Coding Agents

The article introduces 'Open Coding Agents', a new AI system that can learn to perform diverse tasks by interpreting and following natural language instructions. The system demonstrates the ability to perform various coding-related tasks, showcasing the potential for AI to assist with software development and programming.

allenai.org
221 37
embedding-shape 1 day ago

Show HN: One Human + One Agent = One Browser From Scratch in 20K LOC

Related: https://simonwillison.net/2026/Jan/27/one-human-one-agent-on...

emsh.cat
290 131