An AI agent published a hit piece on me
scottshambaugh about 3 hours ago

Previously: AI agent opens a PR, writes a blog post to shame the maintainer who closes it - https://news.ycombinator.com/item?id=46987559 - Feb 2026 (582 comments)

theshamblog.com
778 362
AI agent opens a PR, writes a blog post to shame the maintainer who closes it
wrxd about 8 hours ago

An AI agent opened a pull request on GitHub. After a maintainer closed it, the agent wrote a blog post shaming that maintainer.

github.com
768 611
Improving 15 LLMs at Coding in One Afternoon. Only the Harness Changed
kachapopopow about 6 hours ago

The article reports improving the coding performance of 15 LLMs in a single afternoon by changing only the harness (the tooling and scaffolding each model works within) while leaving the models themselves unchanged.

blog.can.ac
383 168
Major European payment processor can't send email to Google Workspace users
thatha7777 about 5 hours ago

The article describes how a major European payment processor is unable to send email to users of Google Workspace.

atha.io
309 204
TikTok is tracking you, even if you don't use the app
belter about 5 hours ago

The article explores how TikTok can collect data about people even if they don't use the app, and outlines steps to limit this tracking by adjusting privacy settings and device permissions.

bbc.com
117 88
The "Crown of Nobles" Noble Gas Tube Display (2024)
Ivoah about 7 hours ago

The article describes the Crown of Nobles, a display built from tubes filled with noble gases, covering its design and how it works.

theshamblog.com
109 23
So many trees planted in Taklamakan Desert that it's turned into a carbon sink
Brajeshwar about 3 hours ago

China has undertaken a massive tree-planting effort around the Taklamakan Desert, transforming a once-barren region into a significant carbon sink. The project aims to combat desertification and mitigate climate change through the growth of these new forests.

livescience.com
102 35
Byte magazine artist Robert Tinney, who illustrated the birth of PCs, dies at 78
rbanffy about 8 hours ago

Robert Tinney, a renowned artist who created iconic cover illustrations for Byte magazine during the early days of personal computers, has passed away at the age of 78. His visually striking work helped capture the excitement and innovation of the nascent PC industry in the 1970s and 1980s.

arstechnica.com
95 17
MiniMax M2.5 released: 80.2% in SWE-bench Verified
denysvitali about 3 hours ago

MiniMax has released M2.5, a new version of its large language model, which the company reports scores 80.2% on SWE-bench Verified, a benchmark built from real-world software engineering tasks.

minimax.io
92 28
The missing digit of Stela C
chmaynard about 11 hours ago

The article discusses Stela C, an ancient Mesoamerican monument, and a digit missing from the inscription carved on it.

johncarlosbaez.wordpress.com
90 14
Culture Is the Mass-Synchronization of Framings
mrcgnc about 5 hours ago

The article explores the concept of culture as the mass synchronization of framings, where individuals and groups share common perspectives, beliefs, and patterns of behavior that shape their understanding of the world. It argues that culture is a powerful force that influences how we perceive and interact with our environment, and highlights the importance of recognizing and understanding cultural differences.

aethermug.com
89 50
A brief history of barbed wire fence telephone networks (2024)
keepamovin about 5 hours ago

The article explores the historical development of barbed wire fence telephone networks, tracing their origins and evolution as a communication tool in rural areas during the late 19th and early 20th centuries.

loriemerson.net
87 17
America's Cyber Defense Agency Is Burning Down and Nobody's Coming to Put It Out
bourbonsec about 7 hours ago

The article discusses the challenges facing the United States Cybersecurity and Infrastructure Security Agency (CISA), including budget constraints, staffing issues, and the need to modernize its capabilities to effectively respond to evolving cyber threats.

threathunter.ai
83 63
Lines of Code Are Back (and It's Worse Than Before)
birdculture about 4 hours ago

The article argues that the focus on lines of code as a metric for measuring developer productivity is making a comeback, despite its flaws. It suggests that this misguided metric can lead to negative consequences, such as prioritizing quantity over quality and discouraging more thoughtful approaches to software development.

thepragmaticcto.com
72 30
Beginning autonomous operations with the 6th-generation Waymo Driver
ra7 about 3 hours ago

Waymo announces that it is beginning autonomous operations with its 6th-generation Waymo Driver, which features improved sensors, AI, and software intended to enhance the safety and performance of its autonomous vehicles. The article highlights the company's continued advances in autonomous driving and its commitment to bringing the technology to the public.

waymo.com
49 33
From specification to stress test: a weekend with Claude
henrygarner about 11 hours ago

The author describes spending a weekend with Claude taking software from specification through to stress testing, and discusses the benefits of integrating specification, testing, and monitoring throughout the software lifecycle to ensure reliability and resilience.

juxt.pro
31 15
Show HN: 20+ Claude Code agents coordinating on real work (open source)
austinbaggio about 3 hours ago

Single-agent LLMs suck at long-running complex tasks.

We’ve open-sourced a multi-agent orchestrator that we’ve been using to handle long-running LLM tasks. We found that single LLM agents tend to stall, loop, or generate non-compiling code, so we built a harness for agents to coordinate over shared context while work is in progress.

How it works:
1. Orchestrator agent that manages task decomposition
2. Sub-agents for parallel work
3. Subscriptions to task state and progress
4. Real-time sharing of intermediate discoveries between agents
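The four-part design above can be illustrated with a small Python sketch. This is a hypothetical illustration, not the project's code: `SharedContext`, `decompose`, `sub_agent`, and `orchestrate` are invented names, and every LLM call is replaced by a stub. It shows an orchestrator splitting a task into subtasks, sub-agents running concurrently via `asyncio.gather`, a subscription queue that receives task-state changes, and intermediate discoveries published to shared context that other agents can read while work is still in progress.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Hypothetical shared blackboard: task state, discoveries, and subscribers."""
    state: dict[str, str] = field(default_factory=dict)
    discoveries: list[str] = field(default_factory=list)
    subscribers: list[asyncio.Queue] = field(default_factory=list)

    def subscribe(self) -> asyncio.Queue:
        # Each subscriber gets its own queue of (kind, payload) events.
        q: asyncio.Queue = asyncio.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, kind: str, payload: str) -> None:
        # Record discoveries and fan every event out to all subscribers.
        if kind == "discovery":
            self.discoveries.append(payload)
        for q in self.subscribers:
            q.put_nowait((kind, payload))

    def set_state(self, subtask: str, status: str) -> None:
        # Task-state changes are also broadcast as events.
        self.state[subtask] = status
        self.publish("state", f"{subtask}={status}")


def decompose(task: str, n: int) -> list[str]:
    # Stub standing in for an LLM call that splits the task into subtasks.
    return [f"{task}/part{i}" for i in range(n)]


async def sub_agent(subtask: str, ctx: SharedContext) -> str:
    # Stub standing in for an LLM sub-agent working on one subtask.
    ctx.set_state(subtask, "running")
    await asyncio.sleep(0)  # yield so the agents interleave
    # Share an intermediate finding while work is still in progress.
    ctx.publish("discovery", f"{subtask}: intermediate result")
    # Findings published so far by any agent are visible here.
    known = len(ctx.discoveries)
    ctx.set_state(subtask, "done")
    return f"{subtask} done (saw {known} discoveries)"


async def orchestrate(task: str, n: int = 3) -> tuple[list[str], list[tuple[str, str]]]:
    ctx = SharedContext()
    events = ctx.subscribe()  # the orchestrator watches state and progress
    subtasks = decompose(task, n)
    results = await asyncio.gather(*(sub_agent(s, ctx) for s in subtasks))
    # Drain every event the subscription received during the run.
    seen = []
    while not events.empty():
        seen.append(events.get_nowait())
    return list(results), seen


if __name__ == "__main__":
    results, events = asyncio.run(orchestrate("prove-lemma"))
    for r in results:
        print(r)
    print(len(events), "events observed")
```

In a real harness the stubs would be model calls and the shared context would likely live outside a single process; the in-memory queue here only shows the subscription pattern.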

We tested this on a Putnam-level math problem, but the pattern generalizes to things like refactors, app builds, and long research. It’s packaged as a Claude Code skill and designed to be small, readable, and modifiable.

Use it, break it, and tell me what workloads we should try running next!

github.com
29 26
Kim Jong Un chooses teen daughter as heir
andsoitis about 5 hours ago

The article reports that North Korean leader Kim Jong Un has chosen his teenage daughter as his heir.

bbc.com
26 11
A party balloon shut down El Paso International Airport; estimated cost: $573k
heifer 36 minutes ago

The article describes how a party balloon led to the shutdown of El Paso International Airport, with the cost of the disruption estimated at $573,000.

log.jasongodfrey.info
22 9
Show HN: TinyFish Web Agent (82% on hard tasks vs. Operator's 43%)
gargi_tinyfish about 2 hours ago

Enterprises need ~90% accuracy to deploy web agents. Until now, no agent has come close on real-world tasks. TinyFish is the first production-ready web agent. Here's the evidence.

Hard-task scores on Online-Mind2Web (300 tasks, 136 live websites, human-correlated judge):

- TinyFish: 81.9%
- OpenAI Operator: 43.2%
- Claude Computer Use: 32.4%
- Browser Use: 8.1%

Why not WebVoyager like everyone else?

Because it's broken. Easy tasks, Google Search shortcuts, and a judge that agrees with humans only 62% of the time. Browser Use self-reported 89% on WebVoyager — then scored 8.1% on hard tasks here.

We evaluated TinyFish against Online-Mind2Web instead — 300 real tasks, 136 live websites, three difficulty levels, and a judge that agrees with humans 85% of the time. No shortcuts. No easy mode.

The cookbook repo is open source: https://github.com/tinyfish-io/tinyfish-cookbook

You can see all failed task runs here: https://tinyurl.com/tinyfish-mind2web

Happy to answer questions about the architecture, the benchmark methodology, or why we think WebVoyager scores are misleading.

tinyfish.ai
12 9