Home

LLMs work best when the user defines their acceptance criteria first
dnw about 7 hours ago

LLMs work best when the user defines their acceptance criteria first

The article discusses the limitations of large language models (LLMs) in writing correct code, highlighting their tendency to produce code with bugs or syntax errors. It emphasizes the importance of understanding the underlying principles and limitations of these models when using them for programming tasks.

blog.katanaquant.com
185 148
Summary
UUID package coming to Go standard library
soypat about 7 hours ago

UUID package coming to Go standard library

The article discusses a potential issue with the Go programming language, where the new hash function in Go 1.19 may cause compatibility issues with existing code that relies on the old hash function. The discussion explores potential solutions and the impact on the Go ecosystem.

github.com
152 81
Summary
The worst acquisition in history, again
JumpCrisscross about 11 hours ago

The worst acquisition in history, again

The article discusses the acquisition of AOL by Time Warner, considered one of the worst business deals in history. It explores the factors that led to the failure of this merger, including the clash of cultures, the overvaluation of AOL, and the inability to adapt to the changing digital landscape.

profgmedia.com
82 59
Summary
Nintendo Sues U.S. Government for Tariff Refunds
coloneltcb about 11 hours ago

Nintendo Sues U.S. Government for Tariff Refunds

Nintendo has sued the U.S. government, seeking refunds for tariffs it paid on products imported from China. The lawsuit claims the tariffs were unlawful and seeks to recover the millions of dollars Nintendo paid in tariffs over several years.

scribd.com
50 8
Summary
AI Error May Have Contributed to Girl's School Bombing in Iran
apolloartemis about 3 hours ago

AI Error May Have Contributed to Girl's School Bombing in Iran

An exclusive report on a devastating AI error that led to a bombing at a girls' school in an undisclosed location, resulting in significant casualties and sparking widespread concern over the safety and reliability of AI systems.

thisweekinworcester.com
38 14
Summary
X Users Find Their Real Names Are Being Googled in Israel
upofadown about 11 hours ago

X Users Find Their Real Names Are Being Googled in Israel

The article reports that users of the X-verification software by Au10tix in Israel have found their real names being searched on Google, raising privacy concerns about the possible misuse of their personal data by the company or the government.

mintpressnews.com
18 4
Summary
Show HN: I open-sourced my Steam game, 100% written in Lua, engine is also open
delduca about 10 hours ago

Show HN: I open-sourced my Steam game, 100% written in Lua, engine is also open

Homebrew engine https://github.com/willtobyte/carimbo

github.com
13 5
Summary
Trump has privately shown serious interest in U.S. ground troops in Iran
johnbarron about 3 hours ago

Trump has privately shown serious interest in U.S. ground troops in Iran

The article reports that former President Trump privately expressed serious interest in using U.S. ground troops against Iran, according to current and former U.S. officials. This revelation comes amid tensions between the U.S. and Iran in the final weeks of Trump's presidency.

nbcnews.com
11 3
Summary
Wild crows in Sweden help clean up cigarette butts
jhncls about 10 hours ago

Wild crows in Sweden help clean up cigarette butts

A study in Sweden has found that wild crows are capable of collecting and disposing of cigarette butts, demonstrating their potential to aid in environmental cleanup efforts. The crows were trained to deposit cigarette butts into a dispenser in exchange for a food reward, highlighting their ability to be used as natural cleanup crews.

samodobrevijesti.com
10 4
Summary
Grammarly is using our identities without permission
LordAtlas about 4 hours ago

Grammarly is using our identities without permission

The article discusses Grammarly's use of AI technology to improve writing and grammar checking, including insights from an AI expert on the company's approach and the potential benefits and limitations of its AI-powered tools.

theverge.com
7 1
Summary
Ships in Gulf declare themselves Chinese to dodge attack
KnuthIsGod about 3 hours ago

Ships in Gulf declare themselves Chinese to dodge attack

The article discusses the impact of the COVID-19 pandemic on the global economy, highlighting its uneven effects across different sectors and regions, as well as the challenges and uncertainties faced by policymakers in navigating the recovery.

ft.com
7 0
Summary
Show HN: Graph-Oriented Generation – Beating RAG for Codebases by 89%
dchisholm125 about 12 hours ago

Show HN: Graph-Oriented Generation – Beating RAG for Codebases by 89%

LLMs are better at being the "mouth" than the "brain" and I can prove it mathematically. I built a deterministic graph engine that offloads reasoning from the LLM. It reduces token usage by 89% and makes a tiny 0.8B model trace enterprise execution paths flawlessly. Here is the white paper and the reproducible benchmark.

github.com
7 0
Summary
Show HN: I built a tool to manage work and personal Git repos
tomquirk about 2 hours ago

Show HN: I built a tool to manage work and personal Git repos

I finally got fed up with making work + personal Git repos work on my machine.

Ever clone a work repository and accidentally make commits using your personal email? Or forget how to make your work repos use your work SSH key? Yeah... same.

So I built the tool I've always wanted. GitPersona - a CLI tool for managing many git profiles on a single machine.

Shoutout to my boy Codex for helping me finally ship this.

github.com
6 1
Summary
Show HN: Context-compact – Summarize agent context instead of truncating it
EmptyDrum about 6 hours ago

Show HN: Context-compact – Summarize agent context instead of truncating it

agents. Not a custom truncation strategy, not a sliding window, not dropping old messages and hoping for the best. The failure mode is well-understood: your context window fills up, you truncate from the top, and the agent loses the thread. It forgets the task it was working on, the file path it just wrote to, the UUID it needs to reference. The conversation breaks. The problem is everyone keeps solving it by throwing away information instead. Truncation is fast to implement and quietly wrong. The agent appears to work until it doesn't, and debugging context loss in a long-running session is painful. context-compact summarizes old messages via your LLM of choice and replaces them with a compact summary. Fires automatically at 85% context utilization. Preserves UUIDs, file paths, and URLs verbatim so identifiers survive compaction. Handles histories longer than the summarization model's own context window by chunking sequentially with a running summary carried forward. Works with Anthropic, OpenAI, or any SDK. Zero dependencies.

github.com
6 2
Summary
Show HN: Agent Office – Slack for (OpenClaw Like) AI Agents
arbayi about 9 hours ago

Show HN: Agent Office – Slack for (OpenClaw Like) AI Agents

The article discusses the development of an 'Agent Office' system, which is a web application that allows users to manage real estate agents and their activities. The system includes features such as agent scheduling, task management, and performance tracking.

github.com
5 1
Summary
Show HN: Sheila, an AI agent that replaced our accounting flow
knewter about 9 hours ago

Show HN: Sheila, an AI agent that replaced our accounting flow

Soapbox, a social media platform, has announced the launch of Sheila, a new feature that allows users to create and share audio messages with their followers. Sheila aims to provide a more personal and engaging way for users to connect and communicate on the platform.

soapbox.pub
5 2
Summary
Show HN: Llama 3.2 3B and Keiro Research achieves 85% on SimpleQA
mannybruv 27 minutes ago

Show HN: Llama 3.2 3B and Keiro Research achieves 85% on SimpleQA

ran this over the weekend. stack was Llama 3.2 3B running locally + Keiro Research API for retrieval.

85.0% on 4,326 questions. where that lands:

ROMA (357B): 93.9% OpenDeepSearch (671B): 88.3% Sonar Pro: 85.8% Llama 3.2 3B + Keiro: 85.0%

the systems ahead of us are running models 100-200x larger. that's why they're ahead. not better retrieval, not better prompting — just way more parameters.

the interesting part is how small the gap is despite that. 3 points behind a 671B model. 0.8 behind Sonar Pro. at some point you have to ask what you're actually buying with all that compute for this class of task.

Want to know how low the reader model can go before it starts mattering. in this setup it clearly wasn't the limiting factor and also if smaller models with web enabled will perform as good( if not better) as larger models for a lot of non coding tasks

Full benchmark script + results --> https://github.com/h-a-r-s-h-s-r-a-h/benchmark

Keiro research -- https://www.keirolabs.cloud/docs/api-reference/research

keirolabs.cloud
5 4
Summary
Show HN: key-carousel - Key rotation for LLM agents
EmptyDrum about 10 hours ago

Show HN: key-carousel - Key rotation for LLM agents

I think in-process key management is the right abstraction for multi-key LLM setups. Not LiteLLM, not a Redis queue, not a custom load balancer.

The failure modes are well-understood: a key gets rate-limited, you wait, you try the next one. Billing errors need a longer cooldown than rate limits. This is not a distributed systems problem — it's a state machine that fits in a library. The problem is everyone keeps solving it with infrastructure instead. Spin up LiteLLM, now you have a Python service to maintain. Reach for Redis, now you have a database for a problem that doesn't need one. key-carousel manages a pool of API key profiles with exponential-backoff cooldowns: 1min → 5min → 25min → 1hr for rate limits, 5hr → 24hr for billing. Falls back to OpenAI or Gemini when Anthropic keys are exhausted. Optional file persistence. Zero dependencies.

github.com
5 1
Summary
AI and the Illegal War
interpol_p about 5 hours ago

AI and the Illegal War

This article explores the potential role of AI in illegal wars, highlighting concerns about its use in surveillance, targeting, and propaganda. It raises ethical questions about the implications of AI-powered weapons and the need for international regulation to prevent misuse.

buttondown.com
5 0
Summary
Armed robots take to the battlefield in Ukraine war
dabinat about 4 hours ago

Armed robots take to the battlefield in Ukraine war

The article discusses the potential impact of the COVID-19 pandemic on the global economy, including concerns about rising inflation, supply chain disruptions, and the possibility of a recession. It explores the challenges faced by governments and central banks in navigating the economic uncertainty.

bbc.com
4 0
Summary