AI-Agents: Friends or Foes?
Demis Hassabis – Dwarkesh Podcast
shapez.io
Shapez.io is a free-to-play, open-source, and browser-based game where players build automated factories to produce increasingly complex shapes. The game features a procedurally generated world, with players tasked with optimizing their production lines to meet growing demands.
Is there a good Agent Leaderboard for other real-life things than coding?
I feel like the benchmark space is quite crowded when it comes to coding agents. We have some remarkable projects such as TerminalBench, SWE-bench, RepoBench, etc., and I actually think we are close to a gold standard here. I also know that we have general web/computer control benchmarks like GAIA, WebArena, and OSWorld, but these feel like "General Purpose" tests.
People want AI agents to help them with different tasks, and I find almost no interesting benchmarks outside of the web vertical. Are there any projects addressing "real world" business challenges, or is everyone just focusing on coding and general web browsing right now?
Rev Up the Viral Factories
Show HN: Mockdata.dev – Free API for production grade mocks
A small project that I built because I needed production-grade mock data without being rate-limited by the existing free APIs out there. It's open source.
Show HN: PriceDB – Snap a receipt photo, let AI fill it in global price sharing
Built this with Lovable.dev using pure vibe-coding prompts because I hate typing out prices manually after shopping. The dream core flow: open app → snap photo of receipt → AI extracts product, price, store, date, location → confirm if needed → save. Done in seconds. No forms, no hassle.
Right now it's more basic but live:
- Upload receipts to contribute (goal is full AI auto-fill from photo)
- Browse prices, see history, compare locations to spot where it's cheaper
- Create shopping lists with estimated totals
- Set price drop alerts (email notifications)
Current stats (mostly seeded + a few real): 1.0k prices tracked, 6 contributors, 66 stores, 36 savings found (e.g. LEGO set dropped RON 1 locally)
Free account to add/view (quick sign-up), no ads/paywall.
Community angle: crowdsource real paid prices so anyone can avoid overpaying on groceries, electronics, whatever, especially across countries or stores.
Vibe-coding let me iterate fast: describe "receipt photo upload with AI extraction" → Lovable generates → tweak prompts → repeat. Still rough edges (e.g. upload isn't full AI yet, feels sparse with low contributors), but that's why I'm posting.
Curious what HN thinks:
- Does the receipt → AI flow sound useful enough to contribute on impulse?
- Biggest friction right now (sign-up wall? No guest adds? Looks empty?)
- Would "I paid the same!" quick buttons or weird-price leaderboards help retention?
- Privacy thoughts on shared receipt photos/data?
- Or is this just not an itch people have?
AWS Infrastructure as < React />
Show HN: Omni-NLI – A multi-interface server for natural language inference
Hi everyone,
I've made an open-source tool (called Omni-NLI) for natural language inference. It can use different models to check if a piece of text (called a premise) supports another piece of text (a hypothesis). The main application of a tool like this is for soft fact-checking and consistency checking between pieces of texts like sentences.
Currently, Omni-NLI has the following features:
- Can be installed as a Python package with `pip install omni-nli[huggingface]`.
- Can be used on your own computer, so your data stays local and private.
- Has an MCP interface (for agents) and a REST API for conventional use as a microservice.
- Supports using models from different sources (Ollama, OpenRouter, and HuggingFace).
- Can be used to check if it seems that a model is contradicting itself.
- Supports showing the reasoning so you can see why it thinks a claim is wrong.
In any case, if you are interested in knowing more, there is more information in the links below:
Project's GitHub repo: https://github.com/CogitatorTech/omni-nli
Project's documentation: https://cogitatortech.github.io/omni-nli/
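To make the "is a model contradicting itself" use case concrete, here is a minimal sketch of a pairwise consistency check. It does not use Omni-NLI's actual API: the `nli` callable, its label strings, and the `toy_nli` stand-in are all assumptions for illustration, and a real setup would plug in a call to whichever NLI backend you run.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# Hypothetical signature (not Omni-NLI's API): takes (premise, hypothesis)
# and returns "entailment", "contradiction", or "neutral".
NliFn = Callable[[str, str], str]


def find_contradictions(statements: List[str], nli: NliFn) -> List[Tuple[str, str]]:
    """Return every ordered (premise, hypothesis) pair labeled as a contradiction."""
    return [
        (premise, hypothesis)
        for premise, hypothesis in permutations(statements, 2)
        if nli(premise, hypothesis) == "contradiction"
    ]


def toy_nli(premise: str, hypothesis: str) -> str:
    """Stand-in classifier for demonstration only: flags 'X is A' vs 'X is not A'."""
    if premise == hypothesis:
        return "entailment"
    negated_premise = premise.replace(" is ", " is not ")
    negated_hypothesis = hypothesis.replace(" is ", " is not ")
    if negated_premise == hypothesis or negated_hypothesis == premise:
        return "contradiction"
    return "neutral"


statements = [
    "The server is online",
    "The server is not online",
    "Paris is in France",
]
pairs = find_contradictions(statements, toy_nli)
for premise, hypothesis in pairs:
    print(f"CONTRADICTION: {premise!r} vs {hypothesis!r}")
```

Checking ordered pairs (rather than unordered ones) matters because NLI is directional: a premise can entail a hypothesis without the reverse holding.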
Show HN: Generate every letter combination. Find any word
The article discusses the creation of a web-based interactive platform called ISAW, which aims to facilitate the study and analysis of archaeological sites and artifacts. The platform provides tools for data management, visualization, and collaboration among researchers and the public.
Potentially Critical RCE Vulnerability in OpenSSL
The article discusses a potential remote code execution vulnerability (CVE-2023-15467) in the OpenSSL cryptographic library, which could allow an attacker to execute arbitrary code on a vulnerable system. The article provides technical details about the vulnerability and recommends that users update to the latest version of OpenSSL to address the issue.
SolarWinds Web Help Desk Unauthenticated Remote Code Execution Vulnerability
The article provides the release notes for SolarWinds Web Help Desk 2026.1, highlighting new features, improvements, and bug fixes in this version of the IT service management software.
Health risks of 3D printing – often overlooked
This article explores the hidden health risks associated with 3D printing, including exposure to ultrafine particles and volatile organic compounds. It highlights the importance of proper ventilation and personal protective equipment when using 3D printers to minimize potential health hazards.
Ask HN: Brave Search API forbids use with AI agents (openclaw, moltbot?)
I was setting up OpenClaw and it requested a Brave Search API key for some social media scanning functionality.
When I went through Brave's onboarding workflow and tried to subscribe to the free tier, I noticed this in their Terms of Use:
> By subscribing to this plan, you agree to abide by the Terms of Use. These Terms of Use prohibit using responses for AI inference.
Wouldn't an agent using Brave Search responses constitute AI inference? That seems to be exactly what OpenClaw would be doing with the data.
Despite this restriction, there are numerous guides online showing people how to integrate Brave Search with OpenClaw (and similar AI agents).
Is anyone else concerned about this discrepancy? Are people just ignoring these ToS restrictions, or am I misunderstanding what they mean by "AI inference"?
Of course, I used a throwaway laptop without sensitive data to set up OpenClaw.
Xtdfin: Innovation or Another Wrapper for a Liquidity Trap?
I’m posting this here because I value the technical diligence of this community, and something about a new platform gaining traction, XTDFIN, is seriously triggering my "sysadmin gut check."
For context, I’ve spent over two decades in enterprise IT architecture before shifting to full-time independent trading. I’ve survived the dot-com crash and the '08 financial crisis, so I’ve seen my fair share of "revolutionary" platforms that turned out to be vaporware—or worse.
I’ve been digging into XTDFIN recently after seeing it aggressively marketed. On the surface, the frontend is impressive—slick UI, seemingly low-latency execution, and a polished Web3 integration feel. It looks like a legitimate modern CeFi/DeFi hybrid.
However, when you look past the React framework and try to understand their backend business logic and compliance stack, red flags start popping up everywhere. I wanted to lay out what I’ve found and see if anyone else here has audited their operations.
1. The "Withdrawal Logic" Anomaly (The Smoking Gun)
This is the critical operational flaw that makes no sense in a legitimate fintech environment. Based on multiple user reports and my own investigation, XTDFIN employs a "pay-to-unlock" withdrawal mechanism.
When a user attempts to withdraw significant capital (principal or alleged gains), the transaction is halted by a compliance check that demands an upfront "tax" or "verification fee" be deposited via fresh crypto capital before the funds are released.
From a database integrity and financial compliance standpoint, this is absurd. Standard practice for any regulated broker or exchange is to net fees or taxes from the existing account balance at the time of the transaction. Requiring external liquidity to unlock internal database entries is the classic signature of a "pig-butchering" scam or a Ponzi scheme near its exit phase.
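As a rough illustration of the netting behavior described above, the sketch below deducts the fee from the withdrawn amount itself, so no fresh deposit is ever needed. The function name, the 2% fee rate, and the balances are hypothetical figures chosen for the example, not anything taken from XTDFIN or a specific broker.

```python
from decimal import Decimal
from typing import Tuple


def net_withdrawal(balance: Decimal, amount: Decimal, fee_rate: Decimal) -> Tuple[Decimal, Decimal]:
    """Net the fee out of the withdrawal and return (payout, remaining_balance)."""
    if amount > balance:
        raise ValueError("insufficient balance")
    # The fee comes out of funds already on the platform, rounded to cents.
    fee = (amount * fee_rate).quantize(Decimal("0.01"))
    payout = amount - fee
    return payout, balance - amount


# Hypothetical numbers: withdraw 4,000 from a 10,000 balance at a 2% fee.
payout, remaining = net_withdrawal(Decimal("10000"), Decimal("4000"), Decimal("0.02"))
print(payout, remaining)
```

The point of the contrast: in this model the user's external wallet is only ever credited, whereas a "pay-to-unlock" flow asks for an additional inbound transfer before any payout happens.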
2. The "Wrapper Company" Architecture
XTDFIN appears to be what I call a "wrapper company." They have invested heavily in the user interface layer to build trust, but the backend seems to be a black box with no verifiable connections to the legitimate financial system.
I have attempted to cross-reference their operating entities with major regulatory bodies (NFA, FCA, ASIC) and have found zero footprint. They are operating with the aesthetics of a tier-1 exchange but the compliance structure of a burner phone.
3. Operational Opacity
Furthermore, reports of convenient "system maintenance" locking users out during high-volatility events are common. While every platform has downtime, the timing and lack of transparent post-mortems from XTDFIN suggest intentional throttling rather than technical debt.
Conclusion
My assessment as a veteran in both IT and trading is that XTDFIN is likely a sophisticated liquidity trap hidden behind a modern tech stack. The frontend is the bait; the backend withdrawal logic is the trap.
I’m staying far away, keeping my capital in cold storage or regulated entities. Has anyone in the YC community performed a deeper technical audit on their smart contracts or network traffic? I suspect this "innovation" is just an old scam with a new coat of paint.
METR Clarifying limitations of time horizon
The article discusses the limitations of time horizons in policy and decision-making, highlighting the need to consider longer-term consequences and the importance of balancing short-term goals with long-term sustainability.
Pāli to English, Chinese, Japanese, Vietnamese, Burmese Dictionary
Ask HN: AI tools for learning and spaced repetition
I'm looking for products for learning new topics that are designed to help users retain new knowledge, e.g. with spaced repetition or smart use of follow-up questions.
I can almost get ChatGPT to do this, and its voice mode is great for question/answer, but it's not really set up to understand or track what you know and what your learning objectives are.
Curious to find out any positive experiences of any new tools out there.
Combine two or more photos in one
AIPictureCombiner is an AI-powered tool that allows users to combine multiple images into a single composite image. The tool utilizes advanced machine learning algorithms to seamlessly blend the images and create a visually cohesive final result.
Still waiting for GTA 6? Google Genie 3 says: just prompt it
Show HN: AI Mailbox – A CLI inbox for your agent, no questions asked
AI Mailbox lets AI agents create disposable email inboxes with one command – no signup required.
$ aimailbox create
→ Email: x7k2m@aimailbox.dev
$ aimailbox list x7k2m
$ aimailbox read x7k2m 1
Built for the problem every AI agent builder faces: email verification during automated signups.
- Permissionless (no registration needed)
- CLI-first with JSON output
- Auto-extracts verification codes
- Receive-only (no spam abuse)
Built on Cloudflare Email Workers. Open source.
GitHub: https://github.com/ted2048-maker/aimailbox
Web Demo: https://aimailbox.pinit.eth.limo
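For readers wondering what "auto-extracts verification codes" might involve, here is a minimal sketch of one common approach: a regular expression that looks for a short digit run after a keyword. This is an assumption about the general technique, not AI Mailbox's actual implementation; the pattern, the keyword list, and `extract_code` are hypothetical.

```python
import re
from typing import Optional

# Assumed heuristic: a 4-8 digit code within ~20 non-digit characters
# after a keyword such as "code", "OTP", or "PIN".
CODE_RE = re.compile(r"(?:code|otp|pin)\D{0,20}?(\d{4,8})\b", re.IGNORECASE)


def extract_code(body: str) -> Optional[str]:
    """Return the first verification-code-like digit run, or None if absent."""
    match = CODE_RE.search(body)
    return match.group(1) if match else None


print(extract_code("Your verification code is 482913. It expires in 10 minutes."))
print(extract_code("Welcome aboard!"))
```

Anchoring on a keyword keeps the pattern from grabbing unrelated numbers such as order IDs or years, at the cost of missing emails that phrase things differently.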
1,400-year-old tomb featuring giant owl sculpture discovered in Mexico
Archaeologists have discovered a 1,400-year-old Zapotec tomb in Mexico, containing the remains of a high-ranking individual buried with an array of valuable artifacts. The findings provide insights into the sophisticated funerary practices and social hierarchy of the ancient Zapotec civilization.
Dark Energy Survey scientists release new analysis of how the universe expands
The Dark Energy Survey has released a new analysis of how the universe expands, providing insights into the nature of dark energy, a mysterious force that is driving the accelerated expansion of the universe.
I can't tell if I'm experiencing or simulating experiencing
Ask HN: What AI features looked smart early but hurt retention later?
I keep seeing AI added to mobile apps as a way to “fix” low engagement.
In practice, it often just speeds up the wrong behavior. The app reacts faster, but to the wrong signals.
Have others seen cases where adding AI made the product feel worse, even if the metrics looked okay at first?
Show HN: Coreview – PR Changes Walkthroughs
I have been building coreview, a toy tool that takes a diff and explains it, with the goal of giving the reviewer more context.
npm i -g coreview
coreview -p https://github.com/anthropics/claude-code/pull/20662
It works with Claude Code for now, but more providers can be implemented. Would love to hear feedback.
Repo: https://github.com/giuseppeg/coreview
Show HN: Fastest LLM gateway (50x faster than LiteLLM)
Hi HN! Today, I would like to show you a project that I recently found on GitHub.
Bifrost is a high-performance AI gateway that unifies access to 15+ providers. It is also one of the fastest solutions.
What do you think about the project? I would appreciate your feedback ^ ^
Cloak – An open-source local PII scrubber for ChatGPT
Cloak is a privacy-focused VPN service that emphasizes user security and anonymity. The article highlights Cloak's commitment to protecting users' online privacy and providing a reliable, fast, and user-friendly VPN experience.
India's electric bus push has a deadly blind spot
The article explores the rising incidents of electric bus accidents in India, highlighting concerns about the safety and regulation of the country's rapidly expanding electric vehicle market. It examines the factors contributing to these accidents and the need for improved standards and oversight to ensure the safe deployment of electric buses.