New stories

tonyww 3 minutes ago

A verification layer for browser agents: Amazon case study

A common approach to automating Amazon shopping or similar complex websites is to reach for large cloud models (often vision-capable). I wanted to test the opposite hypothesis: can a ~3B-parameter local LLM complete the flow using only structural page data (DOM) plus deterministic assertions?

This post summarizes four runs of the same task (search → first product → add to cart → checkout on Amazon). The key comparison is Demo 0 (cloud baseline) vs Demo 3 (local autonomy); Demos 1–2 are intermediate controls.

More technical detail (architecture, code excerpts, additional log snippets):

https://www.sentienceapi.com/blog/verification-layer-amazon-...

Demo 0 vs Demo 3:

Demo 0 (cloud, GLM‑4.6 + structured snapshots)
success: 1/1 run
tokens: 19,956 (~43% reduction vs ~35k estimate)
time: ~60,000ms
cost: cloud API (varies)
vision: not required

Demo 3 (local, DeepSeek R1 planner + Qwen ~3B executor)
success: 7/7 steps (re-run)
tokens: 11,114
time: 405,740ms
cost: $0.00 incremental (local inference)
vision: not required

Latency note: the local stack is slower end-to-end here largely because inference runs on local hardware (Mac Studio with M4); the cloud baseline benefits from hosted inference, but has per-token API cost.
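The ratios behind these figures can be checked with a few lines of arithmetic. This is only a sketch over the numbers quoted above; the Demo 0 time and the 35k baseline are approximate in the post, so the derived ratios are approximate too.

```python
# Figures quoted in the post (Demo 0 time and the 35k estimate are approximate).
demo0_tokens = 19_956
baseline_estimate_tokens = 35_000
demo3_tokens = 11_114
demo0_ms = 60_000
demo3_ms = 405_740


def reduction(new: float, old: float) -> float:
    """Fractional reduction of `new` relative to `old`."""
    return 1.0 - new / old


demo0_vs_estimate = reduction(demo0_tokens, baseline_estimate_tokens)
demo3_vs_demo0 = reduction(demo3_tokens, demo0_tokens)
slowdown = demo3_ms / demo0_ms

print(f"Demo 0 vs estimate: {demo0_vs_estimate:.0%} fewer tokens")
print(f"Demo 3 vs Demo 0:   {demo3_vs_demo0:.0%} fewer tokens")
print(f"Demo 3 wall-clock:  {slowdown:.1f}x Demo 0")
```

So the local run used roughly 44% fewer tokens than the cloud run while taking close to seven times as long.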

Architecture

This worked because we changed the control plane and added a verification loop.

1) Constrain what the model sees (DOM pruning). We don’t feed the entire DOM or screenshots. We collect raw elements, then run a WASM pass to produce a compact “semantic snapshot” (roles/text/geometry) and prune the rest (often on the order of ~95% of nodes).
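A rough illustration of what such a pruning pass might look like, written in Python rather than WASM. The post does not describe the actual filter rules, so the `RawElement` fields, the `INTERACTIVE_ROLES` set, and the visibility test below are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawElement:
    # Hypothetical shape of a collected DOM element.
    id: int
    tag: str
    role: str  # ARIA role, "" if none
    text: str
    x: float
    y: float
    width: float
    height: float
    visible: bool


# Assumed set of roles worth showing to the model.
INTERACTIVE_ROLES = {"button", "link", "textbox", "searchbox", "checkbox", "combobox"}


def prune(elements, max_text=80):
    """Keep visible, interactive elements as compact role/text/geometry records."""
    snapshot = []
    for el in elements:
        if not el.visible or el.width <= 0 or el.height <= 0:
            continue  # invisible or zero-area nodes cannot be acted on
        if el.role not in INTERACTIVE_ROLES:
            continue  # layout and decorative nodes are dropped
        snapshot.append({
            "id": el.id,
            "role": el.role,
            "text": " ".join(el.text.split())[:max_text],  # collapse whitespace
            "bbox": (round(el.x), round(el.y), round(el.width), round(el.height)),
        })
    return snapshot


raw = [
    RawElement(1, "input", "textbox", "  Search   Amazon ", 10, 5, 400, 30, True),
    RawElement(2, "div", "", "", 0, 0, 1200, 800, True),
    RawElement(3, "a", "link", "Hidden promo", 0, 0, 0, 0, False),
    RawElement(4, "button", "button", "Add to Cart", 900, 400, 120, 40, True),
]
snapshot = prune(raw)
print(f"kept {len(snapshot)} of {len(raw)} elements")
```

The point of the sketch is the shape of the output: a short list of small records the executor can reference by id, instead of raw markup.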

2) Split reasoning from acting (planner vs executor).

Planner (reasoning): DeepSeek R1 (local) generates step intent + what must be true afterward.

Executor (action): Qwen ~3B (local) selects concrete DOM actions like CLICK(id) / TYPE(text).

3) Gate every step with Jest‑style verification. After each action, we assert state changes (URL changed, element exists/doesn’t exist, modal/drawer appeared). If a required assertion fails, the step fails with artifacts and bounded retries.
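To make the three pieces concrete, here is a minimal sketch of how a planner step, an executor reply, and a per-step gate could fit together. It is not the author's implementation: `PlanStep`, `Action`, `parse_action`, and `run_step` are hypothetical names, the models are replaced by a canned `executor` stub, and the "browser" is a plain dict.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str  # "CLICK" or "TYPE"
    target_id: Optional[int] = None
    text: Optional[str] = None


@dataclass(frozen=True)
class PlanStep:
    intent: str      # what the planner wants done
    required: tuple  # (label, predicate(state) -> bool) pairs that must hold afterward


_CLICK = re.compile(r"^CLICK\((\d+)\)$")
_TYPE = re.compile(r"^TYPE\((.*)\)$", re.DOTALL)


def parse_action(raw: str) -> Action:
    """Turn executor text into a typed action, rejecting anything else."""
    raw = raw.strip()
    if m := _CLICK.match(raw):
        return Action("CLICK", target_id=int(m.group(1)))
    if m := _TYPE.match(raw):
        return Action("TYPE", text=m.group(1).strip("\"'"))
    raise ValueError(f"unrecognized executor output: {raw!r}")


def run_step(step, executor, apply_action, get_state, max_attempts=3):
    """Act, then verify; retry a bounded number of times and keep failure records."""
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            action = parse_action(executor(step.intent, get_state()))
        except ValueError as exc:
            failures.append((attempt, str(exc)))
            continue
        apply_action(action)
        state = get_state()
        failed = [label for label, pred in step.required if not pred(state)]
        if not failed:
            return True, failures
        failures.append((attempt, failed))
    return False, failures


# Toy usage: the first executor reply is malformed, the second is valid.
state = {"query": ""}
replies = iter(["I think we should type", 'TYPE("usb cable")'])


def apply_action(action):
    if action.kind == "TYPE":
        state["query"] = action.text


step = PlanStep(
    intent="type the search query",
    required=(("query_filled", lambda s: s["query"] == "usb cable"),),
)
ok, failures = run_step(step, lambda intent, snap: next(replies), apply_action, lambda: dict(state))
print(ok, failures)
```

The executor never gets to act on free-form text: a reply either parses into a known action or counts as a failed attempt, and a step only passes when every required assertion holds.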

Minimal shape:

ok = await runtime.check(
    exists("role=textbox"),
    label="search_box_visible",
    required=True,
).eventually(timeout_s=10.0, poll_s=0.25, max_snapshot_attempts=3)
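The polling behavior suggested by `timeout_s` and `poll_s` can be sketched as a standalone helper. This is an assumption about the semantics, not the library's code: the `eventually` function below is hypothetical, and `max_snapshot_attempts` is left out because the post does not say what it controls.

```python
import asyncio
import time


async def eventually(predicate, *, timeout_s=10.0, poll_s=0.25):
    """Poll an async predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if await predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep for one poll interval, but never past the deadline.
        await asyncio.sleep(min(poll_s, remaining))


async def demo():
    polls = 0

    async def search_box_visible():
        # Stand-in for a real page check: the box "appears" on the third poll.
        nonlocal polls
        polls += 1
        return polls >= 3

    async def never_true():
        return False

    ok = await eventually(search_box_visible, timeout_s=1.0, poll_s=0.01)
    timed_out = await eventually(never_true, timeout_s=0.05, poll_s=0.01)
    return ok, polls, timed_out


ok, polls, timed_out = asyncio.run(demo())
print(ok, polls, timed_out)
```

Polling with a deadline lets an assertion tolerate slow page updates without turning every check into a fixed sleep.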

What changed between “agents that look smart” and agents that work

Two examples from the logs:

Deterministic override to enforce “first result” intent: “Executor decision … [override] first_product_link -> CLICK(1022)”

Drawer handling that verifies and forces the correct branch: “result: PASS | add_to_cart_verified_after_drawer”

The important point is that these are not post‑hoc analytics. They are inline gates: the system either proves it made progress or it stops and recovers.
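The first example, a deterministic override, could look roughly like the following. The post only shows the resulting log line, so everything here is an assumption: the `is_product` flag, the snapshot record shape, and the "topmost, then leftmost" rule for picking the first result are hypothetical, and the returned message merely imitates the quoted log format.

```python
def first_product_link(snapshot):
    """Return the id of the topmost product link, or None if there is none."""
    candidates = [
        el for el in snapshot
        if el["role"] == "link" and el.get("is_product")  # hypothetical flag
    ]
    if not candidates:
        return None
    # "First result" is taken to mean topmost on the page, then leftmost.
    return min(candidates, key=lambda el: (el["y"], el["x"]))["id"]


def resolve_click(executor_choice, snapshot):
    """Replace the executor's pick when it disagrees with the deterministic rule."""
    expected = first_product_link(snapshot)
    if expected is not None and executor_choice != expected:
        return expected, f"[override] first_product_link -> CLICK({expected})"
    return executor_choice, f"CLICK({executor_choice})"


snapshot = [
    {"id": 7, "role": "link", "x": 20, "y": 10, "is_product": False},    # nav link
    {"id": 1050, "role": "link", "x": 40, "y": 620, "is_product": True},
    {"id": 1022, "role": "link", "x": 40, "y": 300, "is_product": True},
]
chosen, log_line = resolve_click(1050, snapshot)
print(chosen, log_line)
```

When the intent can be decided from structure alone, the rule wins over the model's choice, and the override is logged so the disagreement stays visible.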

Takeaway

If you’re trying to make browser agents reliable, the highest‑leverage move isn’t a bigger model. It’s constraining the state space and making success/failure explicit with per-step assertions.

Reliability in agents comes from verification (assertions on structured snapshots), not just scaling model size.

sentienceapi.com
1 0
Katherine603 3 minutes ago

Show HN: A free online video compression tool for instant compression

I developed a simple AI-based tool for compressing videos. You upload a video, and the tool reduces the file size while maintaining picture quality. It supports formats such as MP4, MOV, and AVI. All processing runs in the browser. I mainly want to hear feedback on output quality, user experience, and any technical improvements worth exploring.

videocompress.ai
1 0
Where China built on coal, India is building on sun
mooreds 4 minutes ago

The article explores India's rapid expansion of its renewable energy sector, particularly in solar power, as it seeks to transition away from its reliance on coal-based electricity generation. It highlights India's ambitious targets for renewable energy installation and the key drivers behind this shift, including the country's commitment to sustainable development and the declining costs of solar technology.

ember-energy.org
1 0
What 8 Weeks of Cold Exposure Taught Me
aftermath101 4 minutes ago

The article discusses the author's experience with 8 weeks of cold exposure, including its impact on their mental well-being, physical health, and personal growth. It highlights the potential benefits of incorporating cold exposure into one's lifestyle, such as increased resilience, improved mood, and enhanced cognitive function.

samswellnessbrief.substack.com
1 0
Artemis 2 launch: Livestream info, launch window details
ourmandave 7 minutes ago

NASA's Artemis 2 mission, the first crewed flight of the Artemis program, will send a crew of four astronauts on a lunar flyby, a crucial step towards returning humans to the Moon for the first time since the Apollo era. The article covers livestream information and launch window details.

mashable.com
1 0
CGMthrowaway 8 minutes ago

FASB 56 and the Authority of the DNI to Waive SEC Financial Reporting

The article discusses the impact of FASAB Standard 56, which grants the Director of National Intelligence the authority to waive financial reporting requirements for certain government agencies. This has raised concerns about transparency and accountability in federal financial management.

missingmoney.solari.com
1 0
The Silicon Valley Canon: On the Paideía of the American Tech Elite (2024)
ipnon 8 minutes ago

The article examines the intellectual and cultural influences that shape the education and worldview of the American tech elite, highlighting the role of the 'Silicon Valley Canon' in cultivating a particular form of 'paideia' or liberal education among this influential group.

scholars-stage.org
1 0
Spree: Open-source eCommerce platform Built using Ruby on Rails
thunderbong 9 minutes ago

Spree is an open-source e-commerce platform built with Ruby on Rails. It provides a flexible and extensible framework for building customized online stores, with features such as product management, shopping cart, checkout, and order processing.

github.com
1 0
s3arch 11 minutes ago

Is coding dead because AI has taken over it?

The article discusses the future of coding and whether it will become obsolete. It explores the impact of advancements in artificial intelligence, no-code/low-code tools, and the evolving skills required for software development.

jehuamanna.com
1 1
djshah 16 minutes ago

Kimi AI K2.5 Model Introduction [video]

youtube.com
1 0
norrsson 16 minutes ago

Amazon inadvertently announces cloud unit layoffs in email to employees

Amazon accidentally sent an email to employees confirming upcoming layoffs on Wednesday, marking the latest in a series of major tech companies implementing job cuts amid economic uncertainty.

cnbc.com
5 0
exvi 17 minutes ago

Internet Movie Cars Database

imcdb.org
1 0
Firefox Split View is ready for testing
rossdavidh 19 minutes ago

The article discusses the introduction of Split View, a new feature in Firefox that allows users to view two web pages side-by-side. It also provides an overview of other updates and improvements made to Firefox in recent weeks.

blog.nightly.mozilla.org
1 1
How to Win Titular Metagames
Curiositry 20 minutes ago

The article provides practical tips for crafting effective article titles, including using relevant keywords, making the title concise and engaging, and considering the reader's perspective. It emphasizes the importance of titles in attracting readers and improving search engine optimization.

taylor.town
2 0
tartoran 21 minutes ago

The Rise of Chinese Memory [video]

youtube.com
1 0
salkahfi 21 minutes ago

The Thing About Moltbot (Clawdbot) That Nobody Wants to Admit

twitter.com
1 0
rebasedoctopus 27 minutes ago

Super Monkey Ball ported to a website

The linked site is a port of Super Monkey Ball that runs as a website in the browser.

monkeyball-online.pages.dev
2 0
PaulHoule 28 minutes ago

Human-centred cybersecurity: the case of the Florida water plant hack

The article explores the importance of human-centered cybersecurity approaches for critical infrastructure, emphasizing the need to consider the human factors and user experiences in designing secure systems that are both effective and usable.

emerald.com
1 0
Matrix on Cloudflare Workers; what could go wrong?
Arathorn 31 minutes ago

The article discusses how Matrix, an open network for secure, decentralized communication, is leveraging Cloudflare Workers to improve the scalability and resilience of its homeserver infrastructure. This allows Matrix to provide a more reliable and accessible platform for real-time communication and collaboration.

matrix.org
1 0
AI found 12 vulnerabilities in OpenSSL
mmsc 33 minutes ago

The article reports that Aisle, a cybersecurity research company, used AI to discover 12 vulnerabilities in OpenSSL. This highlights the importance of proactive security research and the need for prompt patching of vulnerabilities in widely used cryptographic libraries.

aisle.com
3 0
whack 33 minutes ago

Maine's Immigrant Students Stay Home as ICE Operation Ramps Up

nytimes.com
8 3
samclemens 38 minutes ago

Why are we still so afraid of using the grumpy old period?

https://archive.md/g1oq3

nytimes.com
1 0
Washington State Bill Seeks to Add Firearms Detection to 3D Printers
bilsbie 42 minutes ago

The proposed bill in Washington state would require 3D printers to have built-in firearms detection technology, with the goal of preventing the creation of untraceable 'ghost guns'. It responds to concerns over the use of 3D printing to manufacture unregistered and unserialized firearms.

hackaday.com
8 1
Bender about 1 hour ago

Should professors be forced to retire?

nature.com
1 0
oldivygames about 1 hour ago

I wrote a TTF-to-Texture function for SDL2

This is for anyone using SDL2 who prefers textures to surfaces but, for text rendering, is stuck with SDL2_ttf, which only uses surfaces.

I made a video about it here (hope it's OK to share this here):

https://www.youtube.com/watch?v=W8-06L0e_2U

2 0
molly_radstowe about 1 hour ago

OpenStreetMap overwhelmed by bots scraping data

twitter.com
11 5
Show HN: Chrome extension to keep Trello columns always visible across tabs
Ryanwalker64 about 1 hour ago

The Focus Column for Trello browser extension adds a customizable column to Trello boards, allowing users to prioritize and focus on their most important tasks.

chromewebstore.google.com
1 1
Tesla lands major Semi charging deal with nation's largest truck stop operator
Bender about 1 hour ago

Tesla has secured a deal with the largest truck stop operator in the United States to install its Semi charging stations at their locations, a significant step in expanding the infrastructure for Tesla's all-electric semi-trucks.

electrek.co
4 0
pseudolus about 1 hour ago

Who is using AI to code? Global diffusion and impact of generative AI

The article examines who is using generative AI to write code, how that use has spread globally, and what impact it has had.

science.org
1 0
Lendy – Keep track of books you have lended in a beautiful way
viraatdas about 1 hour ago

Lendy is a tool for keeping track of the books you have lent to other people.

lendy.viraat.dev
2 1