Agents that run while I sleep
The article discusses the author's journey in developing autonomous software agents that can run and perform tasks while the user is away, allowing for more efficient and hands-off workflow management.
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon
Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.
Also, we've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.
To get started:
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup # downloads ~1 GB of models
rcli # interactive mode with push-to-talk
Or: curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
The numbers (M4 Max, 64 GB, reproducible via `rcli bench`):
LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files):
- Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295)
- Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87)
- LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372)
- Time-to-first-token: 6.6 ms
STT – 70 seconds of audio transcribed in *101 ms*. That's 714x real-time. 4.6x faster than mlx-whisper.
TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.
We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.
The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.
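The compounding described above can be made concrete with a few lines of arithmetic. This is a minimal sketch, not anything from RCLI or MetalRT: the function name, the stage names, and the 50 ms figure are assumptions chosen for illustration, and only the 200 ms per-stage number comes from the example in the text.

```python
def time_to_first_audio_ms(stage_latency_ms):
    """Sequential stages add up: the user waits for the sum, not the max."""
    return sum(stage_latency_ms.values())


# Per-stage latencies from the 200 ms example in the text.
baseline = {"stt": 200.0, "llm": 200.0, "tts": 200.0}

# Hypothetical: only the STT stage is optimized (an assumed 4x speedup).
stt_only = {**baseline, "stt": 50.0}

print(time_to_first_audio_ms(baseline))  # 600.0
print(time_to_first_audio_ms(stt_only))  # 450.0
```

Even with one stage made four times faster in this toy example, the total stays at 450 ms, which is why a single-stage optimization is not enough.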
We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.
MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:
LLM benchmarks: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...
Speech benchmarks: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...
How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. Custom Metal compute shaders for quantized matmul, attention, and activation, compiled ahead of time and dispatched directly.
Voice pipeline optimization details: https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai...
RAG optimizations: https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...
RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.
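The three-thread layout can be sketched roughly as follows. This is not RCLI's code: it assumes one thread per stage (STT, LLM, TTS), and it swaps in Python's lock-based `queue.Queue` where RCLI uses lock-free ring buffers. The names `run_pipeline`, `_stage`, and `depth`, and the stand-in stage functions, are all hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def _stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown downstream
            return
        outbox.put(fn(item))


def run_pipeline(audio_chunks, stt, llm, tts, depth=8):
    """Run stt -> llm -> tts on three threads and return outputs in order."""
    q_audio = queue.Queue(maxsize=depth)
    q_text = queue.Queue(maxsize=depth)
    q_reply = queue.Queue(maxsize=depth)
    q_speech = queue.Queue()  # unbounded so the final stage never blocks

    workers = [
        threading.Thread(target=_stage, args=(stt, q_audio, q_text)),
        threading.Thread(target=_stage, args=(llm, q_text, q_reply)),
        threading.Thread(target=_stage, args=(tts, q_reply, q_speech)),
    ]
    for w in workers:
        w.start()
    for chunk in audio_chunks:
        q_audio.put(chunk)  # blocks while the first queue is full
    q_audio.put(_DONE)
    for w in workers:
        w.join()

    out = []
    while True:
        item = q_speech.get()
        if item is _DONE:
            return out
        out.append(item)


# Stand-in stage functions; real stages would run model inference.
result = run_pipeline(
    ["hi", "there"],
    stt=str.upper,
    llm=lambda text: text + "?",
    tts=lambda text: f"<{text}>",
)
print(result)  # ['<HI?>', '<THERE?>']
```

Because each stage runs on its own thread, a later chunk can be transcribed while an earlier one is still being answered or synthesized, so stages overlap instead of waiting on one another.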
Source: https://github.com/RunanywhereAI/RCLI (MIT)
Demo: https://www.youtube.com/watch?v=eTYwkgNoaKg
What would you build if on-device AI were genuinely as fast as cloud?
RISC-V Is Sloooow
The article discusses the performance of RISC-V processors, noting that they can be significantly slower than other architectures, particularly in certain workloads. The author provides insights into the factors that can contribute to this performance difference and suggests areas for further optimization.
Universal vaccine against respiratory infections and allergens
Researchers at Stanford University have developed a universal vaccine intended to protect against a broad range of respiratory infections and allergens.
Throwing away 18 months of code and starting over
The article discusses the decision to rewrite a codebase from scratch after over a year of development, highlighting the benefits of starting anew despite the significant time and effort invested in the previous iteration.
Mesh over Bluetooth LE, TCP, or Reticulum
Columba is an open-source, privacy-focused messaging project that can mesh over Bluetooth LE, TCP, or Reticulum. It emphasizes decentralized infrastructure, end-to-end encryption, and user privacy.
Widevine retiring its Cloud License Service (CLS)
Google is retiring the Widevine cloud license service, which allows content providers to manage and deliver DRM-protected content. This change will impact content providers who rely on the Widevine cloud service, requiring them to migrate to alternative solutions or self-host the Widevine license service.
Networking with agents: Put them in the right conversations with Tailscale
The article discusses networking for agents: how Tailscale can be used to connect them to the right conversations.
$3 ChromeOS Flex stick will revive old and outdated computers
A new ChromeOS stick allows users to transform old or outdated computers into modern, cloud-connected devices, providing a cost-effective solution for extending the life of legacy hardware.
DOGE employee stole Social Security data and put it on a thumb drive
A DOGE employee allegedly stole sensitive personal data, including Social Security numbers, and stored it on a thumb drive, according to a recent report. The incident has raised concerns about data security and the protection of sensitive information.
The U.S. borrowed $50B a week for the past five months, the CBO says
According to the Congressional Budget Office, the U.S. government borrowed about $50 billion a week over the past five months as the federal budget deficit continues to grow.
Russia's deportation of Ukrainian children amounts to crime against humanity
The article reports on the conclusion that Russia's deportation of Ukrainian children amounts to a crime against humanity.
It's time to speak out against unchecked growth of satellite mega constellations
The article discusses the growing concerns about the rampant expansion of satellite mega-constellations, which threatens to obstruct the night sky and significantly impact astronomical observations, as well as the natural environment.
Flock Flocked up: How a license plate camera misread unraveled one man's life
Flock Safety's automated license plate reading (ALPR) cameras have been found to misread license plates, leading to false alerts and potential privacy concerns. The article examines the accuracy and privacy implications of this technology used by law enforcement and private communities.
Show HN: A modern React onboarding tour library
react-tourlight is the modern React tour library. Zero dependencies, WCAG 2.1 AA accessible, under 5 kB gzipped. The one that works with React 19.
U.S. Global War on Terror Has Taken Nearly 1M Lives (2021)
The article examines the human and financial costs of the United States' post-9/11 'war on terror,' estimating that over 900,000 people have died as a direct result, and the total price tag has reached $8 trillion since 2001.
Hugging Face Storage Buckets: Mutable, non-versioned object storage at $12/TB
The article introduces Hugging Face Storage Buckets, a mutable, non-versioned object storage offering priced at $12/TB, aimed at managing and organizing data for machine learning and AI projects.
America's never had such high national debt heading into an economic shock
A new report from the Committee for a Responsible Federal Budget warns that the national debt reaching 100% of GDP by 2026 could trigger a fiscal crisis, and calls for a 'break-glass' plan to address the issue, including spending cuts and tax increases.
Pete Hegseth Blew Billions on Fruit Basket Stands, Chairs, and Crab
The article reports that Pete Hegseth spent billions of dollars on items including fruit basket stands, chairs, and crab.
Air strikes cause black rain and 'unprecedented' pollution in Tehran
The article reports that air strikes have caused black rain and 'unprecedented' pollution in Tehran.