Daily top stories
cuneicode, and the Future of Text in C
FF4J – Feature Flags for Java
Royal Navy says quantum navigation test a success
“csinc”, the AArch64 instruction you didn’t know you wanted
Run and create custom ChatGPT-like bots with OpenChat
Beware the Man of Many Studies
Notes on Vision Pro
Open Source Business Challenges and Reality, Rui Ueyama (Creator of Mold Linker)
DeepFilterNet: Noise suppression using deep filtering
Bees can learn, remember, think and make decisions
Show HN: Arroyo – Write SQL on streaming data
Arroyo is a modern, open-source stream processing engine that lets anyone write complex queries on event streams just by writing SQL: windowing, aggregating, and joining events with sub-second latency.
Today, data processing typically happens in batch data warehouses like BigQuery and Snowflake, even though most of the data arrives as streams. Data teams have to build complex orchestration systems to handle late-arriving data and job failures while trying to minimize latency. Stream processing offers an alternative approach, where the query is compiled into a streaming program that constantly updates as new data comes in, providing low-latency results as soon as the data is available.
I started the Arroyo project after spending the past five years building real-time platforms at Lyft and Splunk. I saw firsthand how hard it is for users to build correct, reliable pipelines on top of existing systems like Flink and Spark Streaming, and how hard those pipelines are for infra teams to operate. I saw the need for a new system that would be easy enough for any data team to adopt, built on modern foundations and informed by the lessons of the past decade of research and industry development.
Arroyo works by taking SQL queries and compiling them into an optimized streaming dataflow program, a distributed DAG of computation with nodes that read from sources (like Kafka), perform stateful computations, and eventually write results to sinks. That state is consistently snapshotted using a variation of the Chandy-Lamport checkpointing algorithm for fault-tolerance and to enable fast rescaling and updates of the pipelines. The entire system is easy to self-host on Kubernetes and Nomad.
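To make the "just write SQL" part concrete, here is a rough sketch of the kind of windowed aggregation such a pipeline runs. The source table and columns are hypothetical, and the window syntax follows the generic Flink-style group-window form rather than Arroyo's exact dialect; the getting-started guide below has real, runnable examples.

```sql
-- Hypothetical pipeline: count page views per URL in one-minute tumbling windows.
-- Table, columns, and window syntax are illustrative, not taken from the Arroyo docs.
SELECT
    url,
    TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
    COUNT(*)                                      AS views
FROM page_views
GROUP BY
    url,
    TUMBLE(event_time, INTERVAL '1' MINUTE);
```

Conceptually, the engine keeps per-key, per-window state for each group and emits a row when a window closes; that state is exactly what the checkpointing described above protects.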
See it in action here: https://www.youtube.com/watch?v=X1Nv0gQy9TA or follow the getting started guide (https://doc.arroyo.dev/getting-started) to run it locally.
Why it is time to start thinking of games as databases
The growing pains of database architecture
How to Grow a Three Sisters Garden (2016)
Aptible (YC S14) is hiring a security engineer
Launch HN: Seam (YC S20) – API for IoT Devices
We started Seam out of our frustration with the challenges of integrating IoT devices with software apps.
For example, my co-founder Dawn led Sonder's efforts to integrate smartlocks with their reservation systems in order to automate access for guests. She struggled with poorly documented and unreliable device APIs along the way. Our founding engineer Max authored the popular TuyAPI library and has spent countless hours trying to build sensible interfaces on top of unreliable devices.
For my part, I was an early engineer at Nest and saw firsthand how manufacturers often lack the resources and motivation to support third-party developers.
As a result, most devices lack public APIs, and getting access to the private ones (if they exist) requires lengthy negotiations with manufacturers. This task grows in complexity with each additional device brand a developer may need to integrate.
Seam serves as a single API that works across dozens of brands and hundreds of devices.
We start by testing each device in our hardware lab in San Francisco. We study their behaviors and quirks, and faithfully reproduce those in our development sandbox. We take time to craft custom client libraries that maximize developer ergonomics while accounting for the asynchronous nature of the devices. We offer pre-built UI components (React, web-native, etc.) to let developers rapidly assemble complex UIs that can manage large fleets of devices. And we even have a small hardware gateway to connect on-prem and legacy devices.
A few app developers like Guesty (YC S14) already use Seam to connect to their end users' devices. We have a generous free tier and charge a small fee for additional devices. We work closely with manufacturers to improve device reliability, add OAuth support, and patch security holes. We also spend time educating them on the importance of supporting open-source projects like Home Assistant and openHAB, and we will be contributing some of our own integrations to those ecosystems.
Seam is still very much a work in progress, with many aspects that need to be improved. But our hope is that it will help push IoT devices from being (mostly) point solutions to becoming a set of API endpoints that software engineers can tap to interact with the physical world.
GGML – AI at the Edge
A species of deep-sea squid has the world’s biggest light-producing organs
The Birth of the Grid
The History of CUDA
From SVG to Canvas – A new way of building interactions
When First Principles Thinking Fails
Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields
Ask HN: Code review as part of the peer review process?
How much of the 'crisis of reproducibility' is caused by code that wouldn't pass the smell test, let alone a real code review?
DirectX 12 Support on macOS
Show HN: Serverless OLAP with Seafowl and GCP
A recurring problem I've faced with side projects is needing Postgres but not wanting to deploy or maintain new instances. So when I learned that GCP's "always free" tier includes serverless [1], I got curious whether I could run a database on it.
While most classic databases aren't a great fit for serverless, Seafowl [0] separates compute, storage, and catalog (the catalog being a SQLite file of metadata) [2]. Last month I was able to add GCS bucket compatibility to Seafowl, which let me mount the catalog via gcsfuse (an adapter that attaches GCS buckets to the local filesystem). The upshot: while FUSE does add HTTP requests to container startup, init time remains comparatively quick, even on cold starts, because only the single catalog SQLite file has to be fetched.
With this approach you get a URL you can query directly from your frontend if you want; e.g. fetch() can send SELECT * ... queries straight from your users' browser. You could plot a graph from a static React frontend, or the observablehq.com editor, with no persistent backend needed. So when nobody's using your app, 100% of your stack can scale to zero, with obvious cloud-spend advantages. And even if you exceed the free tier limits, pay-as-you-go pricing means there's a good chance you'll come out ahead on hosting costs anyway.
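For a sense of what that looks like, the body of such a request is just a plain read-only query. The sketch below uses made-up table and column names; consult the Seafowl docs [0] for the exact HTTP request shape, which isn't shown here.

```sql
-- Hypothetical read-only reporting query sent from the browser: daily signup
-- counts for a chart. The 'signups' table and its columns are illustrative.
SELECT
    date_trunc('day', created_at) AS day,
    COUNT(*)                      AS signups
FROM signups
GROUP BY date_trunc('day', created_at)
ORDER BY day;
```

Because the workload is read-only analytics, results like this can be cached and fed straight into a chart component, with nothing running server-side between requests.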
NB: Seafowl is an early stage project, so it's not really suitable if you need transactions or fast single-row writes. Otherwise, this could be a nice way to get free database hosting at a big 3 cloud provider, especially for e.g. read-only analytical reporting queries.
Feedback and suggestions are appreciated. Hope it helps you! More details are available if you want them [3].
[0] https://seafowl.io/docs/getting-started/introduction
[1] https://cloud.google.com/run/pricing#cpu-requests
[2] Neon is another interesting project that separates compute and storage. https://neon.tech/blog/architecture-decisions-in-neon
One issue I observed was a noticeably longer startup time versus this FUSE approach, which I believe may be related to Postgres connection setup time/roundtrips. Looking forward to trying Neon again in the future.
[3] https://www.splitgraph.com/blog/deploying-serverless-seafowl
Scripts for Btrfs Maintenance
Apple's game porting toolkit is fantastic. Cyberpunk 2077 at Ultra on an M1 MBP
Ownership in Swift: Manifesto and meta-proposal (2017)
TelaMalloc: Efficient On-Chip Memory Allocation for Production ML Accelerators