Story

Show HN: MyDuck Server – Supercharge MySQL and Postgres Analytics with DuckDB

fanyang01 Thursday, November 28, 2024

Hello HN!

We're excited to announce MyDuck Server, an open-source project that seamlessly integrates the analytical power of DuckDB with your existing MySQL & Postgres databases.

*Backstory*

Currently, there are no fully satisfactory open-source OLAP solutions for MySQL & Postgres. In the MySQL ecosystem, HeatWave offers close integration, but it's a proprietary, commercial product from Oracle. The Postgres community has seen promising DuckDB-based extensions emerge, including the official pg_duckdb. However, extensions can introduce isolation concerns in mission-critical environments.

Consequently, many organizations resort to setting up complex and costly data movement pipelines using tools like Debezium, Flink, or other commercial solutions to replicate data from MySQL & Postgres to OLAP systems (e.g., Snowflake, BigQuery, ClickHouse) or Lakehouses (e.g., Delta Lake + Spark). This approach introduces significant operational overhead and expense.

Another emerging strategy is the zero-ETL approach, increasingly advocated by cloud providers. This model simplifies data integration by allowing the cloud provider to manage ETL pipelines, while necessitating reliance on specific cloud ecosystems and services.

*Key features*

MyDuck Server offers a real-time analytical replica that leverages DuckDB's native columnar storage and processing capabilities. It operates as a separate server, ensuring isolation and minimizing impact on your primary database. Key features include:

- Easy Zero-ETL: Built-in real-time replication from MySQL & Postgres with no complex pipelines to manage. It feels like a standard MySQL replica or Postgres standby. With the Docker image, passing a connection string is enough.

- MySQL & Postgres Protocol Compatibility: We take this seriously and are working to make this project integrate well with the existing ecosystem around MySQL & Postgres. Currently, it is already possible to connect to MyDuck with standard MySQL & PostgreSQL clients in many programming languages.

- HTAP Support: A standard database proxy can be deployed in front of a MySQL/Postgres primary and its MyDuck replica to route write operations to the primary and read operations to the replica. It just works.

- DuckDB SQL & Columnar I/O over Postgres Protocol: It's unnecessary to restrict ourselves to MySQL/Postgres's SQL expressiveness and row-oriented data transfer. The Postgres port accepts all DuckDB-valid SQL queries, and you can retrieve query results in columnar format via `COPY (SELECT ...) TO STDOUT (FORMAT parquet/arrow)`.

- Standalone Mode: It does not need to be run as a replica. It can also act as a primary server that brings DuckDB into server mode and accepts updates from multiple connections, breaking DuckDB's single-process limitation.

*Relevant Previous HN Threads*

- pg_duckdb [1] (https://news.ycombinator.com/item?id=41275751) is the official Postgres extension for DuckDB. It uses DuckDB as an execution engine to accelerate analytical queries by scanning Postgres tables directly.

- pg_mooncake [2] (https://news.ycombinator.com/item?id=41998247) is a Postgres extension that adds columnstore tables for PG. It uses pg_duckdb under the hood but stores data in Lakehouse formats (Iceberg & Delta Lake).

- BemiDB [3] (https://news.ycombinator.com/item?id=42078067) is also a DuckDB-based Postgres replica. Unlike us, they focus on storing data in Lakehouse format.

We believe MyDuck Server offers a compelling solution for those seeking high-performance analytics on their MySQL & Postgres data without the complexities and costs of traditional approaches. We're eager to hear your feedback and answer any questions you might have. Let me know what you think!

[0] https://github.com/apecloud/myduckserver

[1] https://github.com/duckdb/pg_duckdb

[2] https://github.com/Mooncake-Labs/pg_mooncake

[3] https://github.com/BemiHQ/BemiDB

Summary
The linked article is about a simple and lightweight web server called 'myduckserver' that is built using Go. It provides a simple and easy-to-use interface for serving static files and handling HTTP requests. The server is designed to be cross-platform and can run on various operating systems, including Windows, macOS, and Linux. The article includes instructions on how to install and use the server, as well as examples of how to customize its behavior.
31 6
Summary
github.com
Visit article Read on Hacker News Comments 6