Hazumi News | Show HN: Dq – pipe-based CLI for querying CSV, JSON, Avro, and Parquet files

I'm a data engineer, and exploring a data file from the terminal has always felt more painful than it should be. My usual flow involved some combination of avro-tools, opening the file in Excel or Google Sheets, writing a quick Python script, using the DataFusion CLI, or loading it into a database just to run one query. It works, but it's friction, and it adds up when you're just trying to understand what's in a file or track down a bug in a pipeline.

A while ago I had the idea for a simple pipe-based CLI tool, like jq but for tabular data, that works across all these formats with a consistent syntax. I refined the idea over time into something I wanted to be genuinely simple and useful: not a full query engine, just a sharp tool for exploration and debugging. I never got around to building it, though. Last week, now that AI tools are actually capable, I finally did :)

I deliberately avoided SQL. For quick terminal work, the pipe-based composable style feels much more natural: you build up the query step by step, left to right, and each piece is obvious in isolation. SQL asks you to hold the whole structure in your head before you start typing.

  `dq 'sales.parquet | filter { amount > 1000 } | group category | reduce total = sum(amount), n = count() | remove grouped | sortd total | head 10'`
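Read left to right, the example query is a chain of table-in, table-out steps. The Go sketch below (Go because dq itself is written in Go) spells out roughly what that chain computes on a few in-memory rows. It is an illustration under stated assumptions, not dq's code: `Row`, `Table`, `Op`, `pipe`, `filter`, `groupSum`, `sortDesc`, and `head` are hypothetical names, and `groupSum` folds `group`, `reduce`, and `remove grouped` into one stage because the post doesn't say what the `grouped` column holds.

```go
package main

import (
	"fmt"
	"sort"
)

// Row and Table are hypothetical stand-ins; dq's real in-memory representation isn't described in the post.
type Row map[string]any
type Table []Row

// Op is one (hypothetical) pipeline stage: it takes a table in and returns a new table out.
type Op func(Table) Table

// pipe applies the stages left to right, like "a | b | c".
func pipe(t Table, ops ...Op) Table {
	for _, op := range ops {
		t = op(t)
	}
	return t
}

// filter keeps the rows for which pred returns true.
func filter(pred func(Row) bool) Op {
	return func(t Table) Table {
		var out Table
		for _, r := range t {
			if pred(r) {
				out = append(out, r)
			}
		}
		return out
	}
}

// groupSum is a hypothetical stand-in for "group key | reduce total = sum(col), n = count() | remove grouped":
// one output row per distinct key, in first-seen order, holding the key, total, and n columns.
func groupSum(key, col string) Op {
	return func(t Table) Table {
		pos := map[any]int{} // key value -> index of its output row
		var out Table
		for _, r := range t {
			i, ok := pos[r[key]]
			if !ok {
				i = len(out)
				pos[r[key]] = i
				out = append(out, Row{key: r[key], "total": 0.0, "n": 0})
			}
			out[i]["total"] = out[i]["total"].(float64) + r[col].(float64)
			out[i]["n"] = out[i]["n"].(int) + 1
		}
		return out
	}
}

// sortDesc returns a copy of the table ordered by a float64 column, largest first.
func sortDesc(col string) Op {
	return func(t Table) Table {
		out := append(Table(nil), t...)
		sort.SliceStable(out, func(i, j int) bool {
			return out[i][col].(float64) > out[j][col].(float64)
		})
		return out
	}
}

// head keeps at most n rows.
func head(n int) Op {
	return func(t Table) Table {
		if len(t) > n {
			return append(Table(nil), t[:n]...)
		}
		return t
	}
}

func main() {
	sales := Table{
		{"category": "books", "amount": 1500.0},
		{"category": "games", "amount": 800.0},
		{"category": "books", "amount": 2500.0},
		{"category": "music", "amount": 1200.0},
	}
	top := pipe(sales,
		filter(func(r Row) bool { return r["amount"].(float64) > 1000 }),
		groupSum("category", "amount"),
		sortDesc("total"),
		head(10),
	)
	for _, r := range top {
		fmt.Println(r["category"], r["total"], r["n"]) // books 4000 2, then music 1200 1
	}
}
```

Since each stage only sees the table handed to it, adding, dropping, or reordering a stage is a local change to the arguments of `pipe`, which is the property the post leans on when it argues for pipes over SQL.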

How it works technically: dq has a hand-written lexer and a recursive-descent parser that turn the query string into an AST, which is then evaluated against the file lazily where possible. Each operator (filter, select, group, reduce, etc.) is a pure transformation: it takes a table in and returns a table out. This is what makes the pipe model work cleanly, since operators are fully orthogonal and can be composed in any order.
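A lexer feeding a recursive-descent parser can be sketched in Go as well. This is not dq's parser: the grammar, token kinds, and the `Query`, `Stage`, and `Cond` node types are hypothetical, and the subset is deliberately tiny (a source name, `|`-separated stages with bare arguments, and at most one `{ field cmp number }` predicate per stage, with single-character comparison operators only). Each grammar rule maps to one parsing function, and the first error is recorded and returned alongside a possibly partial AST. The built-in `min` requires Go 1.21 or later.

```go
package main

import "fmt"

// A hypothetical grammar for a tiny dq-like subset, and the AST it produces:
//	query := IDENT ( "|" stage )*
//	stage := IDENT ( "{" IDENT CMP NUMBER "}" )? ( IDENT | NUMBER )*
type Cond struct{ Field, Op, Value string } // Value keeps the literal's text
type Stage struct {
	Name string
	Args []string
	Cond *Cond // non-nil only for brace predicates such as { amount > 1000 }
}
type Query struct {
	Source string
	Stages []Stage
}
type Token struct{ Kind, Text string } // Kind: "ident", "num", "cmp", "sym", or the final "eof"
type parser struct {
	toks []Token
	pos  int
	err  error // first error seen; parse returns it with the (possibly partial) AST
}

func isDigit(b byte) bool  { return b >= '0' && b <= '9' }
func isLetter(b byte) bool { return b == '_' || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') }

// lex is a hand-written byte scanner; identifiers may contain '.', so sales.parquet stays whole.
func lex(s string) []Token {
	var toks []Token
	for i := 0; i < len(s); {
		c, j := s[i], i+1
		switch {
		case c == ' ' || c == '\t': // skip whitespace
		case c == '>' || c == '<' || c == '=':
			toks = append(toks, Token{"cmp", s[i:j]})
		case isLetter(c) || isDigit(c):
			kind, accept := "ident", func(b byte) bool { return isLetter(b) || isDigit(b) || b == '.' }
			if isDigit(c) {
				kind, accept = "num", isDigit
			}
			for j < len(s) && accept(s[j]) {
				j++
			}
			toks = append(toks, Token{kind, s[i:j]})
		default: // "|", "{", "}", plus any unknown byte, which the parser will reject
			toks = append(toks, Token{"sym", s[i:j]})
		}
		i = j
	}
	return append(toks, Token{"eof", ""})
}

func (p *parser) peek() Token { return p.toks[p.pos] }

// expect consumes one token, recording an error unless it has the wanted kind (and text, if given).
func (p *parser) expect(kind, text string) Token {
	t := p.peek()
	if p.err == nil && (t.Kind != kind || (text != "" && t.Text != text)) {
		p.err = fmt.Errorf("expected %s %q, got %s %q", kind, text, t.Kind, t.Text)
	}
	p.pos = min(p.pos+1, len(p.toks)-1) // never step past the trailing "eof" (Go 1.21+ built-in min)
	return t
}

// parseStage reads an operator name, an optional { field cmp number } predicate, then bare arguments.
func (p *parser) parseStage() Stage {
	st := Stage{Name: p.expect("ident", "").Text}
	if p.peek().Text == "{" {
		p.expect("sym", "{") // the three calls below run left to right: field, operator, literal
		st.Cond = &Cond{p.expect("ident", "").Text, p.expect("cmp", "").Text, p.expect("num", "").Text}
		p.expect("sym", "}")
	}
	for k := p.peek().Kind; k == "ident" || k == "num"; k = p.peek().Kind {
		st.Args = append(st.Args, p.expect(k, "").Text)
	}
	return st
}

// parse is the entry point: a source name, zero or more "| stage" steps, then end of input.
func parse(src string) (*Query, error) {
	p := &parser{toks: lex(src)}
	q := &Query{Source: p.expect("ident", "").Text}
	for p.peek().Text == "|" {
		p.expect("sym", "|")
		q.Stages = append(q.Stages, p.parseStage())
	}
	p.expect("eof", "")
	return q, p.err
}

func main() {
	q, err := parse("sales.parquet | filter { amount > 1000 } | sortd amount | head 10")
	fmt.Println("source:", q.Source, "error:", err)
	for _, s := range q.Stages {
		fmt.Printf("%s args=%v cond=%+v\n", s.Name, s.Args, s.Cond)
	}
}
```

Evaluating the result would mean walking `Stages` in order and mapping each `Name` to a table-in, table-out operator; that step, and the lazy evaluation the post mentions, are not shown.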

It's written in Go: a single self-contained 11 MB binary with no runtime dependencies, installable via Homebrew. I'd love feedback, especially from anyone who's felt the same friction.

Summary

The post introduces dq, a single-binary Go CLI for exploring CSV, JSON, Avro, and Parquet files from the terminal. Instead of SQL, queries are written as left-to-right pipelines of operators such as filter, group, reduce, and sort; a hand-written lexer and recursive-descent parser turn each query into an AST that is evaluated lazily where possible.
