@e12e 10d
From https://github.com/ClickHouse/ClickHouse/issues/22482#issuec... it looks like a local load into ClickHouse is expected to take 6-7 hours (in 2017?).

I wonder how clickhouse-local would fare today (I'm guessing the dataset is so big that loading and storing it first, then analyzing, would be better).

@xk3 10d
zstd decompression should almost always be very fast. It's faster to decompress than DEFLATE or LZ4 in all the benchmarks that I've seen.

You might be interested in converting the pushshift data to Parquet. Using OctoSQL, I'm able to query the submissions data (from the beginning of Reddit to Sept 2022) in about 10 min.

https://github.com/chapmanjacobd/reddit_mining#how-was-this-...
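A rough sketch of what such a conversion could look like, not necessarily how the linked repo does it. It assumes the dumps are zstd-compressed newline-delimited JSON, that the third-party `zstandard` and `pyarrow` packages are installed, and that you only want a handful of fields. The field list, the function names `column_batches` and `convert`, and the long-window setting are all assumptions for illustration:

```python
import io
import json

# Assumed subset of submission fields; adjust to the columns you actually query.
FIELDS = ("id", "subreddit", "title", "score", "created_utc")


def column_batches(lines, fields=FIELDS, batch_size=100_000):
    """Yield dicts of column lists built from an iterable of NDJSON lines."""
    columns = {name: [] for name in fields}
    rows = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for name in fields:
            columns[name].append(record.get(name))  # missing field -> None
        rows += 1
        if rows == batch_size:
            yield columns
            columns = {name: [] for name in fields}
            rows = 0
    if rows:
        yield columns


def convert(src_path, dst_path):
    """Stream a .zst NDJSON dump into a Parquet file, one batch at a time."""
    # Third-party imports kept local so the helper above stays stdlib-only.
    import pyarrow as pa
    import pyarrow.parquet as pq
    import zstandard

    schema = pa.schema([
        ("id", pa.string()),
        ("subreddit", pa.string()),
        ("title", pa.string()),
        ("score", pa.int64()),
        ("created_utc", pa.int64()),
    ])
    # Assumption: the dumps use a long zstd window, so raise the window limit.
    dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
    with open(src_path, "rb") as raw, pq.ParquetWriter(dst_path, schema) as writer:
        text = io.TextIOWrapper(dctx.stream_reader(raw), encoding="utf-8")
        for columns in column_batches(text):
            # created_utc may be stored as a string in some dumps; coerce to int.
            columns["created_utc"] = [
                None if v is None else int(v) for v in columns["created_utc"]
            ]
            writer.write_table(pa.Table.from_pydict(columns, schema=schema))
```

Something like `convert("RS_2022-09.zst", "RS_2022-09.parquet")` (hypothetical file names) would then write one row group per batch, so memory use stays bounded regardless of dump size.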

Although if you're sending the data to postgres or BigQuery you can probably get better query performance via indexes or parallelism.