
Show HN: 30k IKEA items in flat text

tsazan Wednesday, January 07, 2026

OP here.

I took the unofficial IKEA US dataset (originally scraped by jeffreyszhou) and converted all 30,511 products into a flat, markdown-like protocol called CommerceTXT.

The goal: see whether a flatter structure uses LLM context windows more efficiently than nested JSON.
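To make the comparison concrete, here is a minimal sketch of the idea, not the actual CommerceTXT format or converter. The product record, its field names, the dotted `Key: value` rendering, and the `rough_token_count` helper are all assumptions made up for illustration. The counter is a crude word-and-punctuation proxy, not a real LLM tokenizer, so it will not reproduce the benchmark figures.

```python
import json
import re

# Hypothetical product record; the field names are illustrative,
# not the dataset's real schema.
product = {
    "name": "Example bookcase",
    "category": "bookcases",
    "price": {"amount": 79.99, "currency": "USD"},
    "rating": {"average": 4.6, "count": 1234},
}


def to_minified_json(record):
    """Serialize without any optional whitespace."""
    return json.dumps(record, separators=(",", ":"))


def to_flat_text(record, prefix=""):
    """Render nested dicts as 'key: value' lines, joining nested keys with dots.

    This is an assumed flat layout, not the CommerceTXT syntax.
    """
    lines = []
    for key, value in record.items():
        label = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.extend(to_flat_text(value, prefix=label + ".").splitlines())
        else:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


def rough_token_count(text):
    """Crude proxy: count runs of word characters and single punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


as_json = to_minified_json(product)
as_text = to_flat_text(product)
print(as_text)
print("JSON proxy tokens:", rough_token_count(as_json))
print("Flat proxy tokens:", rough_token_count(as_text))
```

The intuition is that quotes, braces, and commas in JSON each cost something, while a line-oriented layout drops most of them. For real numbers, swap the proxy for the tokenizer of the model you care about.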

The results:

- Size: 30k products across 632 categories.
- Efficiency: The text version uses ~24% fewer tokens (3.6M saved total) compared to the equivalent minified JSON.
- Structure: Files are organized in folders (e.g. /products/category/), which helps with testing hierarchical retrieval routers.
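As a rough illustration of what a hierarchical retrieval router over a folder layout like this could look like, here is a toy two-stage sketch. The file paths, the `build_index` and `route` helpers, and the keyword-matching rule are all hypothetical. A real router would more likely score categories with embeddings or an LLM call than with exact word matches.

```python
from pathlib import PurePosixPath

# Hypothetical file list following the /products/<category>/ pattern;
# these names are made up and do not come from the dataset.
FILES = [
    "/products/bookcases/item-001.txt",
    "/products/bookcases/item-002.txt",
    "/products/desks/item-003.txt",
    "/products/sofas/item-004.txt",
]


def build_index(paths):
    """Group file paths by category, taken from the parent folder name."""
    index = {}
    for path in paths:
        category = PurePosixPath(path).parent.name
        index.setdefault(category, []).append(path)
    return index


def route(query, index):
    """Stage 1: keep categories whose name appears as a word in the query.
    Stage 2: return only the files stored under those categories."""
    words = set(query.lower().split())
    matched = [category for category in index if category in words]
    return [path for category in matched for path in index[category]]


index = build_index(FILES)
print(route("cheap desks under 100", index))
# prints ['/products/desks/item-003.txt']
```

The point of the folder structure is that stage 1 only has to look at category names, so product files outside the matched folders never need to enter the context window.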

The link goes to the dataset on Hugging Face, which has the full benchmarks.

Parser code is here: https://github.com/commercetxt/commercetxt

Happy to answer questions about the conversion logic!

Summary
This dataset contains product catalog information and customer reviews for IKEA products sold in the United States. The data includes product details, pricing, customer ratings, and reviews, providing insights into IKEA's product offerings and customer experiences in the US market.
huggingface.co