Show HN: Ardage, a tool to build ArXiv Markdown datasets using natural language
hariharprasadd, Thursday, November 20, 2025
I built a fun little Python package called ardage (ARxiv DAtaset GEnerator) that lets you generate Markdown datasets of research papers from natural language queries, and it's fast. It's a good fit for building post-training datasets for LLMs, RAG knowledge bases, and more.
You can install it with 'pip install ardage', then use it in interactive mode in the CLI, run it directly from the CLI with flags, or import the library into your own code and build on it!
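For anyone curious what a tool like this does under the hood, here is a rough sketch of the general idea. It is not ardage's actual API: the helper names `build_query_url` and `paper_to_markdown` are made up for illustration, and the sketch assumes the natural language query has already been reduced to a list of keywords. The only real thing it relies on is the public arXiv API query endpoint and its `search_query`, `start`, and `max_results` parameters.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint (real); everything else below is a hypothetical sketch.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(keywords, max_results=5):
    """Build an arXiv API search URL that requires every keyword to match."""
    # "all:" searches every metadata field; AND combines the terms.
    search = " AND ".join(f"all:{kw}" for kw in keywords)
    params = {"search_query": search, "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def paper_to_markdown(title, authors, summary):
    """Render one paper's metadata as a small Markdown document."""
    author_line = ", ".join(authors)
    return f"# {title}\n\n**Authors:** {author_line}\n\n{summary.strip()}\n"


if __name__ == "__main__":
    # Example keywords, assumed to come from an earlier query-parsing step.
    print(build_query_url(["graph", "neural"]))
    print(paper_to_markdown("An Example Paper", ["A. Author", "B. Author"], "  A short abstract.  "))
```

Fetching the URL returns an Atom feed, so a real pipeline would still need to parse the feed, download each paper, and convert the full text to Markdown. The sketch only covers the query and formatting ends of that pipeline.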
demo: https://x.com/hariharprasadd/status/1991346557459841196?s=20