Show HN: Zagora, Distributed fine-tuning platform on mixed GPUs over internet
miyamotomusashi · Sunday, March 01, 2026
I built Zagora, a distributed fine-tuning platform that turns fragmented or mixed GPUs into a unified training cluster over standard internet (1 Gbps).
The problem:
Most distributed training assumes homogeneous GPUs and high-bandwidth interconnects (NVLink/InfiniBand). On heterogeneous fleets over standard internet, tensor/data parallel approaches become communication-bound and fragile.
What Zagora does under the hood:
- Uses pipeline-style parallelism instead of communication-heavy tensor parallelism.
- Passes only boundary activations between stages rather than full parameter sync.
- Assigns layers proportionally to GPU capability to reduce straggler idle time.
- Uses checkpoint-based recovery to tolerate worker crashes.
- Supports adapter-based fine-tuning (e.g., QLoRA) to reduce memory pressure.
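To make the capability-proportional layer assignment above concrete, here is a minimal sketch of one way such a partitioner could work. This is illustrative only, not Zagora's actual scheduler: the function name, the capability scores, and the largest-remainder rounding are all my assumptions.

```python
# Hypothetical sketch: split a model's layers into contiguous pipeline
# stages, with each worker's share roughly proportional to a per-GPU
# capability score. Names and heuristics are illustrative, not Zagora's.

def assign_layers(num_layers, capabilities):
    """Return (start, end) layer ranges, one per worker, sized in
    proportion to each worker's capability score."""
    total = sum(capabilities)
    # Ideal fractional share of layers per worker.
    shares = [num_layers * c / total for c in capabilities]
    counts = [int(s) for s in shares]
    # Hand the leftover layers to the workers with the largest
    # fractional remainders (largest-remainder rounding).
    leftover = num_layers - sum(counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    # Convert per-worker counts into contiguous (start, end) ranges.
    ranges, start = [], 0
    for n in counts:
        ranges.append((start, start + n))
        start += n
    return ranges

# Example: 32 layers across a fast, a medium, and a slow GPU.
print(assign_layers(32, [3.0, 2.0, 1.0]))
# → [(0, 16), (16, 27), (27, 32)]
```

A real scheduler would also weigh per-layer memory against each GPU's VRAM, but proportional splitting is the core idea behind reducing straggler idle time.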
Zagora currently supports managed runs (we provision GPUs in-region) and a BYOC mode where users run workers on their own infrastructure.
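The checkpoint-based recovery mentioned above can be sketched generically as "save atomically every N steps, resume from the latest checkpoint on restart". Everything here (file name, JSON format, save cadence, the stand-in training step) is an assumption for illustration, not Zagora's implementation.

```python
# Hypothetical sketch of checkpoint-based crash recovery for a worker:
# periodically persist the step counter and state, and resume from the
# latest checkpoint after a restart. Details are illustrative.
import json
import os

CKPT = "stage_ckpt.json"  # assumed checkpoint path

def save_checkpoint(step, state, path=CKPT):
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # atomic rename: never leaves a torn file

def load_checkpoint(path=CKPT):
    if not os.path.exists(path):
        return 0, {}  # fresh start
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]

def train(total_steps, save_every=10):
    step, state = load_checkpoint()  # resume if a checkpoint exists
    while step < total_steps:
        state["loss"] = 1.0 / (step + 1)  # stand-in for a real update
        step += 1
        if step % save_every == 0:
            save_checkpoint(step, state)
    return step, state
```

Writing to a temp file and then renaming keeps recovery simple: a crash mid-save leaves the previous checkpoint intact, so a restarted worker only ever replays the steps since the last completed save.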
Limitations:
- Full-parameter fine-tuning is not supported yet.
- It won't beat an NVLink cluster on raw throughput.
- Cross-region training is still latency-sensitive.
- Scheduling across heterogeneous nodes is an ongoing tuning problem.
IMPORTANT:
I'm currently running jobs manually, so it may take some time before training starts. However, I will run every submitted job.
Link: app.zagora.ai
I'd be interested in feedback from people who've worked on distributed training at scale.
Happy to answer technical questions.