Ask HN: For those of you building AI agents, how have you made them faster?

Because they coordinate across multiple systems and chain LLM calls, a lot of agents today feel really slow. I would love to know how others are tackling this:

- How are you all identifying performance bottlenecks in agents?

- What types of changes have gotten you the biggest speedups?

We vibe-coded a profiler to identify slow LLM calls. Sometimes we could then swap in a faster model for that step, or we'd realize we could shrink the input tokens by eliminating unnecessary context. For steps requiring external access (browser usage, API calls), we've moved to fast-start external containers plus thread pools for parallelization. We've also experimented with UI changes to mask some of the latency.
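
A minimal Python sketch of two of these ideas: per-step wall-clock timing to find the slow steps, and a thread pool that runs independent external calls concurrently. It assumes in-process timing is enough to spot bottlenecks and that the external calls are independent, I/O-bound requests. The names `profile_step`, `report`, `fake_llm_call`, and `fake_api_call` are hypothetical, and `time.sleep` stands in for real LLM/API latency; this is an illustration, not the profiler described above.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# Hypothetical in-process profiler: step name -> list of durations in seconds.
timings = defaultdict(list)


@contextmanager
def profile_step(name):
    """Record how long the wrapped block takes under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)


def report():
    """Return (step, total_seconds, calls) rows, slowest step first."""
    rows = [(name, sum(ts), len(ts)) for name, ts in timings.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical stand-ins for real work; swap in actual LLM/API/browser calls.
def fake_llm_call(prompt, delay):
    time.sleep(delay)
    return f"answer to {prompt!r}"


def fake_api_call(url, delay=0.2):
    time.sleep(delay)
    return f"fetched {url}"


def run_agent(urls):
    with profile_step("plan (LLM)"):
        fake_llm_call("plan", 0.3)
    # Independent external calls run concurrently, so this step costs roughly
    # the slowest single call instead of the sum of all calls.
    with profile_step("fetch (parallel API)"):
        with ThreadPoolExecutor(max_workers=8) as pool:
            results = list(pool.map(fake_api_call, urls))  # keeps input order
    with profile_step("summarize (LLM)"):
        fake_llm_call("summarize", 0.1)
    return results


if __name__ == "__main__":
    run_agent([f"https://example.com/{i}" for i in range(5)])
    for name, total, calls in report():
        print(f"{name:<22} {total:6.2f}s over {calls} call(s)")
```

Blocking I/O (and `time.sleep`) releases the GIL, so a thread pool suits network-bound steps like API or browser calls; CPU-heavy steps would need processes or a separate service instead.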

What other performance-enhancing techniques are people using?
