Show HN: Aft, a Python toolkit to study agent behavior

aft was my stab at having a way to understand what Claude is doing, and at having the language to reason about differences in model behavior when we make models do long agentic runs, change prompts, alter tools, etc. The intention of the toolkit is to provide an empirical measure of how agent behavior differs as things like environments, tools, and prompts change.

It gives you the tools to measure changes in "behaviors that the users define". This makes it more of a hypothesis-testing framework for what the agent is doing than something that tells you what the agent might do.
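To make the hypothesis-testing idea concrete, here is a minimal sketch in plain Python. It does not use aft and is not its API: the trajectory shape (a list of step dicts with a "tool" key), the `used_tool` behavior, and the `compare` helper are all hypothetical names made up for illustration. It scores a user-defined behavior on runs from two conditions and uses a permutation test to ask whether the difference could be chance.

```python
import random
from typing import Callable, Dict, List, Sequence

# Assumed shape: a trajectory is a list of step dicts, e.g. {"tool": "search"}.
Trajectory = List[Dict[str, str]]
Behavior = Callable[[Trajectory], float]


def used_tool(name: str) -> Behavior:
    """Hypothetical user-defined behavior: 1.0 if the run ever called `name`."""
    def behavior(traj: Trajectory) -> float:
        return 1.0 if any(step.get("tool") == name for step in traj) else 0.0
    return behavior


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_resamples: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value for the difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the labels: under the null, group membership is arbitrary.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


def compare(behavior: Behavior, runs_a: List[Trajectory],
            runs_b: List[Trajectory]) -> Dict[str, float]:
    """Score the behavior on both sets of runs and test the difference."""
    scores_a = [behavior(t) for t in runs_a]
    scores_b = [behavior(t) for t in runs_b]
    return {
        "rate_a": mean(scores_a),
        "rate_b": mean(scores_b),
        "p_value": permutation_test(scores_a, scores_b),
    }
```

Here `runs_a` and `runs_b` would be trajectories collected under, say, two different prompts. A small `p_value` suggests the behavior really did shift between the conditions; it says nothing about what the agent will do on a new run.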

The reasoning and derivations behind these tools are given here: https://technoyoda.github.io/agent-science.html

Would be very happy to hear feedback and questions. (Please ignore the names given to the theorization; they were for shits and giggles.)

Summary

The post introduces 'aft', a Python toolkit for studying agent behavior. It aims to give an empirical measure of how agent behavior differs as environments, tools, and prompts change, by measuring changes in user-defined behaviors, which makes it closer to a hypothesis-testing framework than a predictor of what an agent will do. The reasoning and derivations behind the tools are in the linked write-up.
