nobody is gonna applaud you when you run out of money because you built agents without thinking about tokens.
if you're building a company that makes any calls to LLMs (gpt, claude, grok), you already know how expensive API usage gets (unlike coding subscriptions).
i've been building agents for a long time, and here are 3 things you might be doing wrong that burn $$$ unnecessarily (in reverse order):
3. using a model everywhere: LLMs are smart but not deterministic. oftentimes you can complete your tasks with regex and plain deterministic code without burning through tokens. it's important to do so, not just because you save money, but also because deterministic flows make for a much better customer experience.
2. using the same model everywhere: no, you don't need opus 4.6 to decide whether an image is of a dog or a cat. it's very easy to send every api call to a single model, but it's wrong to do so. models differ a lot in how much reasoning they do (and how many tokens they burn), and each one is built for different purposes. bonus points if you also play around with the reasoning param.
1. prompt caching: but first, what exactly is prompt caching? put simply and non-technically, any LLM does two things when you give it a prompt:
a) go through the prompt and do some magic algebra
b) generate the next tokens
now every time you give it a new prompt, it has to go through a) and b). when you have a large system prompt, a) is pretty expensive and mostly redundant. prompt caching saves the result of a) for a prompt and reuses it whenever a later prompt starts with the exact same prefix, so only the new part at the end has to go through a) again.
"but i never change my system prompt, so i'm already using it"
no
you might be putting the current datetime in your system prompt, so every request carries a different timestamp, the prefix never matches, and you get no cache hits.
or you might be adding custom tool instructions only when those tools are accessible, and since not all tools are accessible at all times (which is wrong for other reasons too), the prefix keeps changing and you never get a cache hit.
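a minimal sketch of the "using a model everywhere" point: try a regex first and only fall back to a model when it doesn't match. the order-id format, `extract_order_id`, and the `fake_llm` stand-in are all hypothetical, made up for illustration.

```python
import re
from typing import Callable, Optional

# hypothetical format: order ids look like "ORD-" followed by 6 digits
ORDER_ID = re.compile(r"\bORD-\d{6}\b")

def extract_order_id(
    message: str, llm_fallback: Callable[[str], Optional[str]]
) -> Optional[str]:
    """try the deterministic path first and only pay for tokens when it fails."""
    match = ORDER_ID.search(message)
    if match:
        return match.group(0)  # no model call, same answer every time
    return llm_fallback(message)  # messy inputs only

# hypothetical stand-in for a real model call; records how often it runs
llm_calls = []

def fake_llm(message: str) -> Optional[str]:
    llm_calls.append(message)
    return None

print(extract_order_id("hey, where is ORD-123456?", fake_llm))  # ORD-123456
print(len(llm_calls))  # 0
```

the well-formed message never touches the model; only the weird ones do.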
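for the "same model everywhere" point, a sketch of a per-task routing table. the model names, task labels, and the `reasoning_effort` field are placeholders (assumptions, not any provider's real identifiers); providers expose reasoning controls under different names and values, so check your provider's docs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelChoice:
    model: str
    # hypothetical knob; the real parameter name and values depend on your provider
    reasoning_effort: str

# hypothetical model names and task labels, for illustration only
ROUTES = {
    "classify_image": ModelChoice("small-fast-model", "low"),
    "extract_fields": ModelChoice("small-fast-model", "low"),
    "summarize_ticket": ModelChoice("mid-tier-model", "medium"),
    "plan_multistep_task": ModelChoice("large-reasoning-model", "high"),
}
DEFAULT = ModelChoice("mid-tier-model", "medium")

def pick_model(task: str) -> ModelChoice:
    """route each task type to the cheapest model that handles it well."""
    return ROUTES.get(task, DEFAULT)

choice = pick_model("classify_image")
print(choice.model, choice.reasoning_effort)  # small-fast-model low
```

keeping the mapping in one table makes it easy to downgrade a task later and see what breaks.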
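a toy simulation of the prompt-caching part, including the datetime mistake. `ToyPrefixCache` is a hypothetical stand-in: it only counts a hit when the system prompt is byte-identical to one it has seen, which mimics the exact-prefix matching real caches rely on, but it ignores provider details like minimum prompt length, cache lifetime, and token-level matching.

```python
from datetime import datetime, timedelta
from typing import Callable

# hypothetical static instructions; imagine thousands of tokens here
SYSTEM_RULES = "you are a support agent. follow these rules: ..."

class ToyPrefixCache:
    """toy cache: a hit only when the system prompt exactly matches one seen before."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.hits = 0
        self.misses = 0

    def send(self, system: str, user: str) -> None:
        if system in self.seen:
            self.hits += 1  # step a) for the prefix gets reused
        else:
            self.misses += 1  # step a) runs over the whole prefix again
            self.seen.add(system)
        # the user part is new every time and always gets processed

def build_bad(now: datetime, user_msg: str) -> tuple[str, str]:
    # timestamp at the top of the system prompt: the prefix changes every request
    return f"current time: {now.isoformat()}\n{SYSTEM_RULES}", user_msg

def build_good(now: datetime, user_msg: str) -> tuple[str, str]:
    # static rules stay byte-identical; the timestamp moves after them
    return SYSTEM_RULES, f"current time: {now.isoformat()}\n{user_msg}"

def simulate(builder: Callable[[datetime, str], tuple[str, str]]) -> ToyPrefixCache:
    cache = ToyPrefixCache()
    start = datetime(2025, 1, 1, 12, 0, 0)
    for i in range(3):
        system, user = builder(start + timedelta(seconds=i), f"question {i}")
        cache.send(system, user)
    return cache

bad, good = simulate(build_bad), simulate(build_good)
print(bad.hits, bad.misses)    # 0 3
print(good.hits, good.misses)  # 2 1
```

moving the timestamp out of the prefix turns 3 misses into 1 miss and 2 hits in this toy; how much that saves for real depends on your provider's cached-token pricing.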
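and for the tool-instructions trap: one way to keep the prefix stable, assuming you're fine telling the model in the volatile part of the prompt which tools are off-limits right now. the registry and function names are hypothetical. rendering with `sort_keys=True` matters too: if the tool list comes from something with unstable ordering (like a python set of strings, whose order can differ between processes), the text can change even when the tools don't.

```python
import json

# hypothetical tool registry; names and schemas are made up for illustration
ALL_TOOLS = {
    "search_orders": {"description": "look up an order by id", "params": ["order_id"]},
    "refund": {"description": "issue a refund", "params": ["order_id", "amount"]},
    "send_email": {"description": "email the customer", "params": ["to", "body"]},
}

def tool_section() -> str:
    # always render every tool with sorted keys, so this text is
    # byte-identical on every request and can sit in the cached prefix
    return json.dumps(ALL_TOOLS, sort_keys=True)

def availability_note(available: set[str]) -> str:
    # the part that actually varies goes after the cached prefix;
    # still reject calls to blocked tools in code, this only informs the model
    blocked = sorted(set(ALL_TOOLS) - available)
    return f"tools unavailable right now: {', '.join(blocked) or 'none'}"

print(availability_note({"search_orders"}))
# tools unavailable right now: refund, send_email
```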
there are a lot of nuances like these in agent building where you might be spending money unnecessarily. dm me for a free discovery call and i'd be happy to help you figure them out.
that's all nerds.