this is the only article you'll need to understand how the claude agent SDK actually works.
no, you don't need a mac mini for this.
let's jump right into it.
@bcherny's side project (now claude code) has changed developers' lives about as much as cursor did. we went from running 1 agent that assisted us to running 15 agents while we sleep.
how did we come to this? what did claude do amazingly well that enabled this? how can you use the same principles to build your own agents?
we've made a lot of progress and have finally landed on a loop that works for agents:
gather context -> take actions -> verify it works
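here's the loop as a minimal python sketch. `AgentLoop` and its three hooks are hypothetical names for illustration, not the claude agent SDK's actual API. the point is just that a failed verification becomes context for the next round.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class AgentLoop:
    # hypothetical hooks: none of these names come from the claude agent SDK
    gather_context: Callable[[str, list[str]], str]
    take_action: Callable[[str], str]
    verify: Callable[[str], Tuple[bool, str]]
    max_turns: int = 5
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> Optional[str]:
        for _ in range(self.max_turns):
            context = self.gather_context(task, self.history)  # step 1
            result = self.take_action(context)                 # step 2
            ok, feedback = self.verify(result)                 # step 3
            if ok:
                return result
            # a failed check is fed back into the next round of context gathering
            self.history.append(feedback)
        return None  # out of turns: hand it back to a human


# toy usage: the "action" counts how much feedback it has seen,
# and verification only passes once that count reaches 2
loop = AgentLoop(
    gather_context=lambda task, history: f"{task} | " + "; ".join(history),
    take_action=lambda context: str(context.count("got")),
    verify=lambda result: (result == "2", f"got {result}, want 2"),
)
answer = loop.run("demo")
```

the `max_turns` cap matters in practice: without it, an agent that can never pass verification would loop forever.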
step 1: gathering context
imagine you're an engineer (you most likely are) and you've been given a bug to fix. what were your steps in the pre-cursor era?
see what the error is
trace it to a file
see surrounding files to figure out what the bug actually is
possibly add logging to pinpoint the bug
fix it or create a jira ticket for 20 points
steps 1-3 are basically gathering context. you're gonna go to sentry to see the logs, go to your code to find the failing part, command-click to see where it originated, and so on.
we first started solving this for agents by providing all the code files (if it's a small repo) or hand-picking files to add (if it's a big repo). it worked well for a while but it didn't feel agentic. it was basically us gathering the context for the agent.
then we thought, why not index the code and let the agent decide what to pull from the index? that worked well for a while, but we realised it mostly wasn't retrieving the right parts and was producing slop.
then we thought, what if we gave agents tools to gather context the way we do and let them rip? smart idea. started well. still working well. only 1 problem:
context bloat. if you fetch everything and keep everything in context, your context gets bloated, and even smart models start generating slop.
claude solved this by using (and prolly making famous) bash tools and files.
bash is extremely powerful. you can do almost anything a human does on a computer just through bash. besides complaining about the models ofc.
another good thing about gathering context is that it can happen in parallel. the agent can run multiple bash commands at once, write the output to files, check what's relevant, and use only that.
the files aren't automatically sent into the context, but the agent has access to them and can refer to them anytime.
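a rough sketch of the parallel part, assuming plain subprocess calls stand in for the agent's bash tool. `run_to_file` and `gather_in_parallel` are hypothetical helpers: each command's output goes to its own file, and only a one-line summary per file would go back to the model.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_to_file(cmd: list[str], out_path: Path) -> Path:
    # run one command and save stdout + stderr to a file instead of returning it to the model
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    out_path.write_text(proc.stdout + proc.stderr)
    return out_path


def gather_in_parallel(commands: dict[str, list[str]], workdir: Path) -> dict[str, Path]:
    workdir.mkdir(parents=True, exist_ok=True)
    # the commands are independent, so they can all run at the same time
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {
            name: pool.submit(run_to_file, cmd, workdir / f"{name}.txt")
            for name, cmd in commands.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


# toy commands; a real agent would run things like grep, git log or a log-query cli
commands = {
    "python_version": [sys.executable, "--version"],
    "file_listing": [sys.executable, "-c", "print('app.py'); print('db.py')"],
}
paths = gather_in_parallel(commands, Path(tempfile.mkdtemp()))
# only this short summary goes back into the agent's context
summary = {name: f"{p.name} ({p.stat().st_size} bytes)" for name, p in paths.items()}
```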
that's the current state of the claude agent SDK when it comes to gathering context:
expose the tools humans use through bash (preferred), and also through APIs (as functions)
let the agent decide which tool to call and keep track of everything in files that aren't automatically sent to it.
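one way to sketch "keep track of everything in files", assuming a hypothetical `Workspace` class (not part of the SDK): every tool result is saved to disk, the model gets a short receipt with a preview, and a separate read tool pulls specific line ranges later.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Optional


class Workspace:
    """hypothetical scratch area: tool output is saved here and the model only gets a short receipt."""

    def __init__(self, root: Path, preview_lines: int = 3):
        self.root = root
        self.preview_lines = preview_lines
        self.root.mkdir(parents=True, exist_ok=True)
        self._counter = 0

    def save(self, tool_name: str, output: str) -> str:
        self._counter += 1
        path = self.root / f"{self._counter:03d}_{tool_name}.txt"
        path.write_text(output)
        lines = output.splitlines()
        preview = "\n".join(lines[: self.preview_lines])
        # this receipt is the only part that lands in the model's context
        return f"saved {len(lines)} lines to {path.name}\n{preview}"

    def read(self, name: str, start: int = 0, end: Optional[int] = None) -> str:
        # exposed as its own tool so the agent pulls exactly the slice it needs, when it needs it
        lines = (self.root / name).read_text().splitlines()
        return "\n".join(lines[start:end])


ws = Workspace(Path(tempfile.mkdtemp()))
receipt = ws.save("grep", "\n".join(f"match {i}" for i in range(100)))
later = ws.read("001_grep.txt", start=50, end=52)
```

100 lines of output cost the context 4 lines up front; the rest stays on disk until the agent decides it matters.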
a good rule of thumb: if you can convert a data source into something the agent knows really well, like turning a csv into a sql db, do it, because then the agent can query it for context whenever it needs to.
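the csv to sql idea, sketched with python's built-in `csv` and `sqlite3` modules. `csv_to_sqlite` and the orders data are made up for illustration; the payoff is that the agent can ask one targeted sql question instead of pulling the whole file into context.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_text: str, table: str, conn: sqlite3.Connection) -> int:
    """load a csv into a sqlite table so the agent can query it with plain sql."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    # assumes trusted header names without double quotes; real code should sanitize them
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.commit()
    return len(data)


orders_csv = """customer,amount,status
acme,12.5,paid
acme,7.5,failed
globex,30,paid
"""
conn = sqlite3.connect(":memory:")
loaded = csv_to_sqlite(orders_csv, "orders", conn)
# instead of reading the whole file into context, the agent asks a targeted question
totals = conn.execute(
    "SELECT customer, SUM(CAST(amount AS REAL)) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```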
designing tools for gathering context is a really creative process tbh. you basically need to think about how you do your job, how an agent could do it, and how the agent could do it faster. that's it.
subagents are also a great way to manage context. they're best for "do a lot of work and return an answer" tasks, since only the answer comes back into the main agent's context.
perfect, what's next?
step 2: taking action
an action is basically a tool call or a function call. we've made a lot of progress here as well.
it started with asking agents for classification words, then writing an if-else for each word to route it to a function. worked well for some time. but who really likes 10 if-elses (if you do, you're a psycho).
then we introduced tool calls: you define each tool with a name, a description, and an input schema, and the agent makes those tool calls itself. worked well for some time but felt really slow.
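a minimal sketch of the dispatch side. the tool definition follows the `name` / `description` / `input_schema` shape claude's messages api uses for tools, and the incoming call is a dict with the `name` and `input` fields of a `tool_use` block. `search_logs`, the fake logs, and `dispatch` are hypothetical. note how a single lookup table replaces the if-else chain.

```python
from __future__ import annotations

from typing import Any, Callable

# tool definitions in the name / description / input_schema shape
TOOLS = [
    {
        "name": "search_logs",
        "description": "search recent error logs for a substring and return matching lines.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]

# hypothetical log source standing in for sentry or similar
LOGS = ["ERROR db timeout", "INFO server started", "ERROR db timeout on retry"]


def search_logs(query: str) -> str:
    return "\n".join(line for line in LOGS if query in line)


# one lookup table replaces the chain of if-elses
HANDLERS: dict[str, Callable[..., str]] = {"search_logs": search_logs}
SCHEMAS = {tool["name"]: tool["input_schema"] for tool in TOOLS}


def dispatch(tool_call: dict[str, Any]) -> str:
    name, args = tool_call["name"], tool_call["input"]
    if name not in HANDLERS:
        return f"error: unknown tool {name!r}"
    missing = [key for key in SCHEMAS[name].get("required", []) if key not in args]
    if missing:
        # errors go back to the model as text so it can retry with better arguments
        return f"error: missing required arguments {missing}"
    return HANDLERS[name](**args)


result = dispatch({"name": "search_logs", "input": {"query": "timeout"}})
```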
then we introduced subagents. a subagent is a worker agent that the main orchestrator agent spins up so many tool calls for many prompts can run at once. worked well for some time but wasn't really dependable: agents started spawning subagents that spawned their own subagents, and the tree grew without bound.
then we were like, okay, what if we limit the depth of subagents so only the orchestrator agent can create them? worked well for some time and still does. but we were still struggling with agents making wrong decisions on one-off tasks.
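a toy sketch of the depth limit, assuming a hypothetical `Agent` class and a `MAX_DEPTH` of 1 (orchestrator at depth 0, workers at depth 1). it also shows why subagents help with context: each worker keeps its own transcript, and only a one-line answer flows back to the orchestrator.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# hypothetical policy: the orchestrator is depth 0, its subagents are depth 1 and cannot spawn more
MAX_DEPTH = 1


@dataclass
class Agent:
    name: str
    depth: int = 0
    transcript: list[str] = field(default_factory=list)

    def spawn(self, name: str) -> Agent:
        if self.depth + 1 > MAX_DEPTH:
            raise PermissionError(f"{self.name} (depth {self.depth}) is not allowed to spawn subagents")
        return Agent(name=name, depth=self.depth + 1)

    def run(self, task: str) -> str:
        # stand-in for real work: the subagent fills its own transcript with intermediate steps...
        for step in range(3):
            self.transcript.append(f"{self.name}: step {step} of {task!r}")
        # ...but hands only a one-line answer back to whoever called it
        return f"{self.name} done: {task}"


orchestrator = Agent("orchestrator")
for i, task in enumerate(["scan the logs", "read the config"]):
    worker = orchestrator.spawn(f"worker-{i}")
    orchestrator.transcript.append(worker.run(task))
```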
then we thought: can we make the agents learn how to make specific decisions in a workflow?
enter skills (SKILL.md).
skills are an organized collection of files: instructions, executable code, and assets. they're a way to give out-of-distribution tasks some extra context.
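a sketch of how a skills folder could be surfaced to an agent, assuming each skill lives in its own directory with a `SKILL.md` whose frontmatter has `name` and `description` fields (that's the layout anthropic's skills use as far as i know, but check the docs). the parser only handles simple `key: value` lines, and `skill_index` is a hypothetical helper: only the one-line index goes in the prompt, and the agent opens the full file when a skill looks relevant.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """parse simple `key: value` lines between the leading --- markers (not a full yaml parser)."""
    text = skill_md.read_text()
    parts = text.split("---", 2)
    if not text.startswith("---") or len(parts) < 3:
        return {}
    meta = {}
    for line in parts[1].strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def skill_index(skills_dir: Path) -> str:
    # one short line per skill goes into the prompt; the full SKILL.md is opened only when needed
    entries = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        name = meta.get("name", skill_md.parent.name)
        rel = skill_md.relative_to(skills_dir).as_posix()
        entries.append(f"- {name}: {meta.get('description', '')} [{rel}]")
    return "\n".join(entries)


# hypothetical example skill
skills = Path(tempfile.mkdtemp())
(skills / "pdf-forms").mkdir()
(skills / "pdf-forms" / "SKILL.md").write_text(
    "---\nname: pdf-forms\ndescription: fill in pdf forms with scripts/fill.py\n---\n"
    "# pdf forms\n1. run scripts/fill.py --help first\n"
)
index = skill_index(skills)
```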
according to @trq212, the way to design skills is to read what your agent is doing and ask: how can i help it do this better? how can i help it do this faster?
still very manual. still very human. request for startup 2027?
one very important thing if you're doing tool calling the bash way: make sure every command has a --help flag (or something similar) so it's apparent how to call that tool.
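if your tools are python scripts, `argparse` gives you `--help` for free. `search-logs` below is a hypothetical cli; the useful part is that the description, the argument help, and the example in the epilog are exactly what the agent reads when it runs the tool with `--help`.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # hypothetical cli; argparse builds the --help output from these strings automatically
    parser = argparse.ArgumentParser(
        prog="search-logs",
        description="search recent error logs for a substring and print matching lines.",
        epilog="example: search-logs 'db timeout' --since 2h --limit 20",
    )
    parser.add_argument("query", help="substring to look for")
    parser.add_argument("--since", default="1h", help="how far back to search, e.g. 30m or 2h (default: 1h)")
    parser.add_argument("--limit", type=int, default=50, help="max number of lines to print (default: 50)")
    return parser


def main(argv: list[str] | None = None) -> str:
    args = build_parser().parse_args(argv)
    # a real tool would query your log store here; this sketch just echoes the parsed arguments
    return f"searching {args.query!r} since {args.since}, limit {args.limit}"


help_text = build_parser().format_help()  # what the agent sees when it runs `search-logs --help`
```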
don't bloat the descriptions. it's also worth telling the agent when not to call a tool. works pretty well.
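one cheap way to enforce both rules is a tiny lint over your tool definitions. `lint_tool_descriptions`, the 60-word cap, and the "do not use" phrase check are all arbitrary assumptions; tune them to your own tools.

```python
from __future__ import annotations

# hypothetical tool definitions: one lean description, one deliberately bloated one
TOOLS = [
    {
        "name": "search_logs",
        "description": (
            "search recent error logs for a substring. "
            "do not use for reading source code; use read_file for that."
        ),
    },
    {
        "name": "read_file",
        "description": "read a file from the repo by path. " * 20,
    },
]


def lint_tool_descriptions(tools: list[dict], max_words: int = 60) -> list[str]:
    # flags descriptions that are too long or never say when NOT to use the tool
    problems = []
    for tool in tools:
        words = tool["description"].split()
        if len(words) > max_words:
            problems.append(f"{tool['name']}: description is {len(words)} words")
        lowered = tool["description"].lower()
        if "do not use" not in lowered and "don't use" not in lowered:
            problems.append(f"{tool['name']}: no 'when not to use' guidance")
    return problems


problems = lint_tool_descriptions(TOOLS)
```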
let's jump to the next step.
step 3: verify it works
until very recently we didn't have any verification methods. humans did the verifying: feeding the agent logs and screenshots and making sure things actually worked.
sandboxing has now become a go-to way of solving this problem. you give an agent a computer of its own, and it can do basically anything you can (but better: doomsday soon).
but before you let it rip in a chrome tab and take actions on its own, see if you can verify the result with a rule.
think about which tasks agents have gotten really good at: verifiable tasks. hello RL with verifiable rewards. if the output is code, check that it compiles and that it actually solves the user's problem. verify as much as you can.
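a sketch of "verify through a rule" for python code, assuming the agent's output is the source for a single function and you have a few known input/output pairs. `verify_patch` is hypothetical: it checks that the code compiles, then that it returns the right answers, and returns feedback text the agent can act on. since `exec` runs whatever the agent wrote, do this inside the sandbox in real use.

```python
from __future__ import annotations

from typing import Any


def verify_patch(source: str, func_name: str, cases: list[tuple[tuple, Any]]) -> tuple[bool, str]:
    # rule 1: it has to compile
    try:
        code = compile(source, "<agent_patch>", "exec")
    except SyntaxError as exc:
        return False, f"does not compile: {exc.msg} (line {exc.lineno})"
    # rule 2: it has to solve the problem, checked against known input/output pairs
    # exec runs untrusted code, so only do this inside a sandbox
    namespace: dict[str, Any] = {}
    try:
        exec(code, namespace)
    except Exception as exc:
        return False, f"module raised {type(exc).__name__}: {exc}"
    func = namespace.get(func_name)
    if not callable(func):
        return False, f"{func_name} is not defined"
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            return False, f"{func_name}{args} raised {type(exc).__name__}: {exc}"
        if got != expected:
            return False, f"{func_name}{args} returned {got!r}, expected {expected!r}"
    return True, "all checks passed"


cases = [((1, 2), 3), ((0, 0), 0)]
good = verify_patch("def add(a, b):\n    return a + b\n", "add", cases)
wrong = verify_patch("def add(a, b):\n    return a - b\n", "add", cases)
broken = verify_patch("def add(a, b)\n    return a + b\n", "add", cases)
```

the failure messages are written for the agent, not for you: "add(1, 2) returned -1, expected 3" is something it can fix on the next turn.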
you can honestly never verify too much. it's also good practice to make things reversible. cursor's checkpointing was the first really cool step in this direction. obviously you have git, but think about how to reverse whatever the agent has done, fast.
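a minimal checkpoint/restore sketch, assuming the agent works in a small directory you can copy wholesale. `Checkpoints` is a hypothetical helper built on `shutil.copytree`; for big repos you'd lean on git or filesystem snapshots instead, but the idea is the same: one call undoes a whole agent turn.

```python
import shutil
import tempfile
from pathlib import Path


class Checkpoints:
    """hypothetical helper: copy the working directory before each agent turn so any turn can be undone."""

    def __init__(self, workdir: Path):
        self.workdir = workdir
        self.store = Path(tempfile.mkdtemp(prefix="checkpoints_"))
        self.count = 0

    def save(self) -> int:
        self.count += 1
        shutil.copytree(self.workdir, self.store / str(self.count))
        return self.count

    def restore(self, checkpoint_id: int) -> None:
        # throw away everything the agent did since the checkpoint, then copy the snapshot back
        shutil.rmtree(self.workdir)
        shutil.copytree(self.store / str(checkpoint_id), self.workdir)


work = Path(tempfile.mkdtemp())
(work / "app.py").write_text("print('v1')\n")
checkpoints = Checkpoints(work)
before_turn = checkpoints.save()

# the agent's turn goes badly: it breaks a file and leaves junk behind
(work / "app.py").write_text("print('broken')\n")
(work / "junk.txt").write_text("scratch")

checkpoints.restore(before_turn)
```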
when you're thinking about verification, again ask how a human would do it and try to emulate that.
those are all the steps the claude agent SDK uses. they have really nice documentation too. thanks to Thariq for his talk on the agent SDK, really liked the clarity.
go have fun nerds.