What are AI agents? A Practical Guide to Understanding Agentic Systems
AI agents explained. In detail. For normal people.
Hey there! 👋 I'm Alfie, Co-Founder & CEO of Toolflow AI, a platform where you can build your own workforce of AI agents and tools, without writing a single line of code.
On this Substack I'll be sharing everything we learn along our journey, breaking down complex AI topics into simple language, and sharing the ups and downs of building a startup. If you need anything, just shoot me an email at alfie@toolflow.ai
What is an AI Agent?
We're witnessing the biggest transformation in how work gets done since the industrial revolution.
The era of AI agents has arrived.
The entire technology sector makes up just 10% of US GDP ($2.7 Trillion), while the human labor sector accounts for 50% – a staggering five times larger.
This comparison is crucial because it demonstrates the scale of potential disruption.
For decades, software has been incrementally transforming human work – automating repetitive tasks, streamlining workflows, and enhancing efficiency. But fundamentally, humans were still “doing” the work.
That paradigm is shifting, dramatically.
AI agents are emerging as a new class of digital worker. These aren't merely tools that help humans work better – they're autonomous systems that can understand objectives, make decisions, and execute tasks independently.
The key difference is: AI agents represent labor rather than traditional software.
We're already seeing this transformation in action. Companies like AISDR and 11x are deploying agents to manage complex sales processes, while Devin, an AI developer, writes code with capabilities matching those of human engineers.
So, what are AI agents?
An AI agent is software that uses large language models (LLMs) to autonomously accomplish tasks by understanding goals, planning steps, and choosing how and when to use tools and information. Unlike traditional software that executes predefined steps, agents can adapt their approach and make independent decisions in real-time. In other words, they have agency.
If AI agents can perform work comparably to humans, the opportunity extends far beyond the technology sector's 10% of GDP – we're looking at augmenting the human labor market, which makes up 50% of GDP, through an infinitely scalable AI workforce.
This transformation will reshape our entire economy. To understand these far-reaching implications, we first need to understand how AI agents work.
The Building Blocks of AI Agents
To understand how AI agents can transform work, we need to look under the hood.
Just as human workers need specific capabilities to get their jobs done – the ability to think, act, remember, and learn – AI agents are also built from essential components that work together to create an effective digital worker.
Think of an AI agent like a human employee.
They need a brain to process information and make decisions, hands to take action and get work done, knowledge about your company and processes, memory to learn from experience and store progress, and systems and tools to work effectively.
Here's what this looks like in practice:
🧠 The Cognitive Layer (LLMs): The brain that does the thinking and planning
💪 The Action Layer (Tools): The hands that get work done
💡 The Context Layer (Knowledge): The information that informs decisions
💭 The State Layer (Memory): The ability to remember and learn from experience
🖥️ The Infrastructure Layer (Architecture): The nervous system that connects everything together
Let's explore each of these components and see how they combine to create agents as effective as human labor.
The Cognitive Layer (LLMs)
Think of Large Language Models as the brain of an AI agent.
Just as our human brains draw on life experiences to understand and respond to the world, LLMs use their training data to make sense of and respond to questions.
If you've never learned French, you can't speak it. Similarly, if an LLM wasn't exposed to certain data, it will have a hard time responding to questions about it. That's why we need to enhance our cognitive layer with knowledge (more on that later).
But LLMs do more than just answer questions.
They can "think" about how to solve problems and make decisions too. When you equip LLMs with tools, knowledge, memory, and the ability to think, they can:
Understand a user’s goal
Plan out steps to achieve it
Choose and execute tools to make progress
Decide to delegate tasks to other agents or software
Store and remember information relevant to the task
The ability to think and make decisions well varies with the quality of the underlying model. Historically, LLMs have struggled to think logically and reason; however, OpenAI recently released o1, a model that excels at this kind of systematic, deliberate thinking.
The better these models get at reasoning, the better the agentic systems we can create.
The Action Layer (Tools)
A smart human without any ability to take action is just a consultant. The same is true for AI agents – they need tools to transform their thinking into doing.
You're probably familiar with how ChatGPT works – you can have fascinating conversations with it, but ultimately it can't actually do anything in the real world. It can't send emails, look up real-time information, or update your CRM. It's all talk, no action.
This is where tools come in.
Tools are what transform a conversational AI into an AI agent that can get work done. They're the bridge between thinking and doing.
In reality, tools are just pieces of code or API calls your agent can make that give it the ability to affect the real world. These API calls fall into two main categories:
1. Data APIs: Help agents understand the world
These tools pull in external information to help agents make informed decisions:
Weather feeds → Get real-time weather data
SEC public filings → Access company financial information
LinkedIn profile data → Gather professional and company insights
Market research feeds → Stay updated on industry trends
2. Connected Integrations: Let agents take action
These tools allow agents to interact with your existing software systems:
Send Slack messages
Search Gmail inbox
Update Hubspot records
Create tickets in your support system
So how do LLMs use tools?
It's actually quite simple. We provide the LLM with a list of tools and instructions for using them via something called a "JSON schema". The schema explains what each tool does and what format its inputs should take.
When the LLM receives a query, it decides whether to use a tool by examining the tool's description and reasoning about what it needs to accomplish.
Here's a concrete example:
Tool name: "Google Search"
Tool description: "Can search Google and return the SERP results with URLs based on a search query"
Tool input(s): "Search query"
If this agent was asked to find Salesforce's pricing page, it would reason: "I need to find Salesforce's pricing page. I can use Google Search to find this." It would then use the tool with a query like "site:salesforce.com pricing" and get back a list of relevant URLs.
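To make this concrete, here's a minimal sketch of how a tool like this could be described in code, roughly following OpenAI's function-calling format for tool definitions. The google_search function and its field values are illustrative placeholders, not a real integration.

```python
# A minimal sketch of a tool definition (illustrative only).
# The schema tells the LLM what the tool does and what inputs it expects.
google_search_tool = {
    "type": "function",
    "function": {
        "name": "google_search",
        "description": "Can search Google and return the SERP results with URLs based on a search query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query, e.g. 'site:salesforce.com pricing'",
                },
            },
            "required": ["query"],
        },
    },
}

def google_search(query: str) -> list[str]:
    """Placeholder implementation – in practice this would call a real search API."""
    return [f"https://www.google.com/search?q={query}"]

# When the LLM decides to use the tool, it returns the tool name plus JSON
# arguments; our code looks up the matching function, runs it, and sends the
# result back to the LLM so it can continue reasoning.
tool_registry = {"google_search": google_search}
```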
Agents' ability to reason and use tools like humans is what makes them comparable to "human labor". How well they decide when and how to use those tools is another question; it depends on the reasoning capabilities of the underlying model.
The Context Layer (Knowledge)
LLMs face a fundamental limitation: they only know what they were trained on.
When their training data cuts off (like GPT-4o's October 2023 cutoff), they can't know about events after that date, nor are they aware of information they were never given access to, like your company's private data.
If LLMs form the "brain" of an agent, then access to relevant knowledge provides its context.
That's why the knowledge layer is so important: it grounds an agent in contextual knowledge and improves its ability to respond to queries.
There are several ways to give agents additional knowledge:
Direct Instructions in the message thread
The simplest approach is sending context directly in the message sent to the LLM. Think of it like briefing a new employee - you tell them what they need to know for the task at hand.
While straightforward, this method quickly hits limitations. LLMs have a fixed context window (the amount of text they can consider at once). This means you can't just dump your entire company wiki into the conversation. Even if you tried, the more context you add, the more expensive each interaction becomes since you're paying to process that context every single time.
RAG (Retrieval Augmented Generation)
A more sophisticated approach is RAG, which lets agents dynamically access external information sources. Instead of cramming all context into the message thread, RAG allows agents to search through your company's documents, workspaces, and databases and send over only the relevant chunks of information.
For example, an agent using RAG could search through your Notion workspace, Google Drive, CRM data, and internal wikis to pull exactly the information needed for each task.
It will pull out specific “chunks” of information from the knowledge sources and insert them into the message thread sent to the LLM to enrich their response. This method enables you to give an LLM access to vast amounts of knowledge without overwhelming it with information in any single message.
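As a rough illustration of the RAG pattern (not any particular vendor's implementation), the sketch below indexes a few chunks, retrieves the ones most similar to the question, and injects only those into the message thread. The embed and similarity functions are toy stand-ins for a real embedding model and vector store.

```python
# A stripped-down RAG sketch (illustrative, not production code).

def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: character-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def similarity(a: list[float], b: list[float]) -> float:
    """Dot product as a crude relevance score."""
    return sum(x * y for x, y in zip(a, b))

# 1. Index: chunk your knowledge sources and store their embeddings.
chunks = [
    "Acme Corp raised a $40M Series B in March.",
    "Our refund policy allows returns within 30 days.",
    "The sales playbook recommends a 3-email outreach sequence.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieve: find the chunks most relevant to the user's question.
question = "What funding has Acme Corp raised?"
q_vec = embed(question)
top_chunks = sorted(index, key=lambda c: similarity(q_vec, c[1]), reverse=True)[:2]

# 3. Augment: insert only the retrieved chunks into the message thread.
context = "\n".join(chunk for chunk, _ in top_chunks)
messages = [
    {"role": "system", "content": f"Use this context to answer:\n{context}"},
    {"role": "user", "content": question},
]
# messages is what would now be sent to the LLM.
```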
Fine-tuning
The most intensive approach is fine-tuning the model itself on your specific data. While this can create highly specialized agents, it's often unnecessary given the power of modern LLMs combined with RAG.
Fine-tuning is expensive, time-consuming, and can actually reduce an agent's general capabilities. Consider it only for highly specialized use cases where RAG isn't sufficient.
The knowledge layer transforms general-purpose LLMs into highly context-aware agents that truly understand your business.
The State Layer (Memory)
While LLMs provide the "thinking" and tools provide the "doing", there's something crucial we need to address: how do agents maintain context and remember what's happening across an entire agentic system?
You see, agentic systems are rarely just single conversations with a language model. They often involve multiple agents, performing various tasks, across multiple steps to achieve a common goal.
How do they all stay in sync? How do they communicate the "status" of the task?
Imagine you're running a multi-agent sales development team. You've got multiple agents working together - one researching companies, another qualifying leads, a third handling outreach, and a fourth managing ongoing conversations.
Your research agent discovers that a target company just raised funding. Great trigger for outreach! But for this to work effectively:
The qualification agent needs the research info to update lead scoring
The outreach agent needs to reference the research naturally in emails
The engagement agent needs context of historical conversations to write relevant follow-up
Just like a human sales team might use a CRM to check the status of the lead, read through Gmail to see historical conversations, and use Slack to check if any SDRs have previously reached out before, AI agents need systems to share information and maintain context too.
The only way to communicate with an LLM is through the message thread, but putting all this information in the thread of each agent might get overwhelming, hit token limits, and be overkill.
So how do we maintain all this state while working within the message thread constraint?
This is where state management and various forms of memory come in. Think of it as the infrastructure that decides:
What information to store
Where to store it
When to inject it into message threads
How to keep everything in sync
There are many ways this can work; let's take a look at a couple:
Message Thread Memory
The simplest approach is keeping everything in the conversation itself - similar to how ChatGPT remembers your chat history. For our sales team example we might just send each agent the following information:
System: Current lead status: Acme Corp
- Recently raised Series B ($40M)
- Previous contact: Initial email sent 3/1
- Next step: Follow-up scheduled 3/15
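In code, this approach is just a growing list of messages that gets re-sent to the LLM on every turn. A minimal sketch, with invented content and a crude message cap standing in for a real token limit:

```python
# Message-thread memory: all state lives inside the conversation itself.
MAX_MESSAGES = 20  # crude stand-in for a real token limit

messages = [
    {"role": "system", "content": (
        "Current lead status: Acme Corp\n"
        "- Recently raised Series B ($40M)\n"
        "- Previous contact: Initial email sent 3/1\n"
        "- Next step: Follow-up scheduled 3/15"
    )},
]

def add_turn(role: str, content: str) -> None:
    """Append a turn; drop the oldest non-system turns once the thread grows too long."""
    messages.append({"role": role, "content": content})
    while len(messages) > MAX_MESSAGES:
        messages.pop(1)  # keep the system message, forget the oldest turn

add_turn("user", "Draft a follow-up email referencing their funding round.")
# messages is everything the agent "remembers" – and everything you pay to process.
```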
But this has clear limitations:
Token limits restrict how much history you can include
It's expensive to process the full history in every interaction
Context disappears when conversations end
Agents can't easily share context with each other
Just think: are you going to include every call transcript you've recorded with a customer over their lifetime? Probably not. But there's lots of useful information in there that we might want to store in databases and call upon when necessary.
External State Management
This is where things get powerful. Instead of cramming everything into message threads, we can use external systems to store and manage state and knowledge that can be sent to the LLM at the right time:
Databases might store the lead status and interaction history
Vector stores could maintain the semantic memory of past calls and email threads
State machines can then coordinate sending information between agents
All these systems serve one purpose: getting the right information to the LLM at the right time and making sure the wider agentic system stays in sync.
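Here's a rough sketch of that idea: lead state lives in a store outside the message thread, and a small, role-specific briefing is injected only when an agent needs it. The store, field names, and roles are all invented for illustration; in practice this would be a database or vector store.

```python
# External state management sketch: state lives outside the message thread,
# and only a compact, relevant slice is injected when an agent needs it.
lead_store = {
    "acme-corp": {
        "status": "qualified",
        "last_event": "Raised $40M Series B",
        "last_contact": "Initial email sent 3/1",
        "next_step": "Follow-up scheduled 3/15",
    }
}

def briefing_for(lead_id: str, agent_role: str) -> str:
    """Build a short, role-specific context string to inject into an agent's thread."""
    lead = lead_store[lead_id]
    if agent_role == "outreach":
        return f"{lead_id}: {lead['last_event']}. Last contact: {lead['last_contact']}."
    return f"{lead_id}: status={lead['status']}, next step: {lead['next_step']}."

messages = [
    {"role": "system", "content": briefing_for("acme-corp", "outreach")},
    {"role": "user", "content": "Write a follow-up email for this lead."},
]
# Only the relevant slice of state reaches the LLM; the full history stays in the store.
```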
Now that we understand how agents think (LLMs), act (Tools), and remember (State), we need to look at how all these pieces come together. This is where the Infrastructure Layer becomes crucial.
The Infrastructure Layer (Architecture)
We've explored how agents think (LLMs), act (Tools), learn (Knowledge), and maintain state (Memory). But how do these pieces come together into cohesive systems?
The key lies in the Infrastructure Layer, aka the architecture.
When building agent systems, the fundamental question is: how much decision-making power do we give to the AI?
AI systems aren't binary – they're not either agentic or non-agentic. Instead, they exist on a spectrum based on how much of their operation is determined by AI decision-making versus predetermined steps.
Let's explore what this spectrum looks like in practice.
No Agency: Predetermined Flows
A business wants to create custom call notes from Gong recordings using AI and update their Hubspot. They build a Zapier workflow: new recording → send to ChatGPT for summarization → create Hubspot note → notify sales rep on Slack.
While this uses AI, it's not an agentic system. Every action, including when to use AI, was decided beforehand during setup. The AI just follows these predefined instructions.
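In code, a workflow like this is just a fixed sequence of function calls; the LLM is invoked at a predetermined step and never decides what happens next. All the helper functions below are hypothetical placeholders for the real Gong, ChatGPT, Hubspot, and Slack integrations.

```python
# A no-agency workflow: every step, including the LLM call, is predetermined.
# All function bodies are placeholders for real integrations.

def fetch_recording(recording_id: str) -> str:
    return "...full call transcript..."

def summarize_with_llm(transcript: str) -> str:
    return "Summary: customer asked about pricing and onboarding."  # would call an LLM

def create_hubspot_note(summary: str) -> None:
    print(f"[Hubspot] note created: {summary}")

def notify_on_slack(summary: str) -> None:
    print(f"[Slack] new call note: {summary}")

def on_new_recording(recording_id: str) -> None:
    transcript = fetch_recording(recording_id)
    summary = summarize_with_llm(transcript)   # AI is used here...
    create_hubspot_note(summary)               # ...but it never decides what happens next
    notify_on_slack(summary)

on_new_recording("rec_123")
```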
Minimal Agency: Some Decision Points
We could build a basic agentic system for handling support tickets. The workflow might look like this:
New ticket arrives
Agent analyzes content and categorizes the issue
Agent determines priority
Agent selects relevant team
The key difference from a non-agentic system is how it handles categorization and routing. A traditional rule-based system needs predefined categories and routing rules – if a new type of issue appears, it won't know what to do.
An agent, however, can use its reasoning capabilities to understand and categorize any ticket on the fly, even ones it's never seen before. It can analyze the content, understand the core issue, and make an informed decision about who should handle it.
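A hedged sketch of that difference: instead of matching the ticket against hard-coded rules, we ask an LLM to reason about it and return a category, priority, and team. The classify_with_llm function here is a placeholder returning a canned answer; in a real system it would be a chat-completion call.

```python
import json

# Minimal-agency sketch: the LLM decides categorisation and routing on the fly.

def classify_with_llm(ticket_text: str) -> str:
    # Placeholder: in practice, send the ticket plus instructions to an LLM
    # and ask it to reply with JSON in this shape.
    return json.dumps({"category": "billing", "priority": "high", "team": "finance-support"})

def route_ticket(ticket_text: str) -> dict:
    decision = json.loads(classify_with_llm(ticket_text))
    # No predefined rules: even an unseen issue type gets a reasoned decision.
    return decision

print(route_ticket("I was charged twice for my subscription this month."))
```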
Medium Agency: Flexible Operation
ChatGPT shows us what medium agency looks like. When you're having a conversation, it can independently decide to search the web for current information or run code to solve problems. It's not just following rules – it's choosing which tools to use based on what it thinks will help answer your question.
Building on this model, you could create a research assistant that:
Analyzes your research request
Decides which sources to check
Chooses when to use web search, data analysis, or other tools
Determines if it needs help from specialized agents
Adapts its approach based on what it discovers
Here, there are lots of moments of agentic decision-making, but overall the agent operates within a set of constraints defined by the tools available and the instructions provided.
High Agency: Autonomous Operation
At the highest end of the spectrum are highly autonomous multi-agent systems that are given an end objective but no instructions on how to achieve it.
These systems have access to tools and knowledge but determine their own approach entirely.
They can:
Plan their own steps
Create and manage specialized sub-agents
Choose which tools to use
Continuously modify their strategy based on results
Self-reflect and review their own work
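The orchestration loop behind such a system might look roughly like the sketch below: plan, act, reflect, and repeat until the agent judges the objective met. Every helper (plan, execute_step, reflect) is a stand-in for LLM calls, tool executions, or sub-agent spawns, so treat this as a shape, not an implementation.

```python
# High-agency sketch: the agent receives only an objective and loops
# plan -> act -> reflect until it decides the objective is satisfied.

def plan(objective: str, history: list[str]) -> list[str]:
    return ["research the market", "draft the report"]  # would ask an LLM to plan

def execute_step(step: str) -> str:
    return f"result of '{step}'"  # would choose tools or spawn a specialized sub-agent

def reflect(objective: str, history: list[str]) -> bool:
    return len(history) >= 2  # would ask an LLM whether the objective is met

def run_agent(objective: str, max_iterations: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_iterations):
        for step in plan(objective, history):
            history.append(execute_step(step))
        if reflect(objective, history):  # self-review: stop, or loop and re-plan
            break
    return history

print(run_agent("Produce a competitive analysis of the CRM market"))
```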
These systems require much more sophisticated orchestration because their needs for state, memory, and context are significantly higher. That's why frameworks like CrewAI and LangChain have emerged – they help manage the complex coordination needed for truly autonomous agent systems.
Practical Implementation
Most businesses will end up combining different levels of agency. They might use predetermined workflows for critical processes like lead qualification, but let agents make significant decisions within those workflows.
For example, a sales system might:
Follow a structured qualification process
Let agents independently research prospects
Allow agents to choose outreach strategies
Enable coordination with content creation agents
Maintain basic guardrails and bring humans into the loop to confirm actions
The key is matching the level of agency to your needs. Critical business processes might need more structure and less agency to ensure consistency. Creative or investigative tasks benefit from higher agency to handle unexpected situations.
They can all be agentic systems, but they might sit at different points on the spectrum.
From Theory to Practice: Evaluating AI Agents
We've broken down how AI agents work - from their cognitive abilities to how they take action, from their knowledge to their memory, and how it all comes together. But not every AI product that calls itself an "agent" will have all these pieces, or have them working at the same level.
So how do you cut through the noise and understand what you're really getting? Ask yourself these questions:
Decision Making
Does it follow only predefined steps, or can it reason about how to solve problems?
Can it break down complex goals into smaller steps?
Can it adapt when things don't go as planned or can it only follow rules?
Capabilities & Actions
What tools does it have access to?
Can it choose which tools to use, or does it follow a fixed pattern?
Can it chain multiple tools together to accomplish more complex tasks?
Context & Knowledge
Can it access your company's information?
How does it handle information that wasn't in its training data?
Can it learn from new information you provide?
Memory & Learning
Does it maintain context across conversations, tasks, and steps?
Can it learn from past interactions?
If multiple agents are involved, do they share information effectively?
Autonomy
Does it need step-by-step instructions, or can it work autonomously?
Can it ask for help when needed?
How much human oversight does it require?
Remember: There's no "perfect" level for any of these capabilities. The right balance depends entirely on your needs. Sometimes you want an agent with limited autonomy but perfect reliability. Other times you might need an agent that can think creatively and work independently.
Here's the bottom line.
If you want to quickly tell if something is actually an AI agent, look for these three fundamental capabilities:
Independent Thinking: It can understand goals and plan how to achieve them, rather than just following preset steps
Ability to Act: It can actually do things in the real world through tools and integrations, not just have conversations
Adaptive Decision-Making: It can change its approach based on new information or unexpected situations
Put simply: If a system just uses AI to process information within a fixed workflow (like a Zapier automation that uses ChatGPT), it's not really an agent. A true AI agent can understand what needs to be done, decide how to do it, and actually take action - all while adapting its approach as needed.