AI is going through a profound transformation, and that will continue in 2025 and beyond. We have already moved from static, single-model, reactive systems to proactive, intelligent agents. But what exactly are these agents, and how do they work?
To answer that, we have to look at the shifts generative AI has been through so far. In this article, we'll focus on:
- The evolution of AI (the major shifts so far)
- How AI agents work
- Detailed examples to better understand AI agents
- Why you should care about AI agents
This deep dive details the functionality and potential impact of AI, tracing its evolution from simple programs to multi-system agents.
Alright, let's dive in.
The Evolution of AI Models
Shift from Monolithic Models to Compound AI Systems
A "monolith" is a single, solid piece, and the term carries the same meaning in software. Monolithic models in AI are centralized, self-contained systems designed to handle many tasks through a single framework.
Software development was once dominated by monolithic architecture, but monoliths are increasingly being replaced by modular systems.
AI systems from 2020 to 2022, such as GPT-3, were monolithic models that used broad knowledge to perform different tasks. These systems processed inputs based on pre-trained patterns and did not integrate real-time external data.
Training these models takes a significant investment in data and resources. But researchers found that building systems around a model, and integrating them into existing processes, makes the model far more useful.

Example: ask a single model to summarize a customer-support document and it will. But the moment you ask about a particular customer's order status, it fumbles.
What if you could design a system to solve this problem? You would give the model access to the database where the customer data is stored. The same query would still go into the language model, but now the model is prompted to generate a search query, one that can run against your database.
The system runs that query, fetches the information from the database, and feeds the result back into the model, which generates a sentence to answer the question. This is an example of a compound AI system, and it recognizes that certain problems are better solved by applying the principles of system design.
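The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: `fake_llm`, `orders_db`, and the prompt prefixes are all hypothetical stand-ins, and a real system would call an actual LLM API in place of the stub.

```python
# A toy compound AI system: the "model" first turns the user question
# into a database search query, the system runs that query, and the
# model then turns the retrieved record into a sentence.
orders_db = {"A123": "shipped on 2024-05-01"}  # hypothetical order database

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call, keyed on a hypothetical prompt prefix.
    if prompt.startswith("EXTRACT_ORDER_ID:"):
        # e.g. "EXTRACT_ORDER_ID: What is the status of order A123?"
        return prompt.rstrip("?").split()[-1]
    if prompt.startswith("ANSWER:"):
        return f"Your order status: {prompt.split(':', 1)[1].strip()}."
    return "I don't know."

def answer(question: str) -> str:
    # Step 1: the model is prompted to create a search query.
    order_id = fake_llm(f"EXTRACT_ORDER_ID: {question}")
    # Step 2: the surrounding system (not the model) queries the database.
    status = orders_db.get(order_id, "not found")
    # Step 3: the model turns the retrieved record into a sentence.
    return fake_llm(f"ANSWER: {status}")

print(answer("What is the status of order A123?"))
```

The key design point is that the database lookup happens outside the model; the model only translates between natural language and the query, which is exactly what makes this a system rather than a single model.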
Retrieval-augmented generation (RAG) is one of the most popular and commonly used compound AI systems out there.
What is RAG?
Large language models inherently have two major challenges:
- The response generated by the model has no source to support its answer.
- The answer may be out of date.
How do you solve these issues? Through RAG.
If you add a content store, such as the open internet or a private document collection, the model won't answer straight from its training data. Instead, it will first go to that store and retrieve content relevant to the user's query.
Why? Because in RAG, the model is instructed to first retrieve relevant content from the data store, combine it with the user's question, and only then generate the answer.
So now you see the "retrieval-augmented" part of RAG. You can augment your data store with new and updated information anytime, without retraining the model. And because the model is instructed to rely on that primary source of data, it can give evidence for its output.
This makes it less likely to hallucinate because it is less likely to rely only on information that it learned during training.
What if the user's question cannot be reliably answered from the data store? Then the model should say, "I don't know," instead of making up something believable that may mislead the user.
This can cut the other way, too: if the retriever does not surface the best, highest-quality information, then a query that is answerable may go unanswered.
Researchers and engineers are working to improve retrievers so the LLM has the best-quality data on which to ground its response, and to improve the generative side so the model gives the richest, best response to the user.
Suggested Read: Top 5 Methods to Evaluate RAG Models
Next Shift: From Compound AI Systems to AI Agents
Imagine bringing a very different query to the customer-support example. Let's ask the model about the weather.
The system is going to fail. Why? Because the path this program follows is to always search the customer database, and that has nothing to do with the weather.
This "path to answer a query" is known as the control logic of a program, and most compound AI systems have programmatic control logic.
Another way of controlling the logic of a compound AI system is to put an LLM in charge. This is possible only because of the tremendous improvements in the reasoning capabilities of large language models. This is where AI agents come in.
You can now feed LLMs complex problems and prompt them to break those problems down. Think of it as a spectrum: on one end, you tell the system to "think fast, act as programmed" and never deviate from its instructions; on the other end, you design the system to think slowly.
What Are AI Agents?
AI agents go beyond simply responding to queries (prompts). They are designed to perceive their environment, plan actions, execute them, and learn from experiences.Â
They operate with a degree of autonomy, making decisions and adapting to changing circumstances. Agentic AI can:
- Plan: Break down complex tasks into manageable steps.
- Reason: Analyze information, identify gaps, and draw logical conclusions.
- Act: Execute actions, interact with systems, and adapt to changing circumstances.
- Learn: Refine their strategies based on feedback and experience.
The defining feature of an AI agent is that a large language model is in charge of the control logic. This shift is made possible by the tremendous advancements in the reasoning capabilities of modern LLMs.
Instead of following a rigid, pre-programmed path, an AI agent leverages the LLM to understand complex problems, break them down into smaller steps, and devise a dynamic plan to tackle them.
Think of it as moving from a system that is told to "think fast, act as programmed, and not deviate" to one designed to "think slowly, create a plan, attack each part, identify roadblocks, and readjust the plan if necessary". This "thinking slowly" approach allows agents to handle much more intricate and varied tasks.
How Do AI Agents Work: The ReACT Framework
One popular method for configuring AI agents is the ReACT framework, which combines Reasoning and Acting.
In this approach, a user query is fed into an LLM with instructions to think step-by-step and plan its actions. The agent can then decide to act by utilizing external tools to gather information or perform specific operations.
After an action, the agent observes the result, determining whether it brings it closer to the final answer or whether the plan needs to be revised. This iterative process of reasoning, acting, and observing continues until a satisfactory solution is reached.
Let's go back to our customer-support example. Say you put in this query: "I want to know a particular customer's order status and see what the weather would be on the day of delivery".

When configured with the ReACT framework, the agent would follow an iterative process of Reasoning, Acting, and Observing. Here's a breakdown of the query:
Reasoning: The LLM at the core of the ReACT agent would first analyze the user's request and break it down into distinct goals. It would identify the need to:
- Know the status of a specific customer's order. This requires identifying the customer and the order.
- Find out the delivery date for that order.
- Find the weather forecast for the location of the customer on the identified delivery date.
- Present the order status and the weather forecast to the user.
The prompt given to the LLM would instruct it to think step-by-step and plan its work rather than providing an immediate answer.
Acting (through Tools): Based on the reasoned plan, the agent would then decide which external tools it needs to utilize to gather the necessary information. These tools could include:
- Customer Order Database Tool: This tool would allow the agent to query a database using customer identification. It helps to find the specific order and its current status, including the scheduled delivery date. The agent would generate a search query to retrieve this information.
- Weather API Tool: Once the delivery date and the customer's location are identified, the agent will use a weather API. It would call this API with the location and the delivery date to get the weather forecast for that day.
- Potentially a Clarification Tool: If the customer isn't identified in the initial query, the agent might use a tool to ask clarifying questions, such as "Could you please provide the customer's name or order number?"
Observing: After each action (tool call), the agent would observe the result.
- If the database tool successfully retrieves the order status and delivery date, the agent will observe this information. If the query to the database fails (e.g., no order found with the given information), the agent would observe this error and might need to revise its plan, perhaps by asking the user to double-check the information.
- If the weather API tool successfully returns the weather forecast for the specified location and date, the agent will observe this data. If the API call fails (e.g., due to an invalid location format), the agent would observe the error and might try to reformat the location or ask for clarification.
- If a clarification tool were used, the observation would be the user's response, which would then feed back into the reasoning and acting phases.
This iterative process of reasoning, acting, and observing would continue. For example, after getting the order status and delivery date, the agent would proceed to use the weather API.
Once both pieces of information are successfully retrieved, the agent would then formulate a final response to the user, such as, "The order for [Customer Name/Order Number] is currently [Order Status] and is scheduled for delivery on [Delivery Date]. The weather forecast for [Delivery Location] on [Delivery Date] is [Weather Forecast]".
If at any point an action doesn't yield the expected result, the agent would re-evaluate its plan based on the observation and try a different approach or tool if available. This shows the agent's ability to "think slowly," identify where it needs external help, and adjust its strategy based on the outcomes of its actions.
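The worked example above can be sketched as a bare-bones ReACT-style loop. To stay self-contained, the "reasoner," both tools, the order ID, and the returned values are all hypothetical stubs: in a real agent, `reason` would be an LLM call that reads the goal and the observations so far, and the tools would hit a real database and a real weather API.

```python
# A minimal reason/act/observe loop for the order-status + weather example.
def order_status_tool(arg: str) -> str:
    # Stand-in for the customer order database tool.
    return "shipped, delivery on 2024-05-03" if arg == "A123" else "order not found"

def weather_tool(arg: str) -> str:
    # Stand-in for the weather API tool.
    return "sunny on 2024-05-03"

TOOLS = {"order_status": order_status_tool, "weather": weather_tool}

def reason(goal: str, observations: list[str]) -> tuple[str, str]:
    # Stand-in for the LLM's step-by-step plan: first look up the
    # order, then the weather for the delivery date, then stop.
    if not observations:
        return ("order_status", "A123")
    if len(observations) == 1:
        return ("weather", "delivery date")
    return ("finish", "")

def react(goal: str, max_steps: int = 5) -> list[str]:
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = reason(goal, observations)   # Reason: pick next step
        if action == "finish":
            break
        observations.append(TOOLS[action](arg))    # Act, then Observe
    return observations

print(react("Order A123 status and delivery-day weather"))
```

The `max_steps` cap is worth noting: because the LLM, not a fixed program, decides what to do next, a real agent needs a bound on how many reason/act cycles it may run before giving up.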
Compound AI system or an AI agent: Which one to choose?
Agentic AI models provide a certain level of autonomy to an LLM. This autonomy is not always necessary or beneficial. For instance, if a problem is narrow and well-defined, with predictable query types, a more direct and programmatic compound AI system might be more efficient. In such cases, a fully agentic model could introduce unnecessary steps and loops, hindering efficiency.
However, when dealing with complex tasks that could involve a broad range of potential queries, the agentic approach becomes crucial. This is because the alternative, attempting to programmatically configure every possible query path, would be incredibly difficult, if not impossible.
Agentic models, with their ability to adapt and respond to novel queries, are far better suited to navigate such complex and unpredictable problem spaces.
Therefore, the decision of whether or not to employ agentic capabilities should be based on a careful consideration of the task at hand. If the task is straightforward and the query types are predictable, a more programmatic approach might be preferable.
But if the task is complex and the query types are varied and unpredictable, an agentic model is likely the best tool for the job.
Why Should Business Leaders Care About AI Agents
AI agents are a big step forward in how we use AI. They're not just answering questions; they're using tools and learning from their experiences to solve more complex problems on their own.
Embracing AI agents is a total game-changer. In the fast-paced world of AI, businesses that jump on this opportunity will definitely have a leg up on the competition. Integrating AI agents into how you work can seriously optimize workflows, streamline decision-making, and spark innovation.
Staying ahead of the curve means not just exploring AI agent tech, but actually putting it to work. The potential benefits are huge: improved efficiency, data-driven insights, and better customer experiences. You don't want to miss out on that.
Businesses that sleep on the transformative power of AI agents risk falling behind and missing out on major growth opportunities.
And if you think you need support in adopting AI to automate tasks, just reach out to us at Zams. Check out these success stories of how we have helped other leaders like you with AI adoption.
Book a demo today and let's be partners in your AI journey.