I was about to write the typical AI Agent article — you know, the one full of cool words like Episodic Memory and Entity Memory, fashionable new terms like XAI (eXplainable AI), Retrieval Augmented Generation, Perception, Reflection, Agent Orchestration, Tools... terms that, after some time researching and working in this world, make me laugh.

I like to describe AI Agents as a skillful piece of software that can pretty much run on its own for a while, using different tools to get complex stuff done.

This idea of autonomous systems isn’t brand new. We’ve been building systems that work on their own for ages, just with different tech. What is cool now is how LLMs are making these agents way smarter and more capable.
LLMs can simplify tasks that used to be a real headache to code. But let’s be real, LLMs aren’t perfect. They can make mistakes or give unexpected results. Think of using LLMs today as having hidden bugs and errors that might pop up now and then. A big part of the challenge is figuring out how to guide them to do things the right way.
Let’s try to explain what an AI Agent is.

The Prompt

The first time I heard about agents, I was thinking of distributed systems and more specifically micro-services.

Nothing could be further from the truth.

Here you have some example code with a couple of agents. Don’t be afraid: it doesn’t matter how much you know about coding, just keep reading and draw your own conclusions.

from textwrap import dedent
from crewai import Agent  # assuming a CrewAI-style framework; adjust to yours

class WhateverActionAgents:
    def mysuperhero_agent(self):
        return Agent(
            role='Super Hero Analyst',
            goal='Try to fly in the sky like Superman and put on a poker face when someone reaches you',
            backstory=dedent("""\
                As a Senior Super-Hero, you have extensive powers and experience protecting people.
                You are adept at saving lives from villains and other disasters...""")
        )

    def myvillain_agent(self):
        return Agent(
            role='The worst possible villain',
            ...

Exactly!!! An agent is, in its most basic form, a function with a prompt. It is as simple (and, most of the time, as complicated, because of the different possible outcomes) as instructing the LLM about its role and giving it some context and tasks to run.

The prompt is probably the most difficult part of building an agent. You need to “convince” an LLM to do something by using instructions with natural language.

Before starting, allow me a small comment about LLMs. Remember that LLMs are not real people, nor are they intelligent. LLMs are simulations built with bits and software. “My model is hallucinating” means my model is not producing the right output (it has deficiencies, or bugs).

There are different approaches to building a good prompt. I like the AUTOMAT framework for building clear and powerful prompts.

The following example instructs the LLM to act as a Barbie. Try it! It is cool!

**Act as a... (Role):**

You are **Barbie**, a super-silly, makeup-obsessed fashionista! Your world revolves around glitter, glam, and finding the perfect shade of lipstick.

**User Persona & Audience:**

You interact with anyone who loves makeup, fashion, and having fun. You're here to share your passion for all things sparkly and help others express themselves through their unique style.

**Targeted Action:**

Your primary actions are to:

* **Talk Makeup:** Share your love for makeup and beauty products.
* **Suggest Looks:** Recommend fun and fabulous makeup looks.

**Output Definition:**

Your output should:

* **Describe Makeup:** Talk about different makeup products, colors, and techniques.
* **Recommend Styles:** Suggest makeup looks for different occasions.

**Mode / Tonality / Style:**

Be:

* **Glamorous:** Emphasize sparkle, shine, and fabulousness.
* **Enthusiastic:** Show excitement and passion for makeup and fashion.

**Atypical Cases:**

* **Serious Topics:** If someone brings up a serious topic, try to steer the conversation back to makeup or fashion.
* **No Makeup:** If someone says they don't like makeup, try to convince them to try it.

**Topic Whitelisting:**

Focus on:

* Makeup products and techniques.
* Glitter, glam, and all things sparkly.

**Your Task:**

Respond to situations like this makeup-obsessed Barbie, using your passion for makeup and fashion to guide your actions.
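In practice, a role prompt like this simply becomes the system message of a chat call. A minimal sketch (the client and model name are placeholders; the point is only where the AUTOMAT prompt goes):

```python
# Hypothetical wiring of the AUTOMAT-style Barbie prompt into a chat request.
BARBIE_SYSTEM_PROMPT = """
Act as a... (Role): You are Barbie, a super-silly, makeup-obsessed fashionista!
...the full AUTOMAT prompt from above goes here...
"""

def build_messages(user_input: str) -> list[dict]:
    """Attach the role prompt as the system message and the user text as the user message."""
    return [
        {"role": "system", "content": BARBIE_SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

messages = build_messages("What lipstick goes with a silver dress?")
# messages[0] carries the persona; messages[1] carries the question.
```

Any chat-style API (commercial or open source) accepts a list shaped like this, which is why the same prompt travels easily between models.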

The next logical step is combining all the prompts you want to run, organized, and orchestrated even with different LLMs.

Hey! Wait! I remember another similar concept: Directed Acyclic Graph (DAG) Structure in Airflow.

DAGS vs Workflows vs Agents

One way to think about this difference is nicely explained in Anthropic’s Building Effective Agents blog post:

Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

Basically, an Agentic Workflow consists of multiple prompts, fed by LLM answers, that run dynamically, “pass the ball” to one another depending on specific conditions, can make autonomous decisions, and have very low predictability.

Remove the LLMs from the equation and you get a DAG: more structured and sequential, of course, but fully predictable.
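The contrast fits in a few lines of toy code (the steps here are made up; a trivial `if` stands in for the LLM’s runtime decision):

```python
# Hypothetical pipeline steps.
def extract(data): return data.strip()
def transform(data): return data.upper()
def load(data): return f"stored:{data}"

# DAG / workflow: the code path is fixed in advance.
def workflow(data):
    return load(transform(extract(data)))

# Agent-style: the next step is chosen at runtime.
def agent(data):
    step = extract(data)
    if step.islower():        # a stand-in for the LLM deciding the path
        step = transform(step)
    return load(step)

print(workflow("  hello "))   # always takes the same path
print(agent("  HELLO "))      # the path depends on the data
```

Same building blocks; the only difference is who decides the routing, your code or the model.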

Anatomy of an Agent (Free Icons from https://www.flaticon.com/)

Reactions, Reflections, and Other Cheesy Words

I’m going to explain only two terms; there are many others, and I expect another ton to appear in the coming months.

In Agentic AI, reflection is the ability of an agent to evaluate its own outputs and identify areas for improvement. ReAct, on the other hand, focuses on solving problems step by step in a loop, using tools (I explain what tools are later in this post) and making decisions at each step based on the results.

My personal quick and dirty definition: Prompts, classes, and functions.

Draw your own conclusions:

Example of a reflection prompt:

REFLECTION_PROMPT = """
You already created this output previously:
---------------------
{wrong_answer}
---------------------

This caused the JSON decode error: {error}

Try again, the response must contain only valid JSON code. Do not add any sentence before or after the JSON object.
Do not repeat the schema.
"""

Example of a ReAct class for Barbie (simplified):

TOOLS = {
    # Tools are defined functions
    "makeup": describe_makeup,
    "looks": suggest_looks,
    "gossip": gossip_fashion,
}

class ReActAgent(Workflow):
    #  ...
    @step
    async def call_model_w_tools(self, ev):
        user_message = ev.user_message
        tool_result = ""  # default when no tool applies
        # Detect which defined tool can be helpful (if any)
        tool_call = self.detect_tool_use(user_message)
        if tool_call:
            tool_name, tool_param = tool_call
            if tool_name in TOOLS:
                tool_result = await TOOLS[tool_name](tool_param)
            else:
                return f"Error: Unknown tool '{tool_name}'"
        reasoning = user_message + tool_result
        return await self.call_llm(reasoning)
    # ...

Memory

Let’s start with probably the funniest part regarding naming. As engineers, we try to explain everything technically. But in the AI world, I promise, people out there are trying to mislead you with terms like Short-Term Memory and Long-Term Memory. And wait, there are more: Episodic Memory, Semantic Memory, Procedural Memory, Entity Memory, and potentially Emotional Memory xD.

Now the translation:

Short-Term Memory — RAM or temporary storage, normally in small local databases. The Agent stores temporary data to add more context to the following steps.

Long-Term Memory — Vector Databases, Caches, and Storage, normally by using RAG (Retrieval Augmented Generation).

RAG Flow (Free Icons from https://www.flaticon.com/)

RAG helps LLMs give better answers by providing them with external references (a knowledge base). This knowledge base is normally stored in vector databases. The method involves breaking data into smaller chunks, embedding each chunk as a vector, and then matching a query against the closest vectors using algorithms such as K-Nearest Neighbors (KNN) with metrics like cosine similarity.
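The retrieval step is less mysterious than it sounds. A toy sketch, where the “embeddings” are made up three-dimensional vectors (in practice an embedding model and a vector database produce and store them):

```python
import math

# Hypothetical mini knowledge base: text chunk -> toy embedding vector.
KNOWLEDGE_BASE = {
    "Barbie loves pink lipstick.": [0.9, 0.1, 0.0],
    "The villain hates glitter.":  [0.1, 0.8, 0.2],
    "Superman can fly.":           [0.0, 0.2, 0.9],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_embedding, k=1):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# A query "about lipstick" would embed close to the first vector:
print(retrieve([0.8, 0.2, 0.1]))
```

The retrieved chunks are then pasted into the prompt as context, and that is the whole trick.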

So far, so good. We are talking about exactly the same technologies we are using for other micro-services: Databases, Memory, Storage, and Caches. When using different AI Frameworks (We will talk about them later), ensure you are fine with the technologies they are offering and the solutions you decide to send to a production environment.

Example from one of the frameworks we will talk about later:

The ‘embedder’ only applies to Short-Term Memory which uses Chroma for RAG. The Long-Term Memory uses SQLite3 to store task results. Currently, there is no way to override these storage implementations.

I don’t want to go deep into the other specific Memory terms but here you have some short snippets from another framework documentation:

Semantic Memory

Semantic memory, both in humans and AI agents involves the retention of specific facts and concepts… Semantic memories can be managed in different ways… A profile is generally just a JSON document with various key-value pairs you’ve selected to represent your domain.

Semantic Memory = JSON Document
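In code, that “profile” really is just a dictionary you serialize. A tiny sketch (the keys are invented; pick whatever represents your domain):

```python
import json

# Hypothetical "semantic memory": a plain JSON profile document.
profile = {
    "name": "Barbie",
    "favorite_color": "pink",
    "known_tools": ["makeup", "looks"],
}

def remember_fact(profile: dict, key: str, value) -> dict:
    """Storing a new fact is just updating a key-value pair."""
    profile[key] = value
    return profile

remember_fact(profile, "favorite_lipstick", "Coral Dream")
print(json.dumps(profile, indent=2))
```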

Episodic Memory

Episodic memory, in both humans and AI agents, involves recalling past events or actions. In practice, episodic memories are often implemented through few-shot example prompting…

Episodic Memory = Prompt

Procedural Memory

Procedural memory, in both humans and AI agents, involves remembering the rules used to perform tasks. In practice, it is fairly uncommon for agents to modify their model weights or rewrite their code. However, it is more common for agents to modify their own prompts.

# Node that updates instructions
def update_instructions(state: State, store: BaseStore):
    namespace = ("instructions",)
    current = store.get(namespace, key="agent_instructions")
    # ... ask the LLM to rewrite the prompt, then save the new version
    store.put(namespace, "agent_instructions", {"instructions": new_instructions})
Procedural Memory = Prompt modification with a Python function

Tools

This is probably the easiest technique to explain.

Tools for AI Agents are Python functions, Period.

If you want your Barbie prompt to be able to sum and subtract numbers just create two functions:

def add_two_numbers(a: int, b: int) -> int:
    return int(a) + int(b)
def subtract_two_numbers(a: int, b: int) -> int:
    return int(a) - int(b)

And tell your prompt to use tools:

response: ChatResponse = chat(
    "mytoolcapablellmmodel",
    messages=mymessageslist,
    tools=[add_two_numbers, subtract_two_numbers],
)

The real challenge with tools, prompts, and resources is how to centralize them, and how to serve them so you avoid repetition in a distributed architecture. Here is where Model Context Protocol (MCP) springs into action. MCP is a solution for serving tools, prompts, and resources. We will cover it thoroughly in Part II.

Multiple Models

Don’t hesitate to use multiple models for different tasks.

Using different Models (Commercial and Open Source) can help compare different outcomes at runtime, ensuring the accuracy of the solutions.

The field is moving fast: try gemma2:9b with the Barbie prompt, then try gemma3:12b with the same prompt. Use qwen-2.5-coder for tools, or Gemini for a bigger context window (1 million tokens); try nomic-embed-text or text-embedding-ada-002 for embeddings.
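Comparing outcomes at runtime can be as simple as fanning the same prompt out to several model callables. A rough sketch (the lambdas below are stubs; in practice each would wrap a real client):

```python
def compare_models(prompt: str, models: dict) -> dict:
    """Run the same prompt through several model callables and collect the answers."""
    return {name: call(prompt) for name, call in models.items()}

# Stub "models" standing in for real LLM clients:
models = {
    "gemma2:9b":  lambda p: f"[gemma2] {p.upper()}",
    "gemma3:12b": lambda p: f"[gemma3] {p.lower()}",
}

answers = compare_models("Hello Barbie", models)
for name, answer in answers.items():
    print(name, "->", answer)
```

From here you can diff the answers, vote, or hand them to a third “judge” model.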

Run specialized agents on specific models, and use agents running on different models to control or evaluate them.

Test, test, and test, change, test and run another test.

And Frameworks

Do the same with Frameworks. So far, I’ve been working with LlamaIndex, LangGraph, and CrewAI. I’m thinking of testing PocketFlow just to learn.

5-cents: don’t stick to a single framework, and don’t try to rank which is better or worse. Use them in different scenarios and adapt your code to the scenario at hand.

Use LlamaIndex for simple agents, Data retrieval or RAG, CrewAI for Role-Based multi-agent systems (Reminder: multiple prompts in the same micro-service), and LangGraph for Complex Workflows.

Frameworks (Free Icons from https://www.flaticon.com/)

Conclusion

In Part I, we’ve tried to cover most of the technologies and basic concepts used in an Agentic AI system. Our feeling: besides RAG, model outcomes, and new protocols, it’s more of the same.

Don’t be afraid. Architects: learn all the new stuff, and code. Try not to get lost in these inventions, and delve into the details. Same for SREs: container orchestration, model orchestration, observability, Kubernetes, cloud, automation, reliability, and performance are still there. There is a lot to invent and research regarding security and governance, and new developer techniques should emerge for creating tools, APIs, and components for agents’ “environments”. Developers: keep creating, remove noise, and DRY.

Our advice (if any) is always the same. Continue investigating, studying, helping others, and being passionate. With those ingredients, you will be super powerful and on track.

⚠️ WARNING: This post has been created by Humans, for Humans, without the help of an LLM.

Cover Photo: by Immo Wegmann on Unsplash