The first AI Agent that blew my mind was Cursor. When it started autocompleting my code for me, I felt like I was in one of those cartoons. It was so good. I decided I wanted to dig deeper into this AI Agent thing and learn as much as I could about it.

During my research I found many options for building agents:

  • No-code platforms
  • Low-code platforms
  • SDKs
  • Python Packages
  • Tools from well-established cloud providers

After a lot of thought I decided to give a Python framework called Langchain a try. The decision was made mainly because most of the tutorials and content I found used Langchain. So I thought: if everyone is using it, it must be good.

After some time digging into Langchain, I learned my first lesson:

The most popular framework is not necessarily the best one

This lesson applies not only to AI Agents, but in general. I was so disappointed with Langchain that I made a post in one of the communities asking what I was doing wrong. I felt like everyone was using and loving it, and I was the only one who couldn't get anything done. To me it was absurdly over-engineered, over-abstracted and confusing.

Now imagine my surprise when, in the comments, so many people said they were migrating away from it, stating that it was not production ready and should instead be treated as a rapid-prototyping option.

I felt like I was trapped in a bubble. I believed everyone was using it, but in the end it was just something people were writing insane amounts of content about (trying to sell courses or improve SEO for something else). Almost no one was actually using it in production. This realization led me to reconsider my approach entirely.

Dropping Langchain

Ok, lesson learned and back to the start. In the past few years I have been working a lot with Node.js and React. Everything I was reading about building AI Agents with code used Python. I know Python is one of the biggest programming languages and that it dominates the AI landscape, but I really wanted something I could use with TypeScript.

Further research uncovered a less-discussed option: Vercel’s AI SDK.

It was perfect for me. I know Next.js, it works with TypeScript, the documentation is decent, and it has good tutorials. The only problem is that it seemed very recent and still gaining traction. The community was small and there weren't many people using it in production. But after some research I decided to give it a try.

I went through the first tutorials and was quickly able to create my first Chatbot. It worked perfectly. It was the most basic use-case, but after my experience with Langchain I considered it a win.
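For anyone curious, here is a minimal sketch of what that first chatbot endpoint can look like, assuming a v4-style AI SDK in a Next.js route handler. The file path, model choice and system prompt are illustrative assumptions on my part, not something the tutorials prescribe:

```ts
// app/api/chat/route.ts: a minimal streaming chat endpoint with the AI SDK
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export async function POST(req: Request) {
  // The useChat() hook on the client sends the conversation history in the body
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o-mini'), // assumption: any supported chat model works here
    system: 'You are a helpful assistant.',
    messages,
  });

  // Stream the tokens back to the client as they are generated
  return result.toDataStreamResponse();
}
```

On the client side, the SDK's useChat() hook handles the message state and streaming, which is a big part of why the basic chatbot came together so quickly.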

I then decided to move on to something more elaborate. With a promising new toolkit in hand, the next step was figuring out what problem to solve.

Evaluating use-cases for AI Agents

With so many AI-powered tools and agents popping up, it seems that AI is meant to solve all of humanity's problems. But the truth is, AI agents are just another interface for the user. What matters most is solving a real problem while being easy to use and understand. When I want to build a solution I always start with the problem, digging into it and talking to people, and only later do I evaluate what kind of solution and tech should be used. In this case, for learning purposes, I had to start with the solution.

What kind of problem did I want my AI Agent to solve? Should I make a conversational AI Agent (like a chatbot) or a copilot integrated with other tools that performs actions for the user (like Cursor)? For simplicity I decided to go with the conversational approach, since turning it into a plugin capable of integrating with other tools would require much more work.

The final choice

One use case I like a lot is AIs that help you find specific, relevant pieces of information without having to go through huge blocks of tutorials or documentation. I remember when I had to add a new SSH key for my GitHub account on my Mac: the tutorials I found on Google were insane. People would write a 300-word post to rank higher on SEO when all it took was a few lines of bash code.

Because of that, I decided to try building a conversational AI Agent that could help users find information in Figma's documentation. The main reasons for this decision were:

  • My girlfriend had started to get interested in design and was learning Figma to create user interfaces, so this could help her
  • I use Figma a lot at work: as the product engineer responsible for the front-end, I coordinate usability questions and component choices with the design team, and we do that in Figma, so it could also help me

Ok, now that I had a clear scope, I had to find a way to allow my agent to search for the information and present it in a helpful way.

The quality of your data

The value of an AI Agent is determined either by its capacity to perform tasks successfully or by its ability to provide the right data to the user. In this case, since I was working with the Figma documentation, I had to find a way to give my agent access to it. Since Figma is a tool for designers, it is also important to include images and GIFs, but I will get to that later.

I decided to start with a simple AI Agent capable of answering questions from textual data. The Figma documentation is quite extensive, composed of more than one hundred web pages. My first challenge was to find a way to scrape and store all this data so that it would be easy for my Agent to access. I also decided to store the data in Markdown format because that is how I wanted to present the answers.

After some research I found the perfect tool to scrape all the documentation pages into Markdown: Firecrawl. There are many tools available, but based on my requirements and personal experience, this one was the best fit. For each webpage I saved the whole content, in Markdown format, in my database. This was useful because I still had to decide on a content-retrieval strategy, and having the content already available in a database would save me time if I decided to try multiple approaches.
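As a rough sketch of what that step looks like with Firecrawl's JS SDK (@mendable/firecrawl-js), assuming its v1 scrape API; saveDoc is a hypothetical stand-in for whatever database layer you use:

```ts
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// Hypothetical persistence helper, standing in for the real database layer
async function saveDoc(doc: { url: string; markdown: string }): Promise<void> {
  console.log(`saved ${doc.url} (${doc.markdown.length} chars)`);
}

// Scrape each documentation page and store its content as Markdown
async function scrapeDocs(urls: string[]): Promise<void> {
  for (const url of urls) {
    const result = await firecrawl.scrapeUrl(url, { formats: ['markdown'] });
    if (result.success && result.markdown) {
      await saveDoc({ url, markdown: result.markdown });
    }
  }
}
```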

RAG or Context

My research pointed towards two main approaches for feeding the documentation to the agent:

  • Large Context Window: Use an LLM capable of handling a massive context and load the entire Figma documentation into the system prompt for every query.
  • Retrieval-Augmented Generation (RAG): Retrieve only the most relevant snippets or pages of the documentation based on the user’s query and provide only those to the LLM.

Given that LLM usage is often priced per input and output token, the RAG approach seemed significantly more cost-efficient. Feeding the entire, extensive Figma documentation into every prompt would likely become very expensive, especially at scale. RAG promised to keep costs down by focusing only on relevant information.
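To make the RAG option concrete, here is a minimal sketch of the retrieval step using the AI SDK's embedding helpers. The embedding model, the chunk shape and the top-K cutoff are assumptions of mine for illustration:

```ts
import { openai } from '@ai-sdk/openai';
import { embed, embedMany, cosineSimilarity } from 'ai';

type Chunk = { pageUrl: string; text: string; embedding: number[] };

// One-time indexing: embed every documentation chunk
async function indexChunks(
  chunks: { pageUrl: string; text: string }[],
): Promise<Chunk[]> {
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'), // assumed embedding model
    values: chunks.map((c) => c.text),
  });
  return chunks.map((c, i) => ({ ...c, embedding: embeddings[i] }));
}

// Per query: embed the question and keep only the most similar chunks
async function retrieve(query: string, index: Chunk[], topK = 8): Promise<Chunk[]> {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  });
  const scored = index.map((chunk) => ({
    chunk,
    score: cosineSimilarity(embedding, chunk.embedding),
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, topK).map((s) => s.chunk);
}
```

Only those few chunks end up in the prompt, which is exactly where the cost savings come from.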

Handling complexity in the data

The problem with the Figma documentation is that sometimes the full context of a section is very important. There are cases where a user's question touches several features of the app, so producing a good answer requires analysing the content of many pages and tutorials. Since I wanted to put a free version online, I had to choose a cheaper LLM model with a more limited context window.

Thus, I had to create a logic in which the RAG step would identify the pages with the most important data for the answer and include those pages in full. For the other sections, ranked as less important, I would only add the snippets containing relevant information. This way I was able to stay within my token constraints and still guarantee that the agent had the full context for the key information.
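In code, that tiered logic looks roughly like the sketch below. It builds on the retrieval idea above; the page-ranking heuristic (summing the scores of the chunks each page contributed), the token budget and the helper functions are assumptions of mine, not exact values from the final app:

```ts
type RankedChunk = { pageUrl: string; text: string; score: number };

// Hypothetical lookup of the stored Markdown for a page (stand-in for a DB query)
async function getFullPage(pageUrl: string): Promise<string> {
  return `# ${pageUrl}\n(full page Markdown from the database)`;
}

// Crude token estimate: roughly 4 characters per token for English text
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

async function buildContext(chunks: RankedChunk[], budget = 12_000): Promise<string> {
  // Rank pages by the total score of the chunks they contributed
  const pageScores = new Map<string, number>();
  for (const c of chunks) {
    pageScores.set(c.pageUrl, (pageScores.get(c.pageUrl) ?? 0) + c.score);
  }
  const rankedPages = [...pageScores.entries()].sort((a, b) => b[1] - a[1]);

  const parts: string[] = [];
  const fullPages = new Set<string>();
  let used = 0;

  // Tier 1: include the top-ranked pages in full, while the budget allows
  for (const [pageUrl] of rankedPages.slice(0, 3)) {
    const page = await getFullPage(pageUrl);
    const cost = estimateTokens(page);
    if (used + cost > budget) break;
    parts.push(page);
    fullPages.add(pageUrl);
    used += cost;
  }

  // Tier 2: for the remaining pages, fall back to the relevant snippets only
  for (const c of chunks) {
    if (fullPages.has(c.pageUrl)) continue;
    const cost = estimateTokens(c.text);
    if (used + cost > budget) break;
    parts.push(c.text);
    used += cost;
  }

  return parts.join('\n\n---\n\n');
}
```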

The simplest AI Agent I could put online

After refining the user interface and doing some tests, I had a minimal functional AI agent capable of answering questions about Figma features. Since I was using Next.js, I decided to host the app on Vercel, the platform that offered me the easiest and most intuitive way to do it. I was very happy with the result. Even though the application was simple, in just a few days I had learned about RAG in practice and created a minimal AI-based application capable of providing value to its users (in this case, me and my girlfriend).

I still had to find a way to add images and GIFs to the answers, since a text-only response for a design application isn't all that helpful. In the next part of this post I will dig deeper into how I later added more functionality to this AI agent and handled the multimodal feature.

If you wanna try the latest version of this app for FREE, you can access it on this link.

Thanks for reading the full post, and I hope it was a pleasant read.

* The original post was featured on my personal blog, where I often write about AI, development and tech. You can check it out on this link.