In my two decades in the industry, I have never seen hype like the hype surrounding MCP (Model Context Protocol), which Anthropic introduced some time back. While MCP has generated significant buzz in the tech industry, promising to revolutionize how AI interacts with tools and APIs, it falls short in addressing several critical issues. Despite its potential to streamline communication between AI agents and large language models (LLMs), MCP presents challenges that cannot be overlooked, including inefficiencies in token management, complexities in authentication and authorization, and limitations in search capabilities. In this article, we critically examine the concept of MCP, its implementation, and the practical challenges faced during its adoption, particularly through the lens of MechCloud's REST API Agent. Our analysis aims to shed light on why a single MCP server might be all you need, and why the current hype may not fully align with the reality of its application.
Understanding the concept of function/tool calling
The concept of function calling is quite simple. You provide a prompt and specifications for a set of tools in JSON format to an LLM. It returns the tool that matches the prompt, along with the corresponding input parameters, which you then use to invoke the tool. Once you receive the tool's response, you pass the same prompt along with that response back to the LLM, and it returns the final answer.
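To make this concrete, here is a minimal sketch of those two round trips using the OpenAI Python SDK's chat-completions tool-calling interface; the `reboot_vm` tool, its schema, and the model name are hypothetical, and other providers expose the same flow with slightly different field names.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical tool specification describing a single operation.
tools = [{
    "type": "function",
    "function": {
        "name": "reboot_vm",
        "description": "Reboot a virtual machine by name.",
        "parameters": {
            "type": "object",
            "properties": {"vm_name": {"type": "string"}},
            "required": ["vm_name"],
        },
    },
}]

def reboot_vm(vm_name: str) -> dict:
    # Placeholder for the real API call to the target system.
    return {"vm_name": vm_name, "status": "rebooting"}

prompt = [{"role": "user", "content": "Reboot the VM named web-01"}]

# Round trip 1: the LLM picks a tool and returns structured arguments.
first = client.chat.completions.create(model="gpt-4o-mini", messages=prompt, tools=tools)
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# We invoke the tool ourselves; the LLM never calls it directly.
result = reboot_vm(**args)

# Round trip 2: send the prompt plus the tool result back for the final answer.
second = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=prompt
    + [first.choices[0].message,
       {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}],
    tools=tools,
)
print(second.choices[0].message.content)
```

Note that the tool specifications are sent on every round trip, which is exactly where the token costs discussed later in this article come from.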
MCP Clients communicate with LLMs using REST APIs
I find it quite surprising that MCP doesn't require communication between an MCP client (like an AI agent) and an LLM API to go through an MCP server. In my view, MCP servers for the LLM REST APIs themselves should have been the first thing LLM providers implemented. Today, an MCP client talks to an LLM directly through its REST API, even though the MCP standard expects it to go through an MCP server for every other system or REST API, as shown in the diagram above.
MechCloud REST API Agent
From this point on, I will tell this story in the context of the MechCloud REST API Agent. This will help you understand the challenges we faced while developing this agent and why these challenges can be obstacles for MCP servers.
MechCloud supports integration with various systems (e.g., Kubernetes, MongoDB), both for its own internal needs and because it serves as a UI-based alternative to Infrastructure as Code (IaC) tools (e.g., Terraform). Therefore, we need to integrate with the hyperscalers (Cloudflare, AWS, Azure, GCP) and with any other system that can be managed with an IaC tool.
While the companies behind various IaC tools rely on hyperscalers and other vendors to create and maintain the providers needed to integrate those vendors' APIs into their tools, we found this route challenging. We are not Anthropic, whose MCP is now embraced by major software companies, so our only option was to solve the problem ourselves. However, writing and maintaining integration code for all the hyperscalers and other systems, whose APIs change frequently, was a huge task. This led us to create a generic REST API agent that can communicate with any REST API with minimal integration code and almost no maintenance. After implementing this agent, we removed all the Python-based integration code we had written for various hyperscalers and other systems. This integration is now managed through a configuration-based approach, where a new REST API can be integrated simply by adding the URL of its OpenAPI specification, as sketched below.
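As a rough illustration of what configuration-driven integration can look like (a hypothetical sketch, not MechCloud's actual format), each integration boils down to little more than a name and the URL of its OpenAPI specification, from which the available operations can be derived at runtime:

```python
import requests

# Hypothetical registry: adding an integration is just adding an entry here.
INTEGRATIONS = {
    "cloudflare": {"openapi_url": "https://example.com/cloudflare-openapi.json"},
    "kubernetes": {"openapi_url": "https://example.com/kubernetes-openapi.json"},
}

def load_operations(name: str) -> list[dict]:
    """Fetch an integration's OpenAPI spec and flatten it into callable operations."""
    spec = requests.get(INTEGRATIONS[name]["openapi_url"], timeout=30).json()
    ops = []
    for path, methods in spec.get("paths", {}).items():
        for method, details in methods.items():
            ops.append({
                "integration": name,
                "method": method.upper(),
                "path": path,
                "summary": details.get("summary", ""),
            })
    return ops
```

When an API provider changes its endpoints, only the published OpenAPI specification changes; there is no per-provider integration code to rewrite.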
As described above, the first step in function calling is to match the prompt with a tool from a list of tools and get a structured response for its input parameters. This may seem simple in examples you might have seen online, but it is actually a challenging task. Consider the following scenario where MechCloud is integrated with several REST APIs:
With the function calling approach, when a user sends a prompt (e.g., create a network in GCP, run a container in Docker, etc.) to the MechCloud REST API Agent, it retrieves a list of all tools from these integrations and submits it to the target LLM API as part of the function/tool calling process. If there are only a few tools defined across these integrations, there is no problem. However, if we include all endpoints from all these integrations, two major challenges arise:
A lot of tokens will be wasted just to match the prompt with a set of tools, even when it's clear that the prompt belongs to only one of these integrations.
It's likely that the combined tool specifications for all these integrations will not fit within the context window of many LLMs (see the token-count sketch below).
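The sketch below makes the second challenge measurable by counting the tokens of the combined tool specifications before they are ever sent to an LLM. The file names are placeholders, and tiktoken's cl100k_base encoding only approximates how other providers' tokenizers count tokens.

```python
import json
import tiktoken

# Placeholder paths to the tool specifications of each integration.
SPEC_FILES = ["cloudflare_tools.json", "aws_tools.json", "gcp_tools.json"]

enc = tiktoken.get_encoding("cl100k_base")

total = 0
for path in SPEC_FILES:
    with open(path) as f:
        tools = json.load(f)
    tokens = len(enc.encode(json.dumps(tools)))
    print(f"{path}: ~{tokens} tokens")
    total += tokens

print(f"Combined tool specifications: ~{total} tokens per function-calling request")
```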
Reducing token wastage during function/tool calling
We can create one REST API agent per integration to reduce token wastage (the first challenge above) and to try to fit the tool specifications within the context window of a target LLM. In our case, we still have only one REST API agent, which requires the user to select an integration (e.g., Cloudflare) and an account before it communicates with that integration's third-party REST API.
The latter approach may or may not work depending on the target integration. For example, the Cloudflare REST API's OpenAPI specification runs to about 1.8 million tokens, which, among the mainstream LLMs I am aware of, only Gemini Pro can currently handle. So I'm unsure how things would work if you integrated a Cloudflare MCP server, with full support for every operation defined in that OpenAPI specification, into the Claude desktop app or any other MCP Host/Client. Token wastage during function/tool calling is not even being discussed amidst all the MCP hype.
Fortunately, some LLMs and inference providers offer prompt/context caching, which can reduce the cost of each LLM call during function/tool calling. However, this comes at a price. The input token price for Gemini 1.5 Flash (note that 2.0 and 2.5 do not support caching at the time of writing) starts at $0.075 per million tokens without caching but drops to $0.01875 per million tokens with caching. You also need to pay extra for context caching storage, which starts at $1.00 per hour. This cost can be prohibitive for many startups unless they operate at a large scale. While the amount is minor for large software companies that expose REST APIs in front of their systems, cached context tokens unfortunately cannot be shared across accounts; otherwise, this could have reduced the cost of function/tool calling for those companies' customers. So if you are posting the same tools JSON for an MCP server on every function/tool call, I see no reason not to leverage context/prompt caching, unless caching is handled automatically by the target LLM API.
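As a back-of-the-envelope illustration using the prices quoted above and the roughly 1.8 million-token Cloudflare specification mentioned earlier (the call volume and cache lifetime are assumptions made purely for the example):

```python
# Gemini 1.5 Flash prices quoted above, per million input tokens.
PRICE_UNCACHED = 0.075
PRICE_CACHED = 0.01875
STORAGE_PER_HOUR = 1.00          # context caching storage, as quoted above

SPEC_TOKENS_M = 1.8              # ~1.8M tokens for the Cloudflare OpenAPI spec
CALLS = 100                      # assumed number of tool-calling requests
CACHE_HOURS = 1                  # assumed time the cache is kept alive

without_caching = CALLS * SPEC_TOKENS_M * PRICE_UNCACHED
with_caching = CALLS * SPEC_TOKENS_M * PRICE_CACHED + CACHE_HOURS * STORAGE_PER_HOUR

print(f"Without caching: ${without_caching:.2f}")   # ~$13.50 for 100 calls
print(f"With caching:    ${with_caching:.2f}")      # ~$4.38 for 100 calls
```

Even with caching, every single request still pays for those 1.8 million input tokens at the discounted rate, which is the wastage this article keeps coming back to.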
Please note that while context/prompt caching can save you a good amount of money on tokens consumed during function/tool calling, it has no impact on latency. When I posted an OpenAPI specification (around 100k tokens) for a REST API, Gemini 1.5 Flash took 30+ seconds to respond both with and without caching. Another point to note is that the input token count MUST be at least 32,768 in order to leverage context/prompt caching in Gemini.
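For reference, here is a minimal sketch of what explicit context caching looks like with the google-generativeai Python SDK at the time of writing; the spec file, TTL, and prompt are placeholders, and the exact model names that support caching may change.

```python
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# The cached content must meet Gemini's minimum token count (32,768 as noted above).
openapi_spec_text = open("cloudflare_openapi.json").read()  # placeholder file

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",     # caching requires an explicit model version
    display_name="cloudflare-openapi-spec",
    system_instruction="Answer questions about this REST API using the attached spec.",
    contents=[openapi_spec_text],
    ttl=datetime.timedelta(hours=1),          # storage is billed for as long as this lives
)

model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Which endpoint creates a DNS record?")
print(response.text)
```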
We solved this problem with a different approach that does not rely on function calling for our REST API agent. This single agent manages MechCloud's integration with all the third-party REST APIs shown in the diagram above, instead of one agent per integration. Our approach also significantly reduces token wastage: in some cases our token consumption is as low as 1-2% of the tokens function/tool calling would need for the same use case.
Token wastage during function/tool calling is something you should watch out for when using an MCP Host/Client with multiple MCP servers, as it can lead to an unexpectedly large LLM API bill. It's like getting a private jet for free but not considering the pilot's salary, maintenance costs, and other expenses.
Authentication and Authorization
If you look at the diagram above, you'll see that for MechCloud, we use a proxy called the MechCloud OIDC proxy. This proxy runs in the customer's network and allows MechCloud to communicate with any internal customer or third-party REST API. This setup means we don't need to store any customer API keys or service accounts for these REST APIs in MechCloud, eliminating the major security risk associated with long-term credentials. We simply pass a JWT from the MechCloud backend to this OIDC proxy to integrate with a REST API.
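Here is a minimal sketch of the general pattern (not MechCloud's actual implementation): the proxy validates the incoming JWT against the issuer's published keys and only then forwards the call to the target REST API using a credential that never leaves the customer's network. The issuer URL, audience, and secret path are placeholders.

```python
import jwt        # PyJWT
import requests

ISSUER_JWKS_URL = "https://auth.example.com/.well-known/jwks.json"  # placeholder
AUDIENCE = "mechcloud-oidc-proxy"                                    # placeholder

jwks_client = jwt.PyJWKClient(ISSUER_JWKS_URL)

def forward_request(token: str, method: str, target_url: str, payload: dict | None = None):
    """Validate the caller's JWT, then forward the call with a locally held credential."""
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    claims = jwt.decode(token, signing_key.key, algorithms=["RS256"], audience=AUDIENCE)

    # The long-term credential stays inside the customer's network.
    local_api_key = open("/secrets/target-api-key").read().strip()  # placeholder path

    return requests.request(
        method,
        target_url,
        json=payload,
        headers={"Authorization": f"Bearer {local_api_key}",
                 "X-Forwarded-Subject": claims["sub"]},
        timeout=30,
    )
```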
Many REST APIs are multi-tenant, so when you create a service account or API key for such an API and use it with an MCP server, you effectively turn that server into a single-tenant server. For example, if you are a service provider and need to manage multiple Cloudflare accounts for different customers, you will need to run a separate Cloudflare MCP server for each of these accounts.
If an MCP server needs to be used by different users with varying permissions based on their roles, this presents a challenge. The only apparent solution is to replicate the authorization logic of the target system inside the MCP server, which can be quite complex, especially with some multi-tenant REST APIs. For example, suppose you're using an MCP server to connect to GCP, and different users need different permissions: developers may only reboot a VM, while DevOps can provision, reboot, and terminate one. You would typically create a service account in GCP with the highest permissions required across all user types and use it for this integration. I don't see how this can work unless you duplicate a lot of the authorization logic between GCP and the GCP MCP server, as the sketch below illustrates. If you don't replicate that logic in the MCP server, then all user types end up with the same permissions, which is a security risk no one would accept.
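To show what that duplication looks like in practice, here is a hedged sketch of the role-to-operation mapping an MCP server would have to maintain alongside the permissions already defined in GCP IAM; the role names and operations mirror the example above, and everything else is hypothetical.

```python
# Role -> operations the MCP server is willing to execute on the user's behalf.
# This duplicates policy that GCP IAM already expresses, and must be kept in sync.
ROLE_PERMISSIONS = {
    "developer": {"reboot_vm"},
    "devops": {"provision_vm", "reboot_vm", "terminate_vm"},
}

def authorize(role: str, operation: str) -> None:
    """Raise if the caller's role does not allow the requested operation."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    if operation not in allowed:
        raise PermissionError(f"Role '{role}' may not perform '{operation}'")

# A developer trying to terminate a VM is rejected by the MCP server itself,
# even though the shared service account behind it has permission to do so.
authorize("devops", "terminate_vm")          # OK
try:
    authorize("developer", "terminate_vm")
except PermissionError as exc:
    print(exc)                               # Role 'developer' may not perform 'terminate_vm'
```

Every change to roles or permissions in the target system now has to be mirrored here, which is exactly the maintenance burden the paragraph above describes.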
Do we really need function calling for Public APIs?
Sometimes I wonder why we need function calling for public APIs at all. I should be able to simply say "provision a VM on GCP", and an LLM with built-in search support should return a structured output for this request. It could do so by finding the latest OpenAPI schema for the endpoint used to provision a VM on GCP, without my having to provide tool specifications for the various endpoints of the GCP REST API. It is not clear whether MCP servers can solve this problem.
Search is a serious limitation of GenAI tools / LLMs
With a small team of 1.2 people (myself at 100% and Akash at 10-20%) working on MechCloud, it's challenging for us to keep our documentation up to date with the tutorials, blog posts, and demos we've created so far. I believe the solution is fully automated demos, where an AI agent demonstrates MechCloud's features to users, just like a human would, using the latest UI and documentation, so users aren't viewing outdated information. This is only possible if the search functionality in LLM APIs can retrieve the latest instructions for a specific scenario or use case from our ever-changing MechCloud documentation. I have yet to see an LLM that meets this basic requirement, and I don't see how LLMs can effectively drive a desktop app without access to the latest instructions. While Google Search keeps up by quickly indexing new content posted online, this doesn't seem to be the case with the search tools built into many LLMs. Just last month, I had to fall back to Google Search multiple times after failing to find solutions using the search features of various GenAI tools; Google Search led me to the solutions, which were among the top three results on the first page. I'm not sure MCP servers will be able to solve this issue.
Conclusion
MCP servers are quite useful when the MCP client is a GenAI desktop app (e.g., Claude Desktop), because you won't have to worry about charges exceeding the monthly subscription fee. However, this doesn't seem to be the case for other MCP clients (e.g., Cursor, Windsurf) that integrate directly with LLM APIs. Without the protection of a flat subscription, many users will likely prefer LLM providers that offer their APIs on a prepaid basis in addition to the pay-as-you-go model that is common across all LLM providers.
It seems MCP servers are either addressing issues that were solved for REST APIs a long time ago or failing to solve the pressing problems described above for REST API integration. Based on our experience developing the MechCloud REST API Agent, we can say that you don't need more than one MCP server unless you have custom logic that varies between servers. In that case, MCP servers become a shared responsibility: the provider of a REST API develops a basic MCP server that end users then customize through code, unless their specific requirements can be met with a configuration-based approach.