I’ve been building on top of MCP for a while now: multi-LLM orchestration pipelines, agentic systems, RAG-enabled backends. So when Perplexity’s CTO Denis Yarats said they’re moving away from MCP internally in favor of REST APIs and CLIs, I didn’t panic. I kind of nodded.

Then LinkedIn went absolutely wild with the hot takes.

“MCP is dead.” “Anthropic blew it.” “It was always a bad idea.”

The discourse is loud and most of it is wrong, or at least imprecise. Let me try to actually explain what’s happening here, because I’ve lived through the problems Yarats is talking about.


What Perplexity Actually Said #

At their developer conference this week, Yarats announced that Perplexity is internally moving away from MCP toward direct REST APIs and CLIs. He called out two specific problems:

  • Context window bloat
  • Authentication friction

The timing is, objectively, hilarious. Perplexity still has an official MCP server on their docs page. One-click install for Cursor, VS Code, Claude Desktop. They shipped it in November 2025 with tools for search, research, and reasoning. Their marketing team was promoting it while their engineers were apparently hitting walls.

That gap between what they shipped to developers and what they actually run in production is what got people riled up. And honestly, it should. It tells you something real about where MCP stands right now.

But here’s what didn’t happen: Yarats didn’t say MCP is garbage and everyone building on it is wrong. He said it doesn’t work for their specific internal use case at production scale. That’s a much narrower claim.


The Problems Are Real Though #

I want to be honest here because I’ve hit all three of these myself.

The Token Tax #

Every time your agent connects to an MCP server, it needs to load the full list of tools into the context window. Tool names, parameter schemas, descriptions, return formats. All of it. Before the user has typed a single message.

If you’re running a single MCP server with five tools, this is fine. If you’re running a real agent that talks to six different services, it adds up fast. You blow through tokens before doing any actual work.

The Cloudflare team published something sobering about this. Their own API surface, if expressed as MCP tools with full schemas, comes out to around 244,000 tokens in context. Their full OpenAPI spec is two million tokens. That’s not theoretical overhead. That’s real money burning on every single request.

And in agentic systems, this overhead multiplies. Every tool call round-trips through the model. The output of call one feeds back into the LLM just to get passed as input to call two. You’re not building a pipeline, you’re building a game of telephone where the model is doing the relaying.
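To see why this compounds, here’s a back-of-the-envelope cost model. The numbers are made up for illustration, but the shape is the point: in plain tool calling, every round trip re-sends the entire growing history, so total input tokens grow roughly quadratically with the number of calls.

```typescript
// Hypothetical cost model, illustration only. schemaTokens is the upfront
// tool-schema load; perCallTokens is what each call's arguments and result
// add to the conversation history.
function toolCallingTokens(
  calls: number,
  schemaTokens: number,
  perCallTokens: number
): number {
  let history = schemaTokens;
  let total = 0;
  for (let i = 0; i < calls; i++) {
    total += history;          // the model re-reads everything so far
    history += perCallTokens;  // this call's args + result get appended
  }
  return total;
}

// 30 sequential calls against a 2,000-token schema, 300 tokens per call:
console.log(toolCallingTokens(30, 2000, 300)); // 190500
```

Roughly 190k input tokens processed just to relay results between thirty calls, before the agent has done anything useful.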

No Composition Between Calls #

This one bothers me the most because it’s architectural, not just a performance tuning problem.

In the traditional MCP tool-calling pattern, you cannot loop. You cannot store intermediate state between calls at the code level. If you need to create thirty calendar events (a common demo Cloudflare actually showed), the tool-calling agent fires thirty individual requests, each going through the model. The model processes the entire growing conversation history every time.

The Code Mode agent at Cloudflare wrote a for loop and ran it. Same outcome, a fraction of the cost.

When your agent needs to string together multiple tool calls, there are no variables, no error handling, no flow control at the execution layer. The LLM is being used as glue between calls that really shouldn’t need it.
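For contrast, here’s the kind of script a code-executing agent can write for the thirty-events case. The calendar binding is hypothetical (stubbed here so the sketch runs standalone); what matters is that the loop, the intermediate state, and the error handling live in the sandbox instead of in the model’s context.

```typescript
interface CalendarAPI {
  create_event(params: { title: string; date: string }): Promise<void>;
}

// Stub standing in for a generated MCP binding (illustrative only).
const created: string[] = [];
const calendar: CalendarAPI = {
  async create_event({ date }) {
    created.push(date);
  },
};

// One round trip: the agent emits this whole script, then runs it.
async function scheduleWeeklySync(weeks: number): Promise<number> {
  for (let week = 0; week < weeks; week++) {
    // Mondays, one week apart (arbitrary start date for the sketch)
    const date = new Date(Date.UTC(2026, 0, 5 + week * 7))
      .toISOString()
      .slice(0, 10);
    await calendar.create_event({ title: "Team sync", date });
  }
  return created.length; // intermediate state lives here, not in context
}
```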

Auth Is Kind of a Mess #

This is the one that security teams and enterprise people are going to care most about.

MCP was designed for functionality first. Auth was largely left as an exercise for each server to figure out. The result is that across more than 500 scanned MCP servers in the wild, 38% have no authentication at all. The spec puts session IDs in URLs, which logs them everywhere and makes session hijacking much easier. There’s no message signing, no integrity verification built into the protocol.
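To make the URL problem concrete: anything in a URL gets written to access logs, proxy logs, and referrer headers, while header-carried secrets usually don’t. The endpoint and header name below are illustrative, not quoted from the spec.

```typescript
// A session ID in the query string is captured by every log line that
// records the request URL:
const leakyUrl = "https://mcp.example.com/messages?sessionId=s3cr3t-abc123";
const logLine = `GET ${new URL(leakyUrl).pathname}${new URL(leakyUrl).search} 200`;
// -> the secret is now sitting in the access log

// Carrying the same ID in a header keeps it out of URL-based logging
// ("X-Session-Id" is an illustrative header name):
const headers = { "X-Session-Id": "s3cr3t-abc123" };
const safeLogLine = `GET ${new URL("https://mcp.example.com/messages").pathname} 200`;
```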

Real incidents have already happened. Asana had an MCP-related privacy breach last year where customer data was leaking between instances. A critical RCE vulnerability in the popular mcp-remote npm package (used to add OAuth support) hit a CVSS score of 9.6. Microsoft patched an SSRF vulnerability in Azure MCP Server Tools just this week.

The spec hasn’t been updated since November 2025. The security model as-written isn’t close to what enterprises need. If you’re building anything where multiple services are connected through a single protocol layer, and someone compromises that layer, they potentially get tokens for everything connected to it.

I’ve heard people say these are “theoretical” risks. They’re not. They’re hitting production.


What People Are Proposing Instead #

Just Use APIs and CLIs #

Perplexity’s answer is the boring one: go back to direct REST API calls and CLI tooling.

It sounds like a step backward, but think about what you actually get. Standard HTTP metrics, tracing, logging, circuit breakers. Auth you control end-to-end. No protocol translation layer. No token overhead from tool schemas.

CLIs specifically are getting renewed attention for agentic automation. They’re sandboxable. They have clear inputs and outputs. They work in local environments, CI pipelines, and production runners. A lot of what agents are doing in practice is just orchestrating existing developer tooling. Wrapping that in a protocol that wasn’t designed for production security doesn’t add value, it adds risk.
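As a sketch of why CLIs compose well with agents: arguments stay as data (no shell string interpolation), output is plain text, and the whole invocation can run under a timeout in a sandboxed runner. The wrapper below is illustrative, not any particular framework’s API.

```typescript
import { execFileSync } from "node:child_process";

// execFile (vs exec) passes args as an array, so model-generated strings
// can't inject shell syntax; the timeout bounds runaway tools.
function runTool(cmd: string, args: string[]): string {
  return execFileSync(cmd, args, { encoding: "utf8", timeout: 10_000 });
}

// e.g. runTool("git", ["status", "--porcelain"]) inside a repo checkout
```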

This is basically Perplexity’s take: for their use case, the abstraction wasn’t earning its keep.

Code Mode (This Is the Interesting One) #

Cloudflare published something earlier this year that I think is genuinely worth reading if you build in this space.

Their insight is that the problem isn’t MCP itself. The problem is the mechanism of “direct tool calling” that MCP relies on. When an LLM calls a tool, it’s emitting special tokens it was trained, largely on synthetic data, to produce. It has seen very few real-world examples of MCP tool calling in its training data, so it’s not that good at it when things get complex.

But LLMs are extremely good at writing code. They’ve been trained on enormous amounts of real-world TypeScript, Python, Go. So Cloudflare’s approach was: instead of asking the model to call tools directly, convert the MCP server’s tool definitions into a TypeScript API and ask the model to write code that calls it. Then execute that code in a sandboxed Worker.

To make this concrete, say you have a Slack MCP server. In traditional tool calling, the agent gets loaded with something like this in context (paraphrased here from the JSON tool schemas):

Tool: send_message
Description: Sends a message to a Slack channel
Parameters:
  - channel: string (required)
  - text: string (required)
 
Tool: list_channels
Description: Lists all channels in the workspace
Parameters:
  - limit: number (optional)

Now ask the agent to notify three channels. It makes three separate send_message calls, each one going back through the LLM. Three round trips. Growing context each time.

Code Mode instead gives the model a compact TypeScript type definition:

interface SlackAPI {
  send_message(params: { channel: string; text: string }): Promise<void>;
  list_channels(params?: { limit?: number }): Promise<string[]>;
}
declare const slack: SlackAPI;

And a single execute_code tool. The model writes this:

const channels = ["#eng", "#backend", "#alerts"];
await Promise.all(
  channels.map(channel =>
    slack.send_message({ channel, text: "Deployment complete. v2.4.1 is live." })
  )
);

One round trip. Parallel execution. The model wrote a plan and ran it, instead of being the relay between each call.

The performance results are not marginal. For a simple task, Code Mode used 32% fewer tokens. For a complex batch operation (creating a bunch of recurring events), it used 81% fewer tokens. The model wrote a loop. The tool-calling agent made thirty individual calls.

With two tools (search and execute), Cloudflare can expose their entire API of 2,500+ endpoints to an agent context that costs around 1,000 tokens. When they add new products, the same code paths work without new tool definitions.
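The shape of that two-tool surface is easy to sketch. Here a toy registry stands in for a large API catalog; `search` returns only the type signatures relevant to the task, and the second tool would execute model-written code against them. All names below are illustrative, not Cloudflare’s actual API.

```typescript
// Toy catalog standing in for thousands of endpoints (illustrative names).
const catalog: Record<string, string> = {
  "dns.records.list": "(zone: string) => Promise<string[]>",
  "dns.records.create": "(zone: string, record: string) => Promise<void>",
  "workers.deploy": "(script: string) => Promise<void>",
};

// Tool 1 of 2: return compact type signatures for matching endpoints,
// instead of preloading every schema into context.
function search(query: string): string[] {
  return Object.entries(catalog)
    .filter(([name]) => name.includes(query))
    .map(([name, sig]) => `${name}: ${sig}`);
}

// Tool 2 of 2 would be execute_code(source: string), running the
// model-written script in a sandbox with those bindings available.
```

The agent’s context cost stays constant as the catalog grows, which is the whole trick.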

Anthropic independently arrived at the same pattern. So did Apple. They’re calling it Code Mode across the board.

It’s not a replacement for MCP as a discovery and connection standard. It’s a smarter execution pattern on top of it.

UTCP (Early but Worth Watching) #

There’s a community project called Universal Tool Calling Protocol building on the Code Mode pattern. It’s a library that lets agents call both MCP tools and UTCP tools via code execution rather than direct tool calls. Early stage, but the direction is right. People are already trying to build the next-generation standard that takes MCP’s good ideas (uniform connection interface, discoverable tools) and fixes the execution layer.


My Actual Take After Building With This #

MCP is not dead. That’s an overcorrection from people who want a clean narrative.

But the “MCP is the USB-C of AI” framing that went around last year oversold it. It was always a protocol designed for tool discovery and standardization. It was never designed to be a production-grade security and performance layer at scale. Those are different problems.

The real situation is more like: MCP solved the right problem (how do AI agents find and connect to external tools in a standard way) but the execution model it ships with has genuine limitations at production scale. The context overhead is real. The auth story is immature. The sequencing limitations are architectural.

For use cases where the tool set is known and fixed, direct APIs are simpler and cheaper. Perplexity knows exactly what tools their agents need. They don’t need dynamic discovery. For them, MCP was adding a layer of abstraction without returning value.

For systems that need to adapt to new tools without code changes, or where the ecosystem of tools is broad and changing, MCP’s standardization layer is still genuinely useful. Claude Desktop, Cursor, VS Code integrations, development tooling, personal automation. These work well with MCP because the constraints that kill it in production (context overhead, auth complexity) are much more manageable at that scale.

What I’d actually recommend: if you’re building agents right now, understand the tradeoffs clearly rather than picking a side in the LinkedIn debate.

If your tool set is well-defined and stable, call the API directly. You’ll be happier with the observability and the security posture.

If you need MCP for discovery and ecosystem compatibility, look seriously at Code Mode patterns to manage context costs. Don’t just dump all your tool schemas into context and hope for the best.

And regardless of approach: treat any MCP server you run in production like a privileged API gateway. Not lightweight middleware. Not a toy integration layer. A privileged gateway with the same security rigor you’d apply to anything that holds credentials for multiple downstream services.

The real story here isn’t that MCP is dead. It’s that the agentic AI ecosystem is doing what every maturing infrastructure ecosystem does. The happy-path demos worked. Now the production constraints are showing up, and people are starting to separate the good ideas from the implementations that don’t scale.

That’s not a failure. That’s just what engineering actually looks like.