
MCP vs CLI is the wrong question

Apr 27, 2026

·

5 Mins

AI Transparency

The debate about MCP versus CLI for agentic engineering has calcified into a tidy conclusion: use both, pick the right tool for the job, move on. The Firecrawl piece lands there. Most benchmark posts published in the last six months land there. It is a comfortable answer that avoids the harder one.

The harder one is that the question itself is miscategorised. MCP and CLI do not sit on the same axis. They occupy different layers of the agent stack, they exploit different properties of the underlying model, they solve different organisational problems, and treating them as interchangeable transports produces the same category error enterprise architecture kept making in the 2000s when it confused an ESB with an API.

This piece argues two things. CLIs beat MCP on developer-facing workflows today because the model weights already know them, and that advantage is structural. MCP's real value sits one layer up, in governance, identity, and access to SaaS surfaces with no shell. The confusion between those two layers explains why teams are burning context windows on infrastructure that belongs elsewhere.

The training-data asymmetry

The token economics get most of the attention. They are real. Scalekit's benchmark shows CLI agents completing tasks in 1,365 to 8,750 tokens with 100% success, while MCP equivalents consume 32,000 to 82,000 tokens at 72% reliability. Apideck reports three MCP servers eating 143,000 tokens of a 200,000-token context window before the agent reads its first user message. A poorly shaped GitHub MCP server alone ships 93 tool definitions for roughly 55,000 tokens of schema.

That is the visible cost. The invisible cost is cognitive. When 70% of a context window is spent on tool schemas, the reasoning budget the model needs to compose a solution has been evicted. Reliability drops because the agent is trying to think inside the gaps left behind.
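The arithmetic behind that eviction is worth making explicit. A minimal sketch using the figures cited above; the token counts are the benchmarks' numbers, not measurements of any particular model, and "reasoning budget" here simply means whatever the schemas leave behind:

```python
# Context-budget arithmetic for the Apideck example cited above.
CONTEXT_WINDOW = 200_000   # tokens in the model's context window
MCP_SCHEMA_COST = 143_000  # tokens consumed by three MCP servers' tool schemas

def remaining_budget(window: int, overhead: int) -> tuple[int, float]:
    """Return (tokens left for actual work, fraction of window lost to overhead)."""
    left = window - overhead
    return left, overhead / window

left, frac = remaining_budget(CONTEXT_WINDOW, MCP_SCHEMA_COST)
print(f"{left:,} tokens left after schemas ({frac:.0%} of the window gone)")
```

Before the agent has read a single user message, roughly 72% of the window is gone, which is where the "70% spent on tool schemas" figure bites.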

Token economics explain why MCP is expensive. They say nothing about why CLI is good. The deeper answer is that models have been trained on decades of shell scripts, git histories, Stack Overflow answers, Kubernetes manifests, jq pipelines, awk one-liners, Dockerfiles, and Makefiles. A well-formed gh pr list --json number,title | jq '.[] | select(.title | contains("bug"))' is a pattern the model has seen thousands of times. The schema is implicit in the weights. The grammar is recalled.

MCP tool definitions arrive cold. They are syntactic strangers injected at inference time, described in JSON Schema the model must parse on every turn, with no prior distribution over how they compose. The model learns the tool from a paragraph of description and a list of parameters, every request, forever. Compare that with kubectl, where verb-noun structure, flag conventions, output shape, and common error modes all sit in pretraining as prior knowledge.

The asymmetry is structural. CLIs evolve slowly. The Unix interface is fifty years old. git is twenty. gh is five. By the time a CLI matures, the training corpus has absorbed it thoroughly. MCP tool schemas change weekly, fork per vendor, and have no canonical idiom. The model meets them fresh on every request.

This means the CLI advantage for mature tooling will widen, not close. Each new generation of base model is better at git than the previous generation because the corpus grows. MCP schemas remain novel because the ecosystem keeps fragmenting. Betting on MCP to reach parity on developer workflows is betting against the gradient of the training data.

The category error

If CLI wins on developer workflows, the obvious conclusion is that MCP is a weaker protocol. That is also wrong. MCP solves problems CLI was never designed for, and the teams using it correctly are operating one layer up the stack.

Consider what actually belongs in an MCP server. Per-user OAuth scopes. Audit trails that tie every action back to a specific principal. Policy engines that gate which agents touch which records. Access to Salesforce, Workday, Greenhouse, or any SaaS surface that exposes an API without shipping a CLI. Stateful connections to systems where re-authenticating on every call is expensive. These are governance concerns. They have nothing to do with execution.

The CLI surface is execution. The developer running the shell is already trusted. The auth model is implicit in the user's session. The tools are local, composable, and cheap to invoke. MCP is the wrong shape here. Asking MCP to mediate git commit is like putting an API gateway in front of a local function call.

MCP is governance. It earns its keep where an agent acts on behalf of a specific principal against a regulated system of record, where every call needs attribution, and where the underlying service has no shell equivalent. Putting MCP in front of kubectl is a category error. Putting it in front of Salesforce record updates executed by a customer-facing agent is the design it was built for.

The taxonomy that works in practice has three planes.

The execution plane is CLI territory: developer-facing agents, local tooling, code operations, infrastructure as code, anything with a mature shell interface. Use CLI here. Let the agent compose pipes. Accept the modest cold start of describing the tools and harvest the long tail of pretraining familiarity.

The governance plane is MCP territory: customer-facing agents, multi-tenant SaaS, systems of record, regulated workflows, any surface requiring per-request identity and audit. Use MCP here. Accept the schema tax. Spend the tokens where they buy compliance.

The knowledge plane sits above both: domain procedures, company-specific conventions, playbooks. No transport applies. These are instructions, and they belong in skills, prompts, or context files.

Confusing these planes produces the pathologies the current debate is cataloguing. MCP servers wrapping shell commands burn tokens for no governance benefit. CLIs bolted onto SaaS integrations leak credentials and lose audit trails. Skills written as MCP tools duplicate schemas that should have been markdown.

What this means in practice

Stop asking "MCP or CLI" and start asking three different questions.

Does this tool have a mature shell interface the base model has seen in training? If yes, prefer CLI. The token savings are a side effect. The real win is that the model is operating from prior knowledge, not parsing schema just-in-time.

Does this action cross a trust boundary that requires per-user identity, scope enforcement, or audit? If yes, prefer MCP. The tokens spent on schema are buying something no other transport provides.

Is this a procedure or convention dressed up as a tool? If yes, it belongs in a skill or prompt. Do not wrap it in a transport at all.
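The three questions above can be sketched as a routing rule. The Tool type and its field names are invented for illustration; the check order is one reasonable precedence (procedures first, trust boundaries before CLI convenience), not the only defensible one:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    has_mature_cli: bool          # shell interface the base model has seen
    crosses_trust_boundary: bool  # needs per-user identity, scopes, audit
    is_procedure: bool            # a convention dressed up as a tool

def route(tool: Tool) -> str:
    if tool.is_procedure:
        return "skill"  # knowledge plane: a prompt or context file
    if tool.crosses_trust_boundary:
        return "mcp"    # governance plane: pay the schema tax
    if tool.has_mature_cli:
        return "cli"    # execution plane: use prior knowledge
    return "mcp"        # no shell surface: MCP is the only transport left

print(route(Tool("git", True, False, False)))               # cli
print(route(Tool("salesforce", False, True, False)))        # mcp
print(route(Tool("release-playbook", False, False, True)))  # skill
```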

The hybrid architectures emerging at Google, Anthropic, and the more mature agent teams are separating concerns, not hedging bets. Google shipped gws alongside its MCP servers because developers want shell composability and IT administrators want governance, and the same protocol cannot do both well. Anthropic's code execution with MCP pattern is a quiet admission that eager schema loading was a design mistake; the fix is to treat MCP as an API behind a code interpreter, not a direct tool surface.
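The deferred-loading idea behind that fix can be sketched in a few lines. Every name here is an invented stand-in, not a real SDK: the point is that schemas live in a registry and enter context only when the agent searches for them, instead of being injected eagerly on every request:

```python
# Invented stand-in registry: tool schemas stay out of context until needed.
TOOL_REGISTRY = {
    "salesforce.update_record": "Update a field on a CRM record.",
    "workday.get_employee": "Fetch an employee profile by id.",
    "github.list_prs": "List pull requests in a repository.",
}

def search_tools(query: str) -> dict:
    """Return only the schemas matching the agent's query -- the
    progressive-disclosure step that replaces eager loading."""
    q = query.lower()
    return {name: desc for name, desc in TOOL_REGISTRY.items()
            if q in name.lower() or q in desc.lower()}

# The agent pays for one schema, not the whole catalogue.
print(search_tools("salesforce"))
```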

Token bloat will get better. Progressive disclosure, tool search, deferred loading, and code execution will chip away at it over the next year. The training-data asymmetry will not improve, because it sits in the corpus, not the protocol. Mature CLIs will keep winning developer workflows. MCP will keep winning governance workflows. The protocol question is a distraction from the architecture question, which is which layer of the agent stack each tool actually belongs in. Teams that get that right will ship faster, cheaper, and with cleaner audit trails. Teams that keep arguing about transports will keep paying for the wrong one.

