OpenAI AgentKit beta review. A smart railway with limited tracks

Oct 16, 2025

5 Mins

Themes

Agentic Systems

Topics

Enterprise AI

AI Governance

SaaS

IT Strategy

Ronan Forker

Value Engineer - Architect

AI Transparency

OpenAI has just launched AgentKit into beta, and it represents something genuinely interesting: a complete infrastructure stack for building and deploying AI agents for your enterprise.

Think of it as an intelligent railway system. Not the traditional kind with fixed schedules, but one where autonomous trains understand passenger needs, navigate complex networks, make real-time decisions, and coordinate with each other.

But here's the fundamental constraint with any railway: it doesn't matter how intelligent your trains are if they don't go where you need them to.

AgentKit offers real value for specific use cases while imposing equally real constraints on others. Understanding which is which matters enormously for enterprise planning.

Let's examine what OpenAI actually built, where it excels, and where the tracks end.

What AgentKit actually is

First, some clarity on terminology. In OpenAI's definition, "agents are systems that independently accomplish tasks on behalf of users." We're talking about autonomous systems that can reason, decide, and act, which means agents can be given the power to do real things. (I wrote about the dangers of that power in another article: https://www.linkedin.com/pulse/agentic-intern-what-you-should-know-before-connecting-ronan-forker-bor9e)

AgentKit isn't a single tool; it's an integrated infrastructure stack:

Agent Builder is the control centre, a visual interface for designing how your autonomous agents behave. You're defining decision patterns and workflows, not writing code for every interaction.
ChatKit provides the pre-built "station". This is the chat interface where users interact with your agents.
Connector Registry manages data and tool access across your workflows.
Evals handles continuous quality assurance: automated evaluation of target nodes within a workflow execution, checking decisions, validating outputs, and confirming that agents behave as intended.

This is infrastructure for deploying autonomous AI systems at scale, not another no-code tool for building simple chatbots. The question is whether this infrastructure serves your specific requirements.

The infrastructure that matters

Just like a railway’s hidden control systems (the signalling, scheduling, and maintenance operations), AgentKit’s real power lies beneath the surface. The visual interface gets the attention, but the value comes from the integrated capabilities that wrap the solution.

Versioning and Publishing

Every workflow publication creates a versioned snapshot. Need to roll back a problematic change? It's immediate. Compliance audit requiring deployment history? The complete record exists.

Built-in Observability

Agent actions are logged and decisions recorded. When agents behave unexpectedly, you can debug and trace what happened. If you've ever debugged a multi-step agent workflow with insufficient logging, you understand the necessity here.

Continuous Evaluation

The Evals integration is built into the workflow, so individual agent steps can be tested in isolation from the rest of the workflow. Each step can be evaluated against established expected values.
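To make the idea of testing one step in isolation concrete, here is a minimal Python sketch. It is not the Evals API: `classify_intent` is a hypothetical stand-in for a single agent step (which would normally be a model call), and `EvalCase`, `evaluate_step` and the sample cases are names and data I have made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One input paired with the value the step is expected to produce."""
    input_text: str
    expected: str


def classify_intent(message: str) -> str:
    """Hypothetical stand-in for one agent step (normally a model call)."""
    text = message.lower()
    if "refund" in text:
        return "billing"
    if "password" in text:
        return "account"
    return "general"


def evaluate_step(step: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run a single step against every case and return the fraction that match."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if step(case.input_text) == case.expected)
    return passed / len(cases)


# Illustrative cases only; the last one is deliberately a case the step gets wrong.
CASES = [
    EvalCase("I want a refund for my last invoice", "billing"),
    EvalCase("I forgot my password", "account"),
    EvalCase("What are your opening hours?", "general"),
    EvalCase("Please cancel my subscription", "billing"),
]

if __name__ == "__main__":
    print(f"pass rate: {evaluate_step(classify_intent, CASES):.2f}")
```

The point of the pattern is that the step is scored on its own, so a regression can be pinned to one node rather than to the workflow as a whole.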

Deployment Infrastructure

The workflow and its components need to run somewhere. AgentKit deploys and runs with one click on OpenAI infrastructure. It's a subtle point, but platforms such as n8n Cloud charge for this. Note that, given AgentKit is in beta, the cost model could change after the beta period.

Understanding the constraints

Like any complex railway, AgentKit’s network has its track limits and signal restrictions. Some routes are smooth and efficient, while others simply don’t exist yet. Recognising these boundaries is essential to avoid derailment and to plan the right journeys for your organisation.

Chat-Only Triggers

Currently, Agent Builder is architecturally designed for conversational interfaces. Agents activate when users send messages, execute workflows, and respond. This works well for many scenarios: customer support, knowledge assistance, research workflows, onboarding guides.

It doesn't work for:

Scheduled operations (running analyses at specific times)
Event-driven workflows (processing data when it arrives)
API-first integrations (systems calling agents directly)
Batch processing (handling queued work items)

Practical example: Your enterprise needs an agent that monitors incoming contracts, extracts terms, validates compliance, and routes for approval. This is a perfectly reasonable agent use case. It isn't currently possible on Agent Builder, because contracts don't arrive via a chat interface; they come through email, document management systems, or API submissions.
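For contrast, this is roughly what the event-driven shape of that contract workflow looks like when built as custom infrastructure. It is a sketch under stated assumptions, not an AgentKit feature: `ContractEvent`, `extract_terms`, `is_compliant`, the liability-cap rule and the route names are all hypothetical, and the extraction step is a stub where a model call would sit.

```python
import queue
from dataclasses import dataclass


@dataclass
class ContractEvent:
    """A contract arriving from a non-chat source (hypothetical shape)."""
    source: str        # e.g. "email", "dms", "api"
    contract_id: str
    text: str


def extract_terms(text: str) -> dict:
    """Hypothetical extraction step; a real one would call a model."""
    return {"mentions_liability_cap": "liability cap" in text.lower()}


def is_compliant(terms: dict) -> bool:
    """Hypothetical rule: a contract must mention a liability cap."""
    return terms["mentions_liability_cap"]


def handle(event: ContractEvent) -> str:
    """Extract, validate, and pick a route for one contract."""
    terms = extract_terms(event.text)
    return "legal-approval" if is_compliant(terms) else "manual-review"


def drain(events: queue.Queue) -> dict:
    """Process queued events until the queue is empty.

    Returns a mapping of contract_id to the chosen route.
    """
    routes = {}
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        routes[event.contract_id] = handle(event)
    return routes
```

The trigger here is the arrival of an item on a queue rather than a user message, which is exactly the entry point Agent Builder does not offer today.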

Node Configuration Limits

Agent Builder provides a limited set of node types focused on common AI operations and pre-approved connectors such as Google Drive or SharePoint. It covers essential control flow and built-in tools like web and file search.

However, developers cannot create arbitrary API calls, perform complex data transformations, or add custom authentication protocols. This streamlined design simplifies deployment but restricts flexibility for more advanced workflows.

For comparison, n8n offers hundreds of node types across diverse systems, whereas AgentKit’s narrower scope keeps it optimised for AI-centric patterns.

Single-Model Architecture

All workflows run on OpenAI models exclusively. This has strategic implications:

You cannot use different models based on task requirements (Claude for long-form analysis, custom models for specific use cases).
You cannot implement multi-vendor strategies for risk diversification.

This is an ecosystem commitment, not just a tool adoption. When OpenAI's models serve your needs well, the integration is seamless. When they don't, whether due to capability gaps, cost considerations, or availability issues, you have no alternative within this infrastructure.
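Outside AgentKit, teams that want this flexibility usually put a thin routing layer between their workflows and the model vendors. The sketch below shows the idea only: `ModelRouter`, the task-type labels and the two stub functions are hypothetical, and the stubs return tagged strings instead of calling any real vendor SDK.

```python
from typing import Callable, Dict

# A "model" here is just a function from prompt to response (an assumption
# made to keep the sketch vendor-neutral).
ModelFn = Callable[[str], str]


def openai_stub(prompt: str) -> str:
    """Hypothetical placeholder for a call to an OpenAI model."""
    return f"openai-stub: {prompt}"


def claude_stub(prompt: str) -> str:
    """Hypothetical placeholder for a call to a Claude model."""
    return f"claude-stub: {prompt}"


class ModelRouter:
    """Chooses a model per task type, falling back to a default."""

    def __init__(self, default: ModelFn) -> None:
        self._default = default
        self._routes: Dict[str, ModelFn] = {}

    def register(self, task_type: str, model: ModelFn) -> None:
        """Send a given task type to a specific model."""
        self._routes[task_type] = model

    def run(self, task_type: str, prompt: str) -> str:
        """Dispatch the prompt to the registered model, or the default."""
        model = self._routes.get(task_type, self._default)
        return model(prompt)


router = ModelRouter(default=openai_stub)
router.register("long_form_analysis", claude_stub)
```

With this in place, `router.run("long_form_analysis", ...)` goes to one vendor while every unregistered task type falls back to the default, which is the kind of per-task choice a single-model platform cannot express.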

Observability Integration

The built-in monitoring provides comprehensive visibility into agent behaviour. It doesn't integrate directly with enterprise monitoring platforms (DataDog, New Relic, Splunk), provide custom alerting on organisational SLAs, or export data in formats required by specific compliance frameworks.

You can see everything your agents do. Connecting that visibility to broader enterprise systems requires additional integration work.

Custom Code Nodes

AgentKit does not currently support custom code nodes. Developers cannot write or execute bespoke code directly within a workflow node, limiting flexibility for advanced logic, data manipulation, or API interactions. This represents a significant design constraint.

Choosing the right track

For some journeys, this network will be a good fit; for others, it simply doesn’t have the tracks. But despite these constraints, AgentKit provides genuine value for specific enterprise scenarios.

Strong Fit Scenarios:

Organisations already committed to OpenAI's ecosystem benefit from deep integration across the platform. If your AI strategy centres on OpenAI models, Agent Builder extends that commitment with production infrastructure.

Chat-first use cases align perfectly with the platform's design: customer support automation, internal knowledge bases, conversational workflows, research assistance. These scenarios leverage AgentKit's strengths without encountering its constraints.

When Alternatives Make Sense:

Event-driven and scheduled workflows remain outside AgentKit's scope. If these patterns are central to your agent strategy, custom infrastructure is necessary.

Multi-model flexibility matters for organisations with deliberate multi-vendor strategies or specific model requirements that OpenAI's offerings don't satisfy.

Complex integrations with legacy systems, unusual authentication requirements, or business logic that doesn't fit available node configurations require custom development.

I look forward to seeing which direction OpenAI takes its AgentKit builder solution as it moves out of beta. The true test will be how it expands the network, whether through APIs, alternative triggers, or custom node capabilities, to open new routes for enterprise innovation. If OpenAI continues to lay new tracks thoughtfully, AgentKit could evolve into an express line.
