Not another chatbot. The journey from calculator to colleague
Imagine hiring the brightest consultant in the world. This chap is capable of synthesising market reports in seconds, answering any technical question with precision, and summarising entire domains on demand. But there’s a catch: this consultant has amnesia. Every time you speak, they forget everything they knew before. No memory of your company, your goals, your last conversation. Each interaction starts from scratch.
This is how most enterprises are currently engaging with large language models (LLMs).
Despite the hype, today’s LLM usage remains largely stateless and transactional. An LLM is prompted, it responds, and the interaction ends. While this is sufficient for isolated tasks - generating content, translating languages, writing code snippets - it falls short in the context of enterprise transformation, where complexity demands continuity, reasoning, and coordination.
The limits of stateless intelligence
This "prompt-in, response-out" pattern introduces several critical bottlenecks for enterprise use:
Lack of context continuity: Without persistent memory, models must be re-fed the same context repeatedly, which becomes both expensive and error-prone as complexity scales.
Manual orchestration: Human operators are forced to string together outputs, transforming themselves into workflow glue between disconnected model calls.
Shallow reasoning: Multi-step, conditional, or exploratory reasoning is difficult to achieve reliably without some form of planning and memory.
In essence, enterprises are wielding a Ferrari engine without a steering wheel or gearbox. Raw power, but no coordination. No navigation.
The promise of cognitive workflows
The next evolution is already underway: agentic systems - LLMs wrapped in structure, memory, and purpose - capable of navigating multi-step tasks, adapting over time, and working in coordination with other tools and agents.
This shift is not just technical. It redefines how organisations work with AI. We’re moving from AI-as-a-tool to AI-as-a-colleague: an active participant in workflows, with the ability to remember, plan, and collaborate.
In this piece, we’ll unpack the technical progression from reactive tools to proactive systems, introduce a five-stage maturity model for enterprises on this journey, and offer practical steps for leaders aiming to operationalise agentic AI. Whether you’re an innovation lead exploring AI’s potential, a CTO building cognitive infrastructure, or a venture partner seeking differentiated advantage, this transition marks a turning point in enterprise capability.
The technical evolution: From tools to agents
As is my wont, it’s time to get a bit technical.
The initial wave of LLM adoption was driven by experimentation. We’ve been treating these models as smart tools capable of generating language, answering questions, or writing code on demand. But as their potential deepens, a growing awareness has emerged: true enterprise value lies not in using LLMs, but in integrating them into intelligent, evolving systems. This section explores the shift from stateless tools to stateful, reasoning-driven agents, and the infrastructure underpinning this transformation.
From stateless tools to stateful systems
At their core, most LLMs are stateless: they do not retain any information between calls. Every interaction must be re-initialised from scratch, often by stuffing the input prompt with a condensed version of the problem, background, and user intent. This is both inefficient and fragile. A single error in prompt construction can derail the entire output. One-shotting prompts is more luck than science. There’s pretty much always refinement required.
By contrast, stateful systems maintain context over time. They remember prior interactions, adapt based on user behaviour, and learn from new inputs. In practice, this enables:
Task continuity: The agent doesn't need to be reminded what it's working on.
Personalisation: Responses reflect long-term user preferences.
Workflow orchestration: Outputs from one task inform the inputs for the next.
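In code, the difference is small but decisive. A minimal sketch of a stateful wrapper, where `call_llm` is a hypothetical stand-in for any chat-completion client:

```python
class StatefulAgent:
    """Carries conversation history across calls, so each prompt
    arrives with full task context rather than starting from scratch."""

    def __init__(self, call_llm):
        self.call_llm = call_llm      # injected LLM client (hypothetical)
        self.history = []             # episodic record of prior turns

    def ask(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        reply = self.call_llm(self.history)   # model sees all prior turns
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

Every `ask` sends the whole running history, so the model "remembers" what it’s working on without the operator re-feeding context by hand.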
Architectures of memory: How agents "remember"
To move from tool to agent, memory is non-negotiable. Drawing from cognitive science, we can categorise memory into three core types:
Episodic memory: Logs past interactions and conversations. For agents, this could include chat history or previous tool outputs.
Semantic memory: Encodes factual knowledge, concepts, and world understanding. This is where vector databases and knowledge graphs come in, offering structured and retrievable stores of meaning.
Procedural memory: Captures how to do things. In agentic systems, this is expressed as plans, workflows, and action sequences, and it is where the enterprise ontology hooks in.
Building agents requires combining all three so they can reason across past events, domain-specific knowledge, and repeatable skills.
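A toy sketch of those three stores side by side. The structures and the `recall` helper are illustrative only, not any particular framework’s API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy model of the three memory types: what happened,
    what is known, and how things get done."""
    episodic: list = field(default_factory=list)    # past interactions
    semantic: dict = field(default_factory=dict)    # facts and concepts
    procedural: dict = field(default_factory=dict)  # named skills/workflows

    def recall(self, topic: str):
        """Pull everything the agent holds that mentions a topic,
        across all three memory types (naive substring match)."""
        events = [e for e in self.episodic if topic in e]
        facts = {k: v for k, v in self.semantic.items() if topic in k}
        skills = [s for s in self.procedural if topic in s]
        return {"events": events, "facts": facts, "skills": skills}
```

In a real system the semantic store would be a vector database or knowledge graph and recall would be similarity-based, but the shape of the combination is the same.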
Reasoning chains and planning mechanisms
One of the biggest limitations of vanilla LLM use is that models are often reactive, not reflective. They generate a response based on the prompt, but don’t internally plan, reflect, or revise. Agents change that. The deep-research capabilities shipped by larger players like Anthropic and OpenAI aren’t bare LLMs, but agents built on top of LLMs.
Through mechanisms like Chain-of-Thought (CoT) reasoning and tree-of-thoughts, agents simulate cognitive planning: decomposing problems into sub-tasks, evaluating intermediate steps, and adjusting strategy dynamically.
Planning agents often rely on recurrent planning loops: observe the current state, generate the next action, update memory, repeat. In more advanced systems, planners can even invoke sub-agents - specialist modules that handle search, summarisation, or validation - enabling modular and scalable workflows.
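That recurrent loop can be sketched in a few lines. Both `plan_step` and `act` here are hypothetical callables standing in for LLM-backed components:

```python
def planning_loop(goal, plan_step, act, max_steps=10):
    """Recurrent planning loop: observe the current state, decide the
    next action, execute it, fold the result back into memory, repeat.

    `plan_step` maps (goal, memory) to the next action, or None once
    it judges the goal has been met."""
    memory = []                            # running record of observations
    for _ in range(max_steps):
        action = plan_step(goal, memory)   # observe + decide
        if action is None:                 # planner judges goal reached
            return memory
        memory.append(act(action))         # execute and remember
    return memory                          # step budget exhausted
```

The `max_steps` guard matters in practice: without a budget, a confused planner loops forever, burning tokens.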
Technical deep-dive: The stack behind agentic workflows
Let’s walk through the typical evolution of an AI system architecture in this context:
1. RAG (Retrieval-Augmented Generation)
RAG pairs a stateless LLM with a retrieval system, such as a vector database (e.g. Chroma, Qdrant, Weaviate, Pinecone, LanceDB). Instead of overloading the prompt, relevant context is retrieved dynamically using similarity search.
🟢 Benefit: Expanded context without hitting token limits
🔴 Limitation: Still lacks memory or multi-step reasoning
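The retrieval half of RAG reduces to "embed, rank by similarity, take the top k". A self-contained sketch, with a toy bag-of-words embedding standing in for a real embedding model and vector database:

```python
import math

def embed(text: str) -> dict:
    """Toy bag-of-words 'embedding'. A real system would call an
    embedding model and store the vectors in Chroma, Qdrant, etc."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Rank stored documents by similarity to the query and return
    the top k, ready to be injected into the prompt."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

The retrieved snippets are then prepended to the prompt, so only relevant context consumes the token budget.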
2. Multi-Agent Orchestration
In more advanced setups, multi-agent systems divide labour among specialist agents:
A planner agent decomposes the task
A retriever agent fetches data
A validator agent checks quality
A tool-use agent interacts with APIs or systems
These agents coordinate via an orchestration layer - often a framework like LangChain, CrewAI, or DeepAgents - that handles task routing, state management, and agent collaboration.
🟢 Benefit: Scalable, modular systems
🔴 Limitation: Requires robust architecture and tuning
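The division of labour can be sketched as role-keyed functions behind a tiny router. In practice a framework like LangChain or CrewAI provides this layer; every agent below is a toy stand-in:

```python
def planner(task):
    """Decompose a task into (role, subtask) pairs."""
    return [("retriever", f"fetch data for {task}"),
            ("validator", f"check quality of {task}")]

def retriever(subtask):
    return f"data: {subtask}"

def validator(subtask):
    return f"ok: {subtask}"

# Role registry: the orchestrator looks agents up by role name.
AGENTS = {"retriever": retriever, "validator": validator}

def orchestrate(task):
    """Route each planned subtask to its specialist agent and collect
    the results - the job an orchestration layer automates."""
    return [AGENTS[role](subtask) for role, subtask in planner(task)]
```

The value of the pattern is that each agent stays small and testable; the hard engineering lives in the routing and state layer, which is exactly why this stage demands robust architecture.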
3. Autonomous Workflows
The endgame is fully autonomous cognitive workflows, where agents operate continuously in service of a goal:
Input: High-level objective (e.g. “Audit our supplier ESG risk exposure”)
Output: Complete, validated report with citations and next steps
Behaviour: Persistent memory, reasoning across time, cross-agent communication, dynamic replanning
These systems blur the line between “process automation” and “cognitive collaboration”.
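A minimal sketch of that behaviour: a validation-gated loop that replans on failure and escalates to a human only when revisions run out. `draft` and `validate` are hypothetical LLM-backed callables:

```python
def autonomous_workflow(objective, draft, validate, max_revisions=3):
    """Pursue an objective end-to-end: draft, validate, replan on
    failure, and escalate to a human only once revisions are spent."""
    report = draft(objective, feedback=None)
    for _ in range(max_revisions):
        ok, feedback = validate(report)
        if ok:
            return {"status": "done", "report": report}
        report = draft(objective, feedback=feedback)   # dynamic replanning
    return {"status": "escalated", "report": report}   # human takes over
```

The escalation branch is what separates "cognitive collaboration" from unattended automation: the agent works autonomously, but hands control back when it cannot converge.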
The rise of semantic infrastructure
To support this evolution, a new kind of backend is emerging: semantic infrastructure. This includes:
Vector databases (e.g. Chroma, Qdrant) for similarity-based recall
Knowledge graphs (e.g. Neo4j, Stardog, RDF stores) for structured reasoning and relation mapping
Ontologies for domain-specific logic and constraints (Palantir, OWL, SHACL, or proprietary structures)
Together, they enable reasoning over structured and unstructured data, connecting the dots between isolated facts, and making agents not just reactive, but truly knowledgeable.
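The point of the pairing can be shown with a toy triple store plus a fuzzy entity match standing in for vector search: the fuzzy layer finds the entity, the graph supplies its structured relations. All names here are invented for illustration:

```python
# Toy knowledge graph as (subject, predicate, object) triples.
TRIPLES = [
    ("Acme", "supplies", "WidgetCo"),
    ("Acme", "located_in", "EMEA"),
    ("WidgetCo", "sector", "manufacturing"),
]

def fuzzy_match(query, entities):
    """Stand-in for similarity search over entity embeddings."""
    return next((e for e in entities if e.lower() in query.lower()), None)

def relations_of(entity):
    """Structured hop over the knowledge graph."""
    return [(p, o) for s, p, o in TRIPLES if s == entity]

def ground(query):
    """Fuzzy recall finds the entity; the graph grounds it in facts."""
    entity = fuzzy_match(query, {s for s, _, _ in TRIPLES})
    return entity, relations_of(entity) if entity else []
```

Neither half suffices alone: vectors recall "something like this" but can’t traverse relations; graphs traverse relations but need an entry point. Together they connect the dots.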
The transition from stateless tools to agentic systems is not just a matter of improving outputs. It’s a re-architecture of how intelligence is embedded in enterprise systems. With memory, planning, tool use, and semantic understanding, agents become more than model wrappers. They become co-workers, orchestrators, and decision participants.
In the next section, we’ll introduce a maturity model to help enterprises understand where they sit on this journey. We can then really start to see how to move from AI-as-a-tool to AI-as-a-colleague.
The enterprise AI maturity model

| Stage | Capability | Key Tech | ROI Potential |
|---|---|---|---|
| 1. Calculator | Prompt-only | LLM APIs | Individual productivity |
| 2. Assistant | Context-aware | RAG, vector DBs | Faster research, lower support costs |
| 3. Analyst | Reasoning & tools | Function calling, planning | Automated insight & analysis |
| 4. Collaborator | Memory & adaptation | Agent memory, identity | Workflow augmentation |
| 5. Colleague | Autonomy & goal pursuit | Orchestration + semantics | Enterprise-level transformation |
Most enterprises are still in the early stages of working with AI. They’ve experimented with LLMs, perhaps implemented a few automations, and are wondering: Where do we go from here? This maturity model offers a structured path forward.
It outlines five progressive stages of evolution - each representing a shift in how AI is deployed, integrated, and ultimately, how it contributes to business value.
1. AI as a Calculator: Basic Prompt Engineering
At this stage, LLMs are used in isolation, like a more sophisticated command-line interface. Use cases tend to be tactical and disconnected: writing content, summarising text, generating code snippets, or translating languages.
There is no memory, no orchestration, no understanding of process or continuity. These models are reactive, not reflective.
Technical Requirements
API access to a foundation model (e.g. OpenAI, Anthropic)
Basic UI wrapper or IDE integration
Prompt libraries or playgrounds
Organisational Readiness Indicators
Central innovation team or tech enthusiasts piloting use cases
No formal governance or training
Little cross-team coordination
ROI Expectations
Quick wins
Productivity boosts at the individual level
Low infrastructure costs, but limited strategic impact
🟠 Risk: Easy to stall here, treating AI as a novelty rather than a core capability
2. AI as an Assistant: Retrieval-Augmented Generation (RAG) and Context Injection
Capabilities and Limitations
LLMs are now paired with contextual data, typically through vector databases or document repositories, allowing systems to bring in domain-specific knowledge on demand and expanding the LLM’s “working memory”.
Use cases include customer support assistants, internal knowledge bots, or legal document Q&A.
However, the systems are still stateless and shallowly integrated. Memory resets with each interaction, and reasoning is limited to what can be retrieved and synthesised in a single step.
Technical Requirements
Vector database (e.g. Pinecone, Weaviate)
RAG framework (LangChain, LlamaIndex)
Content ingestion pipelines
Prompt templating and chunking logic
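Chunking logic is easy to get wrong at the boundaries; overlapping chunks keep facts that straddle a split retrievable from at least one chunk. An illustrative word-based chunker (real pipelines often chunk by tokens or sentences, but the shape is the same):

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into word chunks that overlap, so a fact spanning a
    boundary still lands intact in at least one retrievable chunk."""
    words = text.split()
    step = size - overlap            # how far the window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break                    # final window already covered the tail
    return chunks
```
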
Organisational Readiness Indicators
Early AI platform team forming
Internal security/legal reviews starting
Fragmented experimentation across business units
ROI Expectations
Reduced support burden
Accelerated research and discovery
Visible user-facing enhancements
🟡 Note: Retrieval improves accuracy and relevance, but not true reasoning
3. AI as an Analyst: Multi-step Reasoning with Tool Use
Capabilities and Limitations
AI is now capable of chaining thoughts and invoking tools. Here we’re starting to see our systems moving from answers to analysis. Systems can plan multi-step actions, perform calculations, query APIs, and explore solution spaces.
Agents emerge here as planners and operators, working in loops: plan → act → observe → replan.
Still, orchestration is often manually configured, and memory is ad-hoc or brittle. Scaling remains a challenge due to a lack of modularity and governance. FinOps moves to the foreground as token usage becomes an increasing cost concern.
Technical Requirements
Tool-use frameworks (e.g. OpenAI function calling, CrewAI, LangGraph, n8n)
Planning logic (e.g. ReAct, CoT, ToT)
Tool wrappers and API bridges
Structured logging and observability
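At the core of tool use is a loop in which the model either requests a structured tool call or returns a final answer. A sketch with a hypothetical `model_step` standing in for an LLM turn, and a toy tool registry:

```python
# Toy tool registry: name -> callable taking a dict of arguments.
TOOLS = {
    "multiply": lambda args: args["a"] * args["b"],
    "lookup_fx_rate": lambda args: {"GBPUSD": 1.27}.get(args["pair"]),
}

def run_with_tools(model_step, question):
    """ReAct-style loop: the model returns either a structured tool
    request or a final answer. `model_step` is a hypothetical stand-in
    for an LLM turn that sees the question plus prior observations."""
    observations = []
    while True:
        move = model_step(question, observations)
        if move["type"] == "final":
            return move["answer"]
        result = TOOLS[move["tool"]](move["args"])   # act
        observations.append(result)                  # observe, then replan
```

Production frameworks add schemas, retries, and guardrails around exactly this loop, but the plan → act → observe → replan cycle is the whole trick.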
Organisational Readiness Indicators
Dedicated budget for AI workflows
Engineers or analysts building agent pipelines
First experiments in replacing human-in-the-loop research tasks
ROI Expectations
Cost reduction in knowledge work
Increased decision accuracy
Insight generation at scale
🟢 Enabler: Cross-functional teams can now build and reason with domain knowledge
4. AI as a Collaborator: Persistent Agents with Memory
Capabilities and Limitations
Agents now persist over time, maintaining episodic memory, updating semantic understanding, and learning from interactions. They’re capable of reasoning across sessions, adapting to users, and tracking task state.
This unlocks personalisation, long-horizon workflows, and multi-agent collaboration.
Technical complexity increases significantly: you need storage and recall strategies, memory pruning, identity handling, and lifecycle management.
Technical Requirements
Memory layers (e.g. Redis, Milvus, or hybrid graph+vector stores)
Agent identity resolution
Ontological structures or semantic schemas
Agent lifecycle/state management
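Memory pruning typically comes down to scoring items by recency and usefulness, then keeping the best within a budget. An illustrative sketch (the decay weight is arbitrary, not a tuned value):

```python
def prune_memory(items, budget, now):
    """Keep the highest-scoring memories within a budget. Score decays
    with age and grows with how often an item has been recalled."""
    def score(item):
        age = now - item["last_used"]
        return item["recall_count"] - 0.1 * age   # illustrative weighting

    return sorted(items, key=score, reverse=True)[:budget]
```

Without some policy like this, persistent agents accumulate stale context that degrades both recall quality and cost, which is why storage and pruning strategies become first-class design concerns at this stage.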
Organisational Readiness Indicators
Formal agent governance and design roles
AI ethics, compliance, and observability policies in place
Integration into core systems and workflows
ROI Expectations
Reduced human overhead for repetitive tasks
AI becomes a team augmentation layer
Step-change in workflow efficiency
🔵 Turning point: From “AI helping humans” to “humans and AI working together”
5. AI as a Colleague: Autonomous Cognitive Workflows
Capabilities and Limitations
This is the frontier: autonomous agents coordinating with each other, with systems, and with employees to pursue objectives, not just answer queries.
Agents are now embedded within business processes. They can plan complex projects, react to dynamic inputs, invoke and supervise other agents, and escalate only when needed. Systems reflect intentionality, goal pursuit, and adaptive behaviour.
Limitations now shift from technical to cultural: trust, control, governance, and collaboration norms.
Technical Requirements
Orchestration layer with autonomous planning (e.g. LangGraph, DeepAgents, DSPy)
Semantic reasoners and ontologies for grounded logic
Real-time memory sync, logging, audit trails
Secure toolchains and enterprise data integration
Organisational Readiness Indicators
AI embedded in enterprise architecture strategy
Change management frameworks for AI-led operations
Incentives aligned around AI co-ownership and human-AI hybrid teams
ROI Expectations
Autonomous cost centres for process execution
Competitive advantage through faster decision loops
Platform-level efficiencies across departments
🟣 Strategic imperative: Enterprises at this stage define—not follow—the frontier
Future horizons: What’s next
We are at the inflection point between useful AI and transformative AI. As agentic systems evolve, new architectural patterns are beginning to emerge: systems that blend reasoning, memory, and planning with ever-deeper integration into the enterprise fabric.
Cognitive architectures are going multimodal and multimind
Next-generation agentic systems are increasingly:
Multimodal: Incorporating vision, speech, and structured data to reason across input types, not just text.
Multimind: Coordinating multiple specialist agents (retrievers, validators, planners, actuators) into decentralised teams, akin to human departments working toward a common goal.
Frameworks like LangGraph, DSPy, and DeepAgents are pioneering the idea of composable cognition, where plans, goals, tools, and context are stitched together dynamically, not hardcoded. This speaks strongly to my roots in the composable enterprise arena. These architectures allow for adaptable behaviour and continuous learning, rather than rigid flows. What we saw in the advent and maturation of the MACH ecosystem, we’re seeing again in the agentic world.
Symbolic + neural = structured reasoning at scale
The once-parallel paths of symbolic AI (rules, logic, ontologies) and neural AI (LLMs, embeddings) are now converging. Symbolic approaches offer rigour, traceability, and compliance. Neural models bring language fluency and generalisation.
Together, they enable:
Grounded reasoning that adheres to business logic and constraints
Compliance-aware automation using structured domain knowledge
Explainable outputs with contextual justification
Enterprises that can bridge these modalities, via semantic layers, knowledge graphs, and ontologies, will unlock AI systems that reason like analysts, act like experts, and learn like colleagues. Newer and more capable frontier models are making this easier and a more viable reality than ever before.
Strategic advantage is shifting
The new frontier is no longer access to models. It’s the ability to build agentic infrastructure that is:
Secure by design
Contextually grounded
Observable
Able to learn and reason in enterprise-specific domains
Enterprises that succeed here will not just improve efficiency. They will reshape their operating model. AI will cease to be a support function and become a core capability, embedded in strategy, delivery, and governance.
In short, the winners of the next decade will not be those who “adopt AI,” but those who re-architect their organisations to think with it.
From pilot to platform
The shift to agentic AI is not a technical upgrade. Similar to the pivot from monolith to MACH, it’s a strategic transformation. Whether you’re at the beginning or already piloting multi-agent systems, now is the moment to ask:
Key questions for leadership teams
Are we building tools or enabling intelligence?
Do we treat AI as a project, or as a platform for transformation?
Where in our organisation would persistent, reasoning agents deliver the highest leverage?
First steps by maturity level
Calculator/Assistant: Build a central knowledge infrastructure (vector DBs, document pipelines). Begin capturing usage data to identify repeatable patterns.
Analyst: Introduce tool use and reasoning chains. Establish observability for agent workflows and model quality.
Collaborator/Colleague: Formalise memory, governance, and agent lifecycles. Align incentives and metrics to support human–AI collaboration.
Why act now?
The window for strategic differentiation is shrinking. Foundation models are commoditising. What will remain defensible is how you orchestrate them: the workflows, the knowledge, the decisions.
You don’t need to build everything at once. But you do need to start with intent. You need to design for the organisation you want to become.