Your AI Agent Works in the Demo. Can You Operate It?

June 13, 2026 · 4 min read

Why the autonomy that makes an agent useful is exactly what makes it impossible to ship - and how a BPMN engine fixes that without taking the agent away.

If you've built anything agentic in the last year, you know the shape of it: an agent plans steps, calls tools, reflects, and loops until it hits a goal. It works in the demo. Then you try to ship it, and you hit the same wall everyone does:

The agent's autonomy is exactly what makes it ungovernable.

What tools is it allowed to call? When does a human approve before it does something irreversible? What does the trace look like six weeks later when compliance asks what happened on instance #48213? And what happens to the half-finished run when the worker crashes while the agent is waiting three days for someone to click "approve"?

You can answer every one of those questions in code. You'll just be rebuilding an audit trail, a durable state store, a human-task inbox, and a retry engine - badly, and forever. There's a better split.

Put the agent where the work is fuzzy

An LLM is good at the fuzzy parts: interpreting an ambiguous request, picking the right tool, summarising a messy document, drafting a reply. It is structurally bad at the parts that must be reliable and identical every time: "this approval must happen before that payment," "retry exactly three times then escalate," "wait fourteen days, then time out."
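To make "reliable and identical every time" concrete, here is a minimal sketch of the retry rule written as plain deterministic code. The names `run_with_retries` and `escalate` are hypothetical, and a real engine would express this as retry configuration on the task rather than a Python loop.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    escalate: Callable[[Exception], T],
    max_attempts: int = 3,
) -> T:
    """Run `task` at most `max_attempts` times, then hand the last error to `escalate`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()  # success ends the loop immediately
        except Exception as error:  # any failure counts as one used attempt
            last_error = error
    # Every attempt failed: the policy, not the task, decides what happens next.
    assert last_error is not None
    return escalate(last_error)
```

The point of the sketch is that the attempt count and the escalation branch sit outside the task, so they behave the same no matter what the task (or an LLM inside it) does.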

A workflow engine is the mirror image. So the useful framing isn't "agent or process engine." It's:

Put the agent where the work is fuzzy, and put deterministic orchestration around it where the work must be reliable.

That deterministic box is a BPMN process. The agent becomes a task inside it - not the whole program. A single agent step is a service task. A full reasoning loop is an ad-hoc sub-process: a region of the diagram where the agent decides at runtime which tool to run next, but the tool catalogue lives in the process model - visible, versioned, access-controlled - instead of being implied by a list buried in source code.
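A rough sketch of that idea in plain Python, not the API of any real engine: the catalogue is declared outside the agent, the agent only chooses a name, and anything outside the catalogue is refused. `AdHocRegion`, `CATALOGUE`, the two tools, and `scripted_agent` (a stand-in for the LLM) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Hypothetical tools; a real process would call external services here.
def lookup_order(ctx: dict) -> dict:
    return {"order_status": "shipped"}


def draft_reply(ctx: dict) -> dict:
    return {"reply": f"Your order is {ctx['order_status']}."}


# The catalogue is declared once, outside the agent, so it can be reviewed and versioned.
CATALOGUE: Dict[str, Callable[[dict], dict]] = {
    "lookup_order": lookup_order,
    "draft_reply": draft_reply,
}


@dataclass
class AdHocRegion:
    """Stand-in for an ad-hoc sub-process: fixed catalogue, order chosen at runtime."""
    catalogue: Dict[str, Callable[[dict], dict]]
    max_steps: int = 5
    executed: List[str] = field(default_factory=list)

    def run(
        self,
        choose_next: Callable[[dict, List[str]], Optional[str]],
        context: dict,
    ) -> dict:
        for _ in range(self.max_steps):
            choice = choose_next(context, self.executed)
            if choice is None:  # the agent signals that it is done
                return context
            if choice not in self.catalogue:
                raise PermissionError(f"tool {choice!r} is not in the catalogue")
            context = {**context, **self.catalogue[choice](context)}
            self.executed.append(choice)  # every tool call leaves a trace
        raise RuntimeError("agent exceeded its step budget")


def scripted_agent(context: dict, executed: List[str]) -> Optional[str]:
    """Stand-in for an LLM: picks the next tool based on what is still missing."""
    if "order_status" not in context:
        return "lookup_order"
    if "reply" not in context:
        return "draft_reply"
    return None
```

Running `AdHocRegion(CATALOGUE).run(scripted_agent, {"ticket": "Where is my order?"})` executes both tools in the order the agent picked, while a request for a tool such as `issue_refund` fails before anything runs.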

What you get for free once the agent is a task

Human-in-the-loop is a user task. Draft, hold the token, let a person approve or edit, and branch on the result - with a boundary timer for the SLA. No task-list table, no claim/release protocol to build.
Durable waiting is the default. An agent waiting three days for approval is the canonical durable-execution problem. Running on Temporal, an instance that waits two weeks costs nothing unusual - it simply resumes from its recorded history when the event arrives.
Guardrails are DMN, not prompt-wrangling. "Is this refund within the agent's authority?" lives in a versioned decision table a domain expert can edit - not inside an LLM's reasoning where it varies run to run. The agent proposes; DMN decides what's allowed.
Every step is recorded - and you can watch it back. Inputs, outputs, tool calls, the rules that fired, the human's decision - all in the instance history. When compliance asks, you scrub a replay slider that lights up the executed path on the diagram instead of grepping a log.
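The "agent proposes, DMN decides" split can be sketched as a tiny first-match rule table. This is plain Python standing in for a DMN decision table, not real DMN; the thresholds, outcome names, and the `REFUND_AUTHORITY` table are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    description: str
    matches: Callable[[dict], bool]
    outcome: str


# Hypothetical authority limits; in practice a domain expert would own these values.
REFUND_AUTHORITY: List[Rule] = [
    Rule("small refund", lambda p: p["amount"] <= 50, "auto_approve"),
    Rule("medium refund", lambda p: p["amount"] <= 500, "human_approval"),
    Rule("anything larger", lambda p: True, "reject"),
]


def decide(table: List[Rule], proposal: dict) -> str:
    """Return the outcome of the first matching rule (similar to a 'First' hit policy)."""
    for rule in table:
        if rule.matches(proposal):
            return rule.outcome
    raise LookupError("no rule matched the proposal")


# The agent only proposes; the table decides which path the process takes.
proposal = {"action": "refund", "amount": 120}
next_step = decide(REFUND_AUTHORITY, proposal)
```

With these made-up limits, a proposed refund of 120 is routed to the human-approval path regardless of how the agent argued for it, which is the property the guardrail is meant to provide.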

The engine doesn't know or care that there's an LLM behind the task. That ignorance is exactly what keeps the orchestration deterministic while the work inside the box is not.

The full post has the worked example

In the complete write-up we walk through a support-automation process end to end - ticket classification, a DMN routing guardrail, an ad-hoc sub-process where the agent drives its own tools, a human-approval path with an escalation timer - and you can download the BPMN and DMN files and run it yourself.

It also gets concrete about the mechanics most engines hand-wave: how the agent finds the instances that need it, how long LLM calls survive lock leases via heartbeats, and how "the model returned garbage" becomes a first-class, modelled recovery path instead of an unhandled exception.
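As one possible reading of the last point, and not necessarily how the full post does it, a worker can validate the model's output and report a named error code instead of throwing. In BPMN such a code could be caught by an error boundary event; `TaskResult`, `MODEL_OUTPUT_INVALID`, and the category list below are assumptions made for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_CATEGORIES = {"billing", "shipping", "technical"}  # hypothetical ticket categories


@dataclass(frozen=True)
class TaskResult:
    variables: dict
    error_code: Optional[str] = None  # set when the process should take the recovery path


def complete_classification(raw_model_output: str) -> TaskResult:
    """Turn raw LLM text into process variables, or into a named, routable error."""
    invalid = TaskResult({}, error_code="MODEL_OUTPUT_INVALID")
    try:
        parsed = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return invalid  # not JSON at all
    if not isinstance(parsed, dict):
        return invalid  # valid JSON, but not the expected object
    category = parsed.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return invalid  # right shape, but a value the process cannot route
    return TaskResult({"category": category})
```

A reply such as `Sure! Here is the answer` then completes the task with `MODEL_OUTPUT_INVALID`, and the diagram, rather than a stack trace, shows what happens next.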

👉 Read the full post: Orchestrating AI Agents with BPMN: Durable, Auditable Agentic Workflows

Putting agents into production and wrestling with the governance, audit, or human-in-the-loop side of it? We'd like to hear how you're drawing the line - support@quantumbpm.com.
