Token efficiency with durable execution
LLM calls are expensive. Every token costs money, and every re-execution burns tokens that were already paid for. Durable execution eliminates wasted tokens by ensuring that completed work is never lost.
The cost of crashes without durability
Consider an autonomous agent that runs a 20-step loop. Each iteration calls an LLM (planning) and a tool (execution). The agent is on iteration 18 when the process crashes.
Without durable execution:
The agent restarts from iteration 1. Iterations 1-17 must re-execute — 17 LLM calls that produce the exact same output as before. The tokens are burned again, the tool calls re-execute (potentially causing duplicate side effects), and the user waits for work that was already done.
With Conductor:
The agent resumes from iteration 18. Iterations 1-17 are already persisted — their LLM outputs, tool results, and state are all in durable storage. Zero tokens wasted. Zero duplicate tool calls. The agent picks up exactly where it left off.
Where tokens are saved
1. Crash recovery
Every LLM call in a Conductor workflow is persisted at completion. The prompt, response, token usage, and model are all recorded. If the server, worker, or network fails:
- Completed LLM calls are never re-executed. Their outputs are read from storage.
- Only the in-progress call is retried — and only that single call.
- The workflow resumes from the last persisted state.
Token savings: Proportional to how far the agent progressed before the crash. An agent that crashes at step 18 of 20 saves 17 LLM calls worth of tokens.
2. Retry from failed task
When a workflow fails (e.g., a tool call returns an error after the LLM planned successfully), you can retry from the failed task. Conductor reuses the outputs of all previously completed tasks.
Example: A 5-task agent workflow fails at task 4 (tool execution). Tasks 1-3 included two LLM calls that consumed 8,000 tokens total. Retry from task 4:
- Tasks 1-3 are not re-executed. Their outputs (including LLM responses) are reused from storage.
- Only task 4 (and anything after it) re-executes.
- 8,000 tokens saved per retry.
3. Rerun from a specific task
When you fix a bug in a task definition and rerun from that task, all tasks before it keep their persisted outputs. Upstream LLM calls are not re-executed.
4. Loop checkpointing
Agent loops (DO_WHILE) checkpoint every iteration. If the loop runs 50 iterations and the agent crashes at iteration 48:
- Iterations 1-47 are persisted with all their LLM calls and tool results.
- Only iteration 48 re-executes.
- 47 iterations of LLM tokens saved.
Without durability, the entire loop restarts from iteration 1.
Real-world cost impact
Here's a concrete example using typical LLM pricing:
| Scenario | Without durability | With Conductor | Savings |
|---|---|---|---|
| 20-step agent, crash at step 18 | Re-run all 20 steps: ~40K tokens | Resume from step 18: ~4K tokens | ~36K tokens ($0.04-$0.40) |
| RAG pipeline fails at PDF generation | Re-run embedding + LLM: ~12K tokens | Retry only PDF step: 0 LLM tokens | ~12K tokens ($0.01-$0.12) |
| 100-iteration loop, crash at 95 | Re-run all 100: ~200K tokens | Resume from 95: ~10K tokens | ~190K tokens ($0.19-$1.90) |
| Agent with human approval, reviewer slow | Process may timeout and restart | HUMAN task persists indefinitely | All upstream tokens preserved |
These are per-execution savings. Multiply by thousands of daily executions and the cost difference becomes significant.
At scale — thousands of agent executions per day — even a 5% crash/retry rate translates to substantial token waste without durability. With Conductor, that waste drops to near zero.
Token savings beyond crashes
Durable execution saves tokens in scenarios beyond crashes:
Long-running agents with human-in-the-loop. A HUMAN task can pause a workflow for hours or days. Without durability, the process might timeout or be killed, requiring a full restart (and re-running all upstream LLM calls). With Conductor, the pause is durable — the workflow resumes exactly where it stopped, with all LLM outputs preserved.
Deployment and scaling. When you deploy a new version of your workers or scale down instances, in-flight workflows survive. No LLM calls are lost. Without durability, scaling events can kill processes mid-execution, wasting all tokens consumed so far.
Debugging and iteration. When debugging a failed agent, you can inspect every LLM prompt and response without re-running the agent. Rerun from a specific task to test a fix without re-executing (and re-paying for) upstream LLM calls.
How it works mechanically
Conductor persists LLM task outputs the same way it persists any task output:
- The
LLM_CHAT_COMPLETEtask is scheduled and a worker (or the server itself) executes it. - The LLM response is received — prompt, completion, token usage, model, and latency are all recorded.
- The task moves to
COMPLETEDand its output is written to durable storage before the next task is scheduled. - If anything fails after this point, the LLM output is already persisted. It is never re-executed.
This is the same persistence model that applies to every task in Conductor — the durable execution semantics guarantee that completed work is never lost.
Comparison: durable vs non-durable frameworks
| Non-durable (LangChain, CrewAI, custom) | Durable (Conductor) | |
|---|---|---|
| Crash at step N of M | Restart from step 1. All N tokens re-consumed. | Resume from step N. Zero tokens wasted. |
| Retry after tool failure | Re-run entire chain including LLM calls. | Retry only the failed task. LLM outputs preserved. |
| Long pause (human review) | Process may die. Full restart required. | Durable pause. Resume with all state intact. |
| Debugging | Re-run the agent to reproduce. More tokens. | Inspect persisted outputs. Rerun from any task. |
| Deploy/scale | In-flight work may be lost. | Workflows survive scaling events. |
The bottom line: durable execution is a cost optimization, not just a reliability feature. Every crash, retry, pause, or debugging session that would re-execute LLM calls in a non-durable framework is free in Conductor — because the work was already persisted.
Next steps
- Durable Agents — What persists, what gets retried, and why JSON is AI-native.
- Durable Execution Semantics — The full persistence and recovery model.
- Build Your First AI Agent — Step-by-step tutorial with durable execution built in.
- LLM Orchestration — 14+ native LLM providers, vector databases, content generation.