ReAct loops · Tool dispatch · Human approval gates
AI agents that survive crashes and resume where they left off.
Your agent analyzed 50 data points, called 12 tools, and was about to act — then the container restarted. With Orch8, it resumes from the last checkpoint. Every LLM response and tool output is memoized. Nothing re-executes.
The problem
AI agent frameworks handle reasoning. Not durability.
LangGraph, CrewAI, and custom agent loops are great at prompt chaining and tool dispatch. But when the process crashes, the agent loses everything and starts from scratch.
Agents restart from scratch after crashes
Your agent spent 10 minutes reasoning, calling APIs, and building context. The process crashed. All intermediate state is gone. It starts over — re-calling every API, re-running every LLM prompt, burning tokens and time.
Duplicate tool calls on retry
The agent called an external API, but the result wasn't persisted before the crash. On restart, it calls the API again — creating duplicate records, sending duplicate messages, or executing duplicate transactions.
No rate limiting across LLM providers
Multiple agents running in parallel exhaust your OpenAI or Anthropic rate limits. Requests fail, agents error out, and you're left building a custom rate limiter on top of your agent framework.
Human review is an afterthought
Your agent needs human approval before taking high-stakes actions. But your framework doesn't have a built-in way to pause, wait for a signal, and resume. You end up building a custom approval queue.
No visibility into token spend
Agents run autonomously for hours. You have no idea how many tokens they consumed, which models they called, or what the cost was — until the invoice arrives.
Long-running agents block resources
An agent that runs for hours or days holds a thread, a connection, and memory. If you need to deploy, you either kill it (losing state) or wait (blocking the deploy).
Use cases
Any agent. Any model. Any language.
Orch8 is a durable execution engine. These are the AI agent workloads teams use it for most.
Research agents
Multi-step research that survives overnight runs
Build agents that query multiple data sources, synthesize findings, and produce reports — running for hours or days. If the process restarts, the agent resumes from the last completed step with all prior research intact.
Customer support triage
Classify, route, draft, and escalate — durably
Agents that read tickets, classify intent with an LLM, draft responses, route to the right team, and escalate if confidence is low. Human approval gates before sending any response. Full audit trail of every decision.
Data extraction pipelines
Process thousands of documents without losing progress
Feed documents through an LLM extraction pipeline: parse, extract structured data, validate, and store. If the agent crashes after processing 500 of 1,000 documents, it resumes at document 501 — no re-processing.
Code generation agents
Plan, generate, test, iterate — with human checkpoints
Agents that analyze requirements, generate code, run tests, and iterate based on results. Human approval gates before committing to production. Each generation step is memoized — no redundant LLM calls on retry.
Sales outreach agents
Personalized outreach at scale with rate-limited sends
Agents that research prospects, generate personalized messages, and send them through rate-limited email channels. If the agent researches 100 prospects but crashes before sending, it resumes at the send step — research is preserved.
Trading and analysis agents
Crash-safe market analysis with human approval before execution
Agents that monitor market data, run analysis through LLMs, generate trading strategies, and wait for human approval before executing. All reasoning and intermediate analysis survives restarts. Signals let you pause or cancel mid-analysis.
How it works
Three steps to a crash-safe AI agent
Define the agent loop as a JSON sequence
Describe the agent's reasoning loop — LLM call, tool dispatch, condition check, human approval — as a JSON sequence. No SDK lock-in. The engine handles scheduling, retries, and crash recovery.
{
"id": "research_agent",
"blocks": [
{
"type": "step",
"handler": "gather_sources",
"retry": { "max_attempts": 3, "backoff": "2s" }
},
{
"type": "loop",
"condition": "{{outputs.analyze.needs_more_data == true}}",
"max_iterations": 10,
"blocks": [
{
"type": "step",
"handler": "call_llm",
"rate_limit_key": "llm:anthropic",
"rate_limit": { "max": 50, "window_seconds": 60 }
},
{
"type": "router",
"routes": [
{
"condition": "{{outputs.call_llm.tool_calls}}",
"blocks": [
{ "type": "step", "handler": "execute_tools" }
]
},
{
"default": true,
"blocks": [
{ "type": "step", "handler": "analyze" }
]
}
]
}
]
},
{
"type": "step",
"handler": "human_review",
"wait_for_signal": true
},
{
"type": "step",
"handler": "publish_report"
}
]
}
Implement handlers as plain HTTP endpoints
Each handler is a POST endpoint in any language. Call any LLM, dispatch any tool, query any API. Orch8 memoizes the output — if the engine restarts, completed handlers return the cached result instead of re-executing.
// TypeScript — LLM call handler
import express from 'express';
import Anthropic from '@anthropic-ai/sdk';

const app = express();
app.use(express.json());
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
app.post('/workers/call_llm', async (req, res) => {
const { context, outputs } = req.body;
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
messages: context.data.messages,
tools: context.data.tools,
});
// Output is memoized — if engine crashes after this,
// the LLM response is preserved, never re-requested
res.json({
response: response.content,
tool_calls: response.content.filter(
b => b.type === 'tool_use'
),
usage: {
input_tokens: response.usage.input_tokens,
output_tokens: response.usage.output_tokens,
},
});
});
Start the agent and control it mid-execution
Launch agent instances from any trigger. Use signals to pause, resume, cancel, or inject data mid-run. The human_review step waits until you send an approval signal — the agent sleeps with zero resource consumption.
# Start the agent
POST /sequences/research_agent/instances
{
"context": {
"data": {
"topic": "competitive analysis of workflow engines",
"messages": [{ "role": "user", "content": "..." }],
"tools": [...]
}
}
}
# Agent reaches human_review and pauses.
# Review the output, then approve:
POST /instances/{id}/signals
{
"type": "custom",
"payload": { "approved": true, "feedback": "looks good" }
}
Why durability matters for AI agents
Every completed step is saved. Forever.
When a handler completes, its output is persisted before the next step begins. If the engine crashes and recovers, completed steps return the cached result — your call_llm handler runs exactly once. No duplicate LLM calls, no duplicate tool executions, no wasted tokens.
- ✓ Output memoization — LLM responses cached after each step
- ✓ Snapshot recovery — resume from last checkpoint instantly
- ✓ Zero resource consumption while waiting — agents sleep between steps
- ✓ Rate limiting per LLM provider — respect OpenAI, Anthropic, and custom quotas
- ✓ Signals — pause, resume, cancel, or inject data into running agents
- ✓ LLM usage tracking — tokens, cost, and model per step
Agent with human approval gate
{
"id": "support_agent",
"blocks": [
{
"type": "step",
"handler": "classify_ticket",
"rate_limit_key": "llm:openai",
"rate_limit": { "max": 100, "window_seconds": 60 }
},
{
"type": "step",
"handler": "draft_response"
},
{
"type": "router",
"routes": [
{
"condition": "{{outputs.classify_ticket.confidence < 0.8}}",
"blocks": [
{
"type": "step",
"handler": "request_human_review",
"wait_for_signal": true
}
]
}
]
},
{
"type": "step",
"handler": "send_response"
}
]
}
What happens on crash
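The recovery semantics described above can be sketched in a few lines. This is an illustration of the memoization contract, not Orch8's internals: a step's output is persisted when it completes, so after a restart, completed steps return the cached output instead of re-executing. The in-memory `Map` here stands in for the engine's PostgreSQL-backed snapshot store.

```typescript
type StepOutput = unknown;

// Stand-in for the engine's persistent store of completed step outputs.
const completed = new Map<string, StepOutput>();

let llmCalls = 0;
async function callLlmHandler(): Promise<StepOutput> {
  llmCalls++; // the side effect we must not repeat after a crash
  return { response: 'analysis complete' };
}

async function runStep(
  instanceId: string,
  stepId: string,
  handler: () => Promise<StepOutput>
): Promise<StepOutput> {
  const key = `${instanceId}:${stepId}`;
  if (completed.has(key)) return completed.get(key); // memoized: no re-execution
  const output = await handler();
  completed.set(key, output); // persisted before the next step begins
  return output;
}

// First run executes the handler; after a "crash" and restart,
// the same step returns the cached output and the handler never re-runs.
async function demo(): Promise<number> {
  await runStep('inst-1', 'call_llm', callLlmHandler);
  // ...process restarts here; in the real engine the store survives in PostgreSQL...
  await runStep('inst-1', 'call_llm', callLlmHandler);
  return llmCalls; // → 1
}
```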
Capabilities
Everything an AI agent needs to run in production
Agent frameworks handle prompt chaining. Orch8 handles everything else — the infrastructure that makes agents reliable, observable, and safe to run autonomously.
ReAct loop primitive
Define observe-think-act loops with Loop blocks. The agent calls an LLM, decides whether to use a tool or respond, and repeats — with configurable max iterations and exit conditions.
Tool dispatch with memoization
Each tool call is a step. Its output is persisted on completion. If the agent restarts, completed tool calls return cached results. No duplicate API calls, no duplicate side effects.
Human-in-the-loop approval
Any step can pause and wait for a signal. Build approval gates before high-stakes actions: sending emails, executing trades, modifying production data. The agent sleeps with zero resource consumption.
Per-provider LLM rate limiting
Set rate limits per LLM provider, per model, or per tenant. Orch8 tracks usage with a sliding window and defers overages. No requests dropped — they queue and execute when capacity is available.
LLM usage tracking
Every LLM call reports input tokens, output tokens, model, and provider. Query usage per agent, per step, or per time window. Know exactly what your agents cost before the invoice.
Parallel tool execution
When an agent needs to call multiple tools simultaneously, use Parallel blocks. All branches execute concurrently. If one fails, configure whether to cancel siblings or let them complete.
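A Parallel block fanning out two tool calls might look like the sketch below. The field names `branches` and `on_branch_failure` are assumptions extrapolated from the documented `step`, `loop`, and `router` blocks, not confirmed Orch8 schema:

```json
{
  "type": "parallel",
  "on_branch_failure": "cancel_siblings",
  "branches": [
    { "blocks": [{ "type": "step", "handler": "search_web" }] },
    { "blocks": [{ "type": "step", "handler": "query_internal_db" }] }
  ]
}
```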
Conditional branching
Route agent behavior based on LLM output, tool results, or context data. Router blocks support arbitrary conditions — confidence thresholds, classification labels, error codes, or custom logic.
Long-running agent support
Agents that run for hours or days don't hold threads or connections. Each step executes, persists its output, and the agent sleeps until the next step is due. Deploy without killing agents.
Multi-agent coordination
Use SubSequence blocks to spawn child agents. Parent agents wait for children to complete. Signals allow inter-agent communication. Build hierarchical agent architectures with shared crash recovery.
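A SubSequence block spawning a child agent might be declared as follows. Only the block name comes from this section; `sequence_id` and the context-passing shape are assumptions modeled on the instance-start request shown earlier:

```json
{
  "type": "sub_sequence",
  "sequence_id": "summarize_source",
  "context": {
    "data": { "topic": "{{context.data.topic}}" }
  }
}
```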
Full audit trail
Every LLM call, every tool execution, every routing decision is persisted in PostgreSQL. Replay agent reasoning. Debug failures. Meet compliance requirements for AI-driven decisions.
What you don't build
6 problems you never have to solve
Every team that runs AI agents in production eventually builds all of these. With Orch8, none of them are your problem.
Agent state persistence layer
No custom database schema for agent state. No serialization logic. Orch8 snapshots the full execution state — step outputs, context, and position — automatically.
LLM rate limiter
No Redis-backed token bucket per provider. Set a rate_limit_key and a limit on any LLM step. Orch8 tracks usage with a sliding window and defers overages.
Crash recovery mechanism
No heartbeat polling for stuck agents. No dead agent detector. Instances stalled mid-execution are auto-reset on engine restart and resume from the last checkpoint.
Human approval queue
No separate approval service. Any step can set wait_for_signal to pause until a human sends an approval via the REST API. The agent sleeps with zero resource consumption.
Token usage accounting
No custom logging pipeline for LLM costs. Report input/output tokens from your handler. Orch8 aggregates usage per agent, per step, per provider.
Duplicate execution prevention
No checking whether a tool was already called before the crash. Step outputs are memoized. Retries return the cached result. One execution, guaranteed.
How it fits
Orch8 complements your agent framework
Agent frameworks and durable execution engines solve different problems. Use both — or use Orch8 alone if your agent logic is straightforward.
Agent frameworks
LangGraph, CrewAI, AutoGen, custom loops
What they do well
- ✓ Prompt chaining and template management
- ✓ Tool definition and schema validation
- ✓ Multi-agent conversation patterns
- ✓ Memory and context window management
- ✓ Agent-to-agent communication protocols
What they leave to you
- ✗ Crash recovery and state persistence
- ✗ Rate limiting across providers
- ✗ Human-in-the-loop approval flows
- ✗ Duplicate execution prevention
- ✗ Long-running execution without resource hold
Orch8
Durable execution engine
What it handles
- ✓ Crash recovery with snapshot persistence
- ✓ Output memoization — no duplicate executions
- ✓ Per-provider rate limiting with defer-not-drop
- ✓ Human approval gates via signals
- ✓ Long-running agents with zero idle resource cost
- ✓ LLM usage tracking per step
- ✓ Full audit trail in PostgreSQL
- ✓ Parallel execution and conditional branching
- ✓ Multi-agent coordination via SubSequences
- ✓ Retry with exponential backoff per step
Workflow engines
Temporal, Inngest, Step Functions
Their design focus
- ✓ Enterprise-grade orchestration at scale
- ✓ Complex dependency graphs and saga patterns
- ✓ Strong consistency guarantees
- ✓ Battle-tested in large organizations
- ✓ Broad ecosystem and integrations
Trade-offs for AI agents
- — Determinism constraints limit direct LLM calls
- — Event replay overhead for long-running agents
- — Operational complexity for small teams
- — No built-in LLM rate limiting or usage tracking
- — Steeper learning curve for agent patterns
Use Orch8 with your existing framework — wrap your LangGraph or CrewAI agent in an Orch8 handler for crash recovery, rate limiting, and human approval. Or use Orch8 alone — define agent logic directly as JSON sequences with Loop, Router, and Parallel blocks.
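Wrapping an existing framework agent can be as simple as exposing its invocation as a handler. In this sketch, `runMyAgent` is a hypothetical stand-in for your LangGraph or CrewAI entry point; the handler logic is written as a plain function so it can be wired into whatever HTTP server you already run, matching the `/workers/*` endpoint pattern shown earlier:

```typescript
type HandlerRequest = { context: { data: { topic: string } } };

// Hypothetical stand-in for invoking your LangGraph/CrewAI agent.
async function runMyAgent(input: { topic: string }): Promise<string> {
  return `report on ${input.topic}`;
}

// The entire framework run becomes one memoized Orch8 step: if the
// engine restarts after this step completed, it is never re-executed.
export async function runAgentHandler(req: HandlerRequest) {
  const report = await runMyAgent({ topic: req.context.data.topic });
  return { report };
}
```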
Your AI agent shouldn't restart from scratch.
Tell us what you're building. We'll reach out within 24 hours to walk you through a working agent example — research pipeline, support triage, data extraction, or custom.
No credit card required. Self-host free with no feature gates.