ReAct loops · Tool dispatch · Human approval gates

AI agents that survive crashes and resume where they left off.

Your agent analyzed 50 data points, called 12 tools, and was about to act — then the container restarted. With Orch8, it resumes from the last checkpoint. Every LLM response and tool output is memoized. Nothing re-executes.

The problem

AI agent frameworks handle reasoning. Not durability.

LangGraph, CrewAI, and custom agent loops are great at prompt chaining and tool dispatch. But when the process crashes, the agent loses everything and starts from scratch.

Agents restart from scratch after crashes

Your agent spent 10 minutes reasoning, calling APIs, and building context. The process crashed. All intermediate state is gone. It starts over — re-calling every API, re-running every LLM prompt, burning tokens and time.

Duplicate tool calls on retry

The agent called an external API, but the result wasn't persisted before the crash. On restart, it calls the API again — creating duplicate records, sending duplicate messages, or executing duplicate transactions.

No rate limiting across LLM providers

Multiple agents running in parallel exhaust your OpenAI or Anthropic rate limits. Requests fail, agents error out, and you're left building a custom rate limiter on top of your agent framework.

Human review is an afterthought

Your agent needs human approval before taking high-stakes actions. But your framework doesn't have a built-in way to pause, wait for a signal, and resume. You end up building a custom approval queue.

No visibility into token spend

Agents run autonomously for hours. You have no idea how many tokens they consumed, which models they called, or what the cost was — until the invoice arrives.

Long-running agents block resources

An agent that runs for hours or days holds a thread, a connection, and memory. If you need to deploy, you either kill it (losing state) or wait (blocking the deploy).

Use cases

Any agent. Any model. Any language.

Orch8 is a durable execution engine. These are the AI agent workloads teams most often reach for it to run.

Research agents

Multi-step research that survives overnight runs

Build agents that query multiple data sources, synthesize findings, and produce reports — running for hours or days. If the process restarts, the agent resumes from the last completed step with all prior research intact.

long-running agent · crash-safe research · memoized outputs

Customer support triage

Classify, route, draft, and escalate — durably

Agents that read tickets, classify intent with an LLM, draft responses, route to the right team, and escalate if confidence is low. Human approval gates before sending any response. Full audit trail of every decision.

support automation · human-in-the-loop · audit trail

Data extraction pipelines

Process thousands of documents without losing progress

Feed documents through an LLM extraction pipeline: parse, extract structured data, validate, and store. If the agent crashes after processing 500 of 1,000 documents, it resumes at document 501 — no re-processing. A minimal sequence for this pattern is sketched below.

document processing · batch extraction · checkpoint recovery
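
A minimal sketch of that pipeline as a sequence. Only the block types and the retry and loop fields mirror the examples later on this page; the handler names and the has_more flag are illustrative:

{
  "id": "extraction_pipeline",
  "blocks": [
    {
      "type": "step",
      "handler": "list_documents"
    },
    {
      "type": "loop",
      "condition": "{{outputs.process_document.has_more == true}}",
      "max_iterations": 1000,
      "blocks": [
        {
          "type": "step",
          "handler": "process_document",
          "retry": { "max_attempts": 3, "backoff": "2s" }
        }
      ]
    },
    {
      "type": "step",
      "handler": "store_results"
    }
  ]
}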

Code generation agents

Plan, generate, test, iterate — with human checkpoints

Agents that analyze requirements, generate code, run tests, and iterate based on results. Human approval gates before committing to production. Each generation step is memoized — no redundant LLM calls on retry.

code generation · iterative agent · approval gates

Sales outreach agents

Personalized outreach at scale with rate-limited sends

Agents that research prospects, generate personalized messages, and send them through rate-limited email channels. If the agent researches 100 prospects but crashes before sending, it resumes at the send step — research is preserved.

sales automation · rate-limited outreach · personalization

Trading and analysis agents

Crash-safe market analysis with human approval before execution

Agents that monitor market data, run analysis through LLMs, generate trading strategies, and wait for human approval before executing. All reasoning and intermediate analysis survives restarts. Signals let you pause or cancel mid-analysis.

trading agent · market analysis · human approval

How it works

Three steps to a crash-safe AI agent

1

Define the agent loop as a JSON sequence

Describe the agent's reasoning loop — LLM call, tool dispatch, condition check, human approval — as a JSON sequence. No SDK lock-in. The engine handles scheduling, retries, and crash recovery.

{
  "id": "research_agent",
  "blocks": [
    {
      "type": "step",
      "handler": "gather_sources",
      "retry": { "max_attempts": 3, "backoff": "2s" }
    },
    {
      "type": "loop",
      "condition": "{{outputs.analyze.needs_more_data == true}}",
      "max_iterations": 10,
      "blocks": [
        {
          "type": "step",
          "handler": "call_llm",
          "rate_limit_key": "llm:anthropic",
          "rate_limit": { "max": 50, "window_seconds": 60 }
        },
        {
          "type": "router",
          "routes": [
            {
              "condition": "{{outputs.call_llm.tool_calls}}",
              "blocks": [
                { "type": "step", "handler": "execute_tools" }
              ]
            },
            {
              "default": true,
              "blocks": [
                { "type": "step", "handler": "analyze" }
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "step",
      "handler": "human_review",
      "wait_for_signal": true
    },
    {
      "type": "step",
      "handler": "publish_report"
    }
  ]
}

2

Implement handlers as plain HTTP endpoints

Each handler is a POST endpoint in any language. Call any LLM, dispatch any tool, query any API. Orch8 memoizes the output — if the engine restarts, completed handlers return the cached result instead of re-executing.

// TypeScript — LLM call handler
app.post('/workers/call_llm', async (req, res) => {
  const { context, outputs } = req.body;

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    messages: context.data.messages,
    tools: context.data.tools,
  });

  // Output is memoized — if engine crashes after this,
  // the LLM response is preserved, never re-requested
  res.json({
    response: response.content,
    tool_calls: response.content.filter(
      b => b.type === 'tool_use'
    ),
    usage: {
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens,
    },
  });
});

3

Start the agent and control it mid-execution

Launch agent instances from any trigger. Use signals to pause, resume, cancel, or inject data mid-run. The human_review step waits until you send an approval signal — the agent sleeps with zero resource consumption.

# Start the agent
POST /sequences/research_agent/instances
{
  "context": {
    "data": {
      "topic": "competitive analysis of workflow engines",
      "messages": [{ "role": "user", "content": "..." }],
      "tools": [...]
    }
  }
}

# Agent reaches human_review and pauses.
# Review the output, then approve:
POST /instances/{id}/signals
{
  "type": "custom",
  "payload": { "approved": true, "feedback": "looks good" }
}

Why durability matters for AI agents

Every completed step is saved. Forever.

When a handler completes, its output is persisted before the next step begins. If the engine crashes and recovers, completed steps return the cached result — your call_llm handler runs exactly once. No duplicate LLM calls, no duplicate tool executions, no wasted tokens.

  • Output memoization — LLM responses cached after each step
  • Snapshot recovery — resume from last checkpoint instantly
  • Zero resource consumption while waiting — agents sleep between steps
  • Rate limiting per LLM provider — respect OpenAI, Anthropic, and custom quotas
  • Signals — pause, resume, cancel, or inject data into running agents
  • LLM usage tracking — tokens, cost, and model per step

Agent with human approval gate

{
  "id": "support_agent",
  "blocks": [
    {
      "type": "step",
      "handler": "classify_ticket",
      "rate_limit_key": "llm:openai",
      "rate_limit": { "max": 100, "window_seconds": 60 }
    },
    {
      "type": "step",
      "handler": "draft_response"
    },
    {
      "type": "router",
      "routes": [
        {
          "condition": "{{outputs.classify_ticket.confidence < 0.8}}",
          "blocks": [
            {
              "type": "step",
              "handler": "request_human_review",
              "wait_for_signal": true
            }
          ]
        }
      ]
    },
    {
      "type": "step",
      "handler": "send_response"
    }
  ]
}

What happens on crash

  • classify_ticket completed — LLM response memoized
  • Engine restarts — resumes at draft_response
  • No duplicate classification, no wasted tokens

Capabilities

Everything an AI agent needs to run in production

Agent frameworks handle prompt chaining. Orch8 handles everything else — the infrastructure that makes agents reliable, observable, and safe to run autonomously.

ReAct loop primitive

Define observe-think-act loops with Loop blocks. The agent calls an LLM, decides whether to use a tool or respond, and repeats — with configurable max iterations and exit conditions.

Tool dispatch with memoization

Each tool call is a step. Its output is persisted on completion. If the agent restarts, completed tool calls return cached results. No duplicate API calls, no duplicate side effects.
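
A sketch of a tool-dispatch handler in the same style as the call_llm handler above. The execute_tools route and the outputs.call_llm.tool_calls shape come from the first sequence example; toolRegistry is a hypothetical map from tool name to implementation:

// TypeScript — tool dispatch handler
// toolRegistry is a hypothetical lookup from tool name to function
app.post('/workers/execute_tools', async (req, res) => {
  const { outputs } = req.body;

  // Run every tool call the LLM requested in the prior step
  const results = [];
  for (const call of outputs.call_llm.tool_calls) {
    const run = toolRegistry[call.name];
    results.push({
      tool_use_id: call.id,
      content: await run(call.input),
    });
  }

  // Memoized on completion: a restart returns this cached
  // result instead of re-executing the tools
  res.json({ tool_results: results });
});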

Human-in-the-loop approval

Any step can pause and wait for a signal. Build approval gates before high-stakes actions: sending emails, executing trades, modifying production data. The agent sleeps with zero resource consumption.

Per-provider LLM rate limiting

Set rate limits per LLM provider, per model, or per tenant. Orch8 tracks usage with a sliding window and defers overages. No requests dropped — they queue and execute when capacity is available.

LLM usage tracking

Every LLM call reports input tokens, output tokens, model, and provider. Query usage per agent, per step, or per time window. Know exactly what your agents cost before the invoice.

Parallel tool execution

When an agent needs to call multiple tools simultaneously, use Parallel blocks. All branches execute concurrently. If one fails, configure whether to cancel siblings or let them complete.
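
No Parallel block appears in the examples on this page, so the following is a sketch by analogy with the Loop and Router blocks above. The branches field and the on_branch_failure value are assumptions, not confirmed syntax:

{
  "type": "parallel",
  "branches": [
    { "blocks": [{ "type": "step", "handler": "search_web" }] },
    { "blocks": [{ "type": "step", "handler": "query_internal_db" }] }
  ],
  "on_branch_failure": "cancel_siblings"
}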

Conditional branching

Route agent behavior based on LLM output, tool results, or context data. Router blocks support arbitrary conditions — confidence thresholds, classification labels, error codes, or custom logic.

Long-running agent support

Agents that run for hours or days don't hold threads or connections. Each step executes, persists its output, and the agent sleeps until the next step is due. Deploy without killing agents.

Multi-agent coordination

Use SubSequence blocks to spawn child agents. Parent agents wait for children to complete. Signals allow inter-agent communication. Build hierarchical agent architectures with shared crash recovery.
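
A sketch of spawning a child agent from a parent sequence. The sub_sequence type and its fields are assumptions by analogy with the step blocks above, not confirmed syntax:

{
  "type": "sub_sequence",
  "sequence_id": "research_agent",
  "context": {
    "data": { "topic": "{{context.data.subtopic}}" }
  }
}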

Full audit trail

Every LLM call, every tool execution, every routing decision is persisted in PostgreSQL. Replay agent reasoning. Debug failures. Meet compliance requirements for AI-driven decisions.

What you don't build

6 problems you never have to solve

Every team that runs AI agents in production eventually builds all of these. With Orch8, none of them are your problem.

Agent state persistence layer

No custom database schema for agent state. No serialization logic. Orch8 snapshots the full execution state — step outputs, context, and position — automatically.

LLM rate limiter

No Redis-backed token bucket per provider. Set a rate_limit_key and a limit on any LLM step. Orch8 tracks usage with a sliding window and defers overages.

Crash recovery mechanism

No heartbeat polling for stuck agents. No dead agent detector. Instances stalled mid-execution are auto-reset on engine restart and resume from the last checkpoint.

Human approval queue

No separate approval service. Any step can set wait_for_signal to pause until a human sends an approval via the REST API. The agent sleeps with zero resource consumption.

Token usage accounting

No custom logging pipeline for LLM costs. Report input/output tokens from your handler. Orch8 aggregates usage per agent, per step, per provider.

Duplicate execution prevention

No checking whether a tool was already called before the crash. Step outputs are memoized. Retries return the cached result. One execution, guaranteed.

How it fits

Orch8 complements your agent framework

Agent frameworks and durable execution engines solve different problems. Use both — or use Orch8 alone if your agent logic is straightforward.

Agent frameworks

LangGraph, CrewAI, AutoGen, custom loops

What they do well

  • Prompt chaining and template management
  • Tool definition and schema validation
  • Multi-agent conversation patterns
  • Memory and context window management
  • Agent-to-agent communication protocols

What they leave to you

  • Crash recovery and state persistence
  • Rate limiting across providers
  • Human-in-the-loop approval flows
  • Duplicate execution prevention
  • Long-running execution without resource hold

Orch8

Durable execution engine

What it handles

  • Crash recovery with snapshot persistence
  • Output memoization — no duplicate executions
  • Per-provider rate limiting with defer-not-drop
  • Human approval gates via signals
  • Long-running agents with zero idle resource cost
  • LLM usage tracking per step
  • Full audit trail in PostgreSQL
  • Parallel execution and conditional branching
  • Multi-agent coordination via SubSequences
  • Retry with exponential backoff per step

Workflow engines

Temporal, Inngest, Step Functions

Their design focus

  • Enterprise-grade orchestration at scale
  • Complex dependency graphs and saga patterns
  • Strong consistency guarantees
  • Battle-tested in large organizations
  • Broad ecosystem and integrations

Trade-offs for AI agents

  • Determinism constraints limit direct LLM calls
  • Event replay overhead for long-running agents
  • Operational complexity for small teams
  • No built-in LLM rate limiting or usage tracking
  • Steeper learning curve for agent patterns

Use Orch8 with your existing framework — wrap your LangGraph or CrewAI agent in an Orch8 handler for crash recovery, rate limiting, and human approval. Or use Orch8 alone — define agent logic directly as JSON sequences with Loop, Router, and Parallel blocks.
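
A sketch of that wrapping pattern, in the same handler style as step 2. runAgentTurn is a hypothetical stand-in for however your framework exposes one unit of agent work (for LangGraph, roughly one graph invocation):

// TypeScript — one framework step per Orch8 handler call
// runAgentTurn is a hypothetical wrapper around your framework's
// entry point (e.g. invoking a compiled LangGraph graph once)
app.post('/workers/agent_step', async (req, res) => {
  const { context } = req.body;

  const state = await runAgentTurn(context.data.agentState);

  // Memoized on return: a crash after this point never
  // re-runs the framework step or its LLM calls
  res.json({ agentState: state, done: state.finished });
});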

Your AI agent shouldn't restart from scratch.

Tell us what you're building. We'll reach out within 24 hours to walk you through a working agent example — research pipeline, support triage, data extraction, or custom.

No credit card required. Self-host free with no feature gates.