Part 1: Setting the Stage for Your Deep Research Agent
Deep research agents orchestrate multiple LLM calls—triaging queries, asking clarifying questions, planning searches, gathering information, and writing reports. But there's a catch: what happens when an agent fails halfway through?
Consider this scenario: Your research agent has already:
- Determined your query needs clarification ✓
- Generated three clarifying questions ✓
- Collected your answers to two of them ✓
Then your server crashes. Without durability, you're back to square one—losing the LLM calls you've already paid for and forcing your user to start over.
This challenge is especially acute in multi-agent architectures, where agents call other agents and create deep call stacks; a failure at any level can cascade and lose significant work.
In this tutorial, you'll transform a working (but non-durable) deep research agent into a production-ready application using Temporal and the OpenAI Agents SDK. By the end, your agent will:
- Survive failures at any step without losing progress
- Wait indefinitely for human input while maintaining state
- Automatically retry failed LLM calls with exponential backoff
- Resume seamlessly after crashes or restarts
We use the OpenAI Agents SDK in this tutorial because it provides a clean, minimal abstraction for building multi-agent systems—and because Temporal has a built-in integration that makes every agent call automatically durable.
Prerequisites
Before starting this tutorial, you should have:
- Beginner knowledge of Temporal including Workflows, Activities, and Workers
- An OpenAI API key
- Cloned the template repository
Getting Started: Clone the Template Repository
The template repository contains a fully functional deep research agent—but without any durability. Let's get it running first so you can see what we're working with.
- Clone the repository:
git clone https://github.com/temporalio/edu-deep-research-tutorial-template.git
cd edu-deep-research-tutorial-template
- Install dependencies:
uv sync
- Set up your OpenAI API key:
cp .env-sample .env
# Edit .env and add your OPENAI_API_KEY
Skip this step if you already have OPENAI_API_KEY exported in your shell profile (e.g., .zshrc or .bashrc).
- Run the application:
uv run run_server.py
- Open your browser and navigate to http://localhost:8234
Try entering a research query like "what is the best spaghetti recipe?" The agent will ask clarifying questions, then conduct research and generate a report.
Optional - Observe the Problem: While this works, try stopping the server (Ctrl+C) mid-research. When you restart, all of that context is gone: your agent has no memory of what you last asked, and you'll need to start from scratch. Let's fix that.
With Temporal, your agents can handle real-world production challenges:
- Rate-limited LLMs? Automatic retries with backoff until capacity returns
- Network issues? Retries until requests succeed
- Application crashes? Temporal resumes from the last checkpoint, saving you compute and token costs
- Found a bug mid-execution? Fix it and continue running Workflows
Understanding the Current Architecture
Before adding Temporal, let's understand the existing structure:
├── run_server.py            # Backend API for the chat interface
├── ui/                      # Browser-based chat interface
└── deep_research/
    ├── agents/                  # Individual AI agents (OpenAI Agents SDK)
    │   ├── triage_agent.py      # Decides if clarification is needed
    │   ├── clarifying_agent.py  # Generates follow-up questions
    │   ├── planner_agent.py     # Creates search strategy
    │   ├── search_agent.py      # Executes web searches
    │   └── writer_agent.py      # Writes final report
    ├── models.py                # Pydantic models for structured outputs
    └── research_manager.py      # Orchestrates agents + manages sessions (NOT durable)
How the Agent Pipeline Works
When a user submits a research query, it flows through this pipeline:
User Query
     ↓
┌─────────────────┐
│  Triage Agent   │ → Decides: Is this query specific enough?
└─────────────────┘
     ↓ No              ↓ Yes
┌─────────────────┐    │
│Clarifying Agent │    │
└─────────────────┘    │
     ↓                 │
User answers questions │
     ↓                 │
┌─────────────────┐    │
│  Planner Agent  │ ←──┘
└─────────────────┘
     ↓
┌─────────────────┐
│ Search Agent(s) │ → Runs multiple searches concurrently
└─────────────────┘
     ↓
┌─────────────────┐
│  Writer Agent   │ → Synthesizes results into a report
└─────────────────┘
     ↓
Final Report
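The fan-out at the Search Agent(s) step is plain asyncio concurrency: each planned query gets its own Runner.run() call. Here's a minimal sketch of that pattern; the search_agent instructions and the queries list are hypothetical stand-ins for the Planner Agent's real output:

import asyncio

from agents import Agent, Runner

search_agent = Agent(
    name="Searcher",
    instructions="Search the web for the given query and summarize the results.",
)

# Hypothetical queries; in the template these come from the Planner Agent
queries = ["best NC barbecue", "top Asheville restaurants"]

# Run one search per query concurrently, mirroring the Search Agent(s) step
results = await asyncio.gather(
    *(Runner.run(search_agent, q) for q in queries)
)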
The research_manager.py file orchestrates this pipeline and tracks session state in memory. If the server restarts, all that state is lost. We'll replace this with a Temporal Workflow that persists state durably and can wait indefinitely for human input.
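As a preview of where we're headed, here's a minimal sketch of that pattern. The names (ResearchWorkflow, submit_answers) are illustrative placeholders, not the template's actual code:

from temporalio import workflow

@workflow.defn
class ResearchWorkflow:  # illustrative name, not the template's actual class
    def __init__(self) -> None:
        self.answers: list[str] | None = None

    @workflow.run
    async def run(self, query: str) -> str:
        # Workflow state is persisted durably by Temporal.
        # wait_condition suspends the Workflow (for minutes or weeks)
        # until the condition becomes true, without consuming resources.
        await workflow.wait_condition(lambda: self.answers is not None)
        return f"Researching {query!r} using answers: {self.answers}"

    @workflow.update
    async def submit_answers(self, answers: list[str]) -> None:
        # The UI calls this Update; the new state survives crashes and restarts.
        self.answers = answers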
The OpenAI Agents SDK and Temporal
Before diving into the implementation, let's understand how the OpenAI Agents SDK works and how Temporal integrates with it.
The OpenAI Agents SDK provides primitives for building AI agents. An Agent combines an LLM with instructions and tools. A Runner executes those agents:
from agents import Agent, Runner

agent = Agent(
    name="Assistant",
    instructions="You help with research.",
    model="gpt-4o-mini",
)

result = await Runner.run(agent, "What is the best spaghetti recipe?")
You can chain agents together—use one agent's output as input to the next—to build complex multi-agent systems like the deep research agent in this tutorial:
# Agent 1: Plan what to search
planner = Agent(name="Planner", instructions="Create a search plan.")
plan = await Runner.run(planner, "Research best restaurants in North Carolina")

# Agent 2: Execute searches based on the plan
searcher = Agent(name="Searcher", instructions="Search the web.")
results = await Runner.run(searcher, plan.final_output)

# Agent 3: Write a report from search results
writer = Agent(name="Writer", instructions="Write a research report.")
report = await Runner.run(writer, results.final_output)
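Agents can also return structured outputs instead of free text, which is how the template's models.py Pydantic models are used. A minimal sketch, where WebSearchPlan is an illustrative model rather than the template's exact schema:

from pydantic import BaseModel

from agents import Agent, Runner

class WebSearchPlan(BaseModel):  # illustrative; see models.py for the template's schemas
    searches: list[str]

planner = Agent(
    name="Planner",
    instructions="Create a search plan as a list of web search queries.",
    output_type=WebSearchPlan,  # the agent's reply is parsed into this model
)

result = await Runner.run(planner, "Research best restaurants in North Carolina")
plan: WebSearchPlan = result.final_output  # typed, validated output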
Making Agents Durable with Temporal
The OpenAI Agents SDK has a built-in Temporal integration via the OpenAIAgentsPlugin.
Without the plugin, Runner.run() calls the LLM directly—if it fails or your app crashes, the work is lost. With the plugin, each Runner.run() call is recorded in Temporal's event history. This means:
- If an LLM call fails, Temporal automatically retries it (with backoff you configure)
- If your Worker crashes mid-research, Temporal knows which Runner.run() calls already completed and skips them on restart—you don't pay for the same LLM calls twice
- Your code stays clean—you write normal Runner.run() calls, no special wrappers needed
You code the happy path; Temporal handles the rest. Let's go ahead and try it out!
Setup
Add the Temporal SDK with the OpenAI Agents integration:
uv add 'temporalio[openai-agents]'
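The integration's OpenAIAgentsPlugin accepts ModelActivityParameters, which control the timeout and retry policy applied to each recorded model call. Here's a minimal sketch; the specific timeout and retry values are placeholder choices, not recommendations:

from datetime import timedelta

from temporalio.common import RetryPolicy
from temporalio.contrib.openai_agents import (
    ModelActivityParameters,
    OpenAIAgentsPlugin,
)

# Every model call made via Runner.run() becomes an Activity with these settings.
plugin = OpenAIAgentsPlugin(
    model_params=ModelActivityParameters(
        start_to_close_timeout=timedelta(seconds=60),  # per-call timeout
        retry_policy=RetryPolicy(
            initial_interval=timedelta(seconds=1),
            backoff_coefficient=2.0,  # exponential backoff between attempts
            maximum_attempts=10,
        ),
    )
)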
Now you'll create these components:
- InteractiveResearchManager - A class that orchestrates the multi-agent pipeline: triaging queries, generating clarifying questions, planning searches, executing them, and writing the final report. It calls Runner.run() for each agent. Remember, because it runs inside a Workflow with the OpenAI Agents plugin, every LLM call is automatically durable.
- InteractiveResearchWorkflow - The Temporal Workflow that manages the research session. It tracks state (original query, clarification questions, user answers), exposes Updates for the UI to start research and submit answers, and pauses indefinitely while waiting for human input—without consuming resources.
- Worker - The process that executes your Workflows and Activities. You'll configure it with OpenAIAgentsPlugin, which is what makes all those Runner.run() calls inside the Workflow automatically become durable Activities.
Here's how these components fit together:
Browser UI ──► Workflow ──► Manager ──► OpenAI API
                  │            │
                  │            └── calls Runner.run() for each agent to make LLM calls durable
                  │
                  └── tracks state (query, questions, answers)
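To make the diagram concrete, here's a minimal sketch of the wiring. The single-agent Workflow body and the research-tq task queue name are simplified placeholders; you'll build the real InteractiveResearchWorkflow in Part 2:

import asyncio

from agents import Agent, Runner
from temporalio import workflow
from temporalio.client import Client
from temporalio.contrib.openai_agents import OpenAIAgentsPlugin
from temporalio.worker import Worker

@workflow.defn
class InteractiveResearchWorkflow:
    @workflow.run
    async def run(self, query: str) -> str:
        # A plain Runner.run() call; the plugin records it as a durable Activity.
        agent = Agent(name="Assistant", instructions="You help with research.")
        result = await Runner.run(agent, query)
        return result.final_output

async def main() -> None:
    # Registering the plugin on the Client makes the Worker inherit it
    # (see the ModelActivityParameters sketch above for retry configuration).
    client = await Client.connect(
        "localhost:7233",
        plugins=[OpenAIAgentsPlugin()],
    )
    worker = Worker(
        client,
        task_queue="research-tq",  # placeholder task queue name
        workflows=[InteractiveResearchWorkflow],
    )
    await worker.run()

asyncio.run(main())

Notice that Runner.run() inside the Workflow looks identical to the standalone snippets earlier; the durability comes entirely from the plugin.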
Now that we've set the stage by exploring the template's architecture and how Temporal makes Runner.run() calls durable, let's build these components in Part 2: Creating the Workflow.