Self-hosted deployment guide

Conductor is a self-hosted, open-source workflow engine that you deploy on your own infrastructure. This guide covers what you need to run Conductor in production at scale: architecture, backend configuration, horizontal scaling, workflow monitoring, and tuning.

Architecture overview

A Conductor deployment consists of these components:

[Figure: Conductor architecture diagram]

What each component does:

| Component | Role |
| --- | --- |
| API Server | Exposes REST and gRPC endpoints for workflow and task operations. |
| Decider | The core state machine. Evaluates workflow state and schedules the next set of tasks. |
| Sweeper | Background process that polls for running workflows and triggers the decider to evaluate them. Required for progress on long-running workflows. |
| System Task Workers | Execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ_TRANSFORM, etc.) within the server JVM. |
| Event Processor | Listens to configured event buses and triggers workflows or completes tasks based on incoming events. |
| Database | Persists workflow definitions, execution state, task state, and poll data. |
| Queue | Manages task scheduling: pending tasks, delayed tasks, and the sweeper's own work queue. |
| Index | Powers workflow and task search in the UI and via the search API. |
| Lock | Distributed lock that prevents concurrent decider evaluations of the same workflow. Required in production. |

Quick start with Docker Compose

For local development and evaluation:

git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up

This starts Conductor with Redis (database + queue), Elasticsearch (indexing), and the server with UI on port 8080.

| URL | Description |
| --- | --- |
| http://localhost:8080 | Conductor UI |
| http://localhost:8080/swagger-ui/index.html | REST API docs |
| http://localhost:8080/api/ | API base URL |

Pre-built compose files for other backend combinations:

| Compose file | Database | Queue | Index |
| --- | --- | --- | --- |
| docker-compose.yaml | Redis | Redis | Elasticsearch 7 |
| docker-compose-postgres.yaml | PostgreSQL | PostgreSQL | PostgreSQL |
| docker-compose-postgres-es7.yaml | PostgreSQL | PostgreSQL | Elasticsearch 7 |
| docker-compose-mysql.yaml | MySQL | Redis | Elasticsearch 7 |
| docker-compose-redis-os2.yaml | Redis | Redis | OpenSearch 2 |
| docker-compose-redis-os3.yaml | Redis | Redis | OpenSearch 3 |

# Example: PostgreSQL for everything
docker compose -f docker/docker-compose-postgres.yaml up

# Example: Redis + OpenSearch 3
docker compose -f docker/docker-compose-redis-os3.yaml up

Production configuration

All configuration is done through Spring Boot properties, set either in application.properties or as environment variables. When running in Docker, you can also mount a properties file into the container as a volume.
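
When properties are supplied as environment variables, Spring Boot's relaxed binding expects the name with dots replaced by underscores, dashes removed, and everything upper-cased. The sketch below illustrates that rule for a few properties from this guide; `to_env_var` is a hypothetical helper written for illustration, not part of Conductor or Spring.

```python
def to_env_var(property_name: str) -> str:
    """Convert a Spring Boot property name to its environment-variable form.

    Applies Spring Boot's relaxed-binding rule: replace "." with "_",
    remove "-", and convert to upper case.
    """
    return property_name.replace(".", "_").replace("-", "").upper()


if __name__ == "__main__":
    for name in (
        "conductor.db.type",
        "conductor.workflow-execution-lock.type",
        "spring.datasource.url",
    ):
        print(f"{name} -> {to_env_var(name)}")
```

For example, `conductor.db.type=postgres` would be passed to a container as `CONDUCTOR_DB_TYPE=postgres`.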

Database

The database stores workflow definitions, execution state, task state, and event handler definitions.

conductor.db.type=postgres

Supported database backends:

| Backend | Property value | When to use | Notes |
| --- | --- | --- | --- |
| PostgreSQL | postgres | Recommended for production. ACID, battle-tested, supports indexing too. | Requires spring.datasource.* config. |
| MySQL | mysql | Production alternative if your team already runs MySQL. | Requires spring.datasource.* config. Needs separate queue backend (Redis). |
| Redis | redis_standalone | Fast, simple. Good for moderate scale. | Requires conductor.redis.* config. |
| Cassandra | cassandra | High write throughput, multi-region. | Requires conductor.cassandra.* config. |
| SQLite | sqlite | Local development only. Single-file, zero config. | Default. Not for production. |

PostgreSQL

conductor.db.type=postgres
conductor.external-payload-storage.type=postgres

spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>

# Optional tuning
conductor.postgres.deadlockRetryMax=3
conductor.postgres.taskDefCacheRefreshInterval=60s
conductor.postgres.asyncMaxPoolSize=12
conductor.postgres.asyncWorkerQueueSize=100

MySQL

conductor.db.type=mysql

spring.datasource.url=jdbc:mysql://db-host:3306/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>

# Optional tuning
conductor.mysql.deadlockRetryMax=3
conductor.mysql.taskDefCacheRefreshInterval=60s

Redis

conductor.db.type=redis_standalone

# Format: host:port:rack (semicolon-separated for multiple hosts)
conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues
conductor.redis.taskDefCacheRefreshInterval=1s

# Connection pool
conductor.redis.maxIdleConnections=8
conductor.redis.minIdleConnections=5

# SSL
conductor.redis.ssl=false

# Auth (password is taken from the first host entry: host:port:rack:password)
# Or set conductor.redis.username and conductor.redis.password directly
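
To check a conductor.redis.hosts value before deploying it, the format described in the comments above (semicolon-separated host:port:rack entries, with an optional trailing password) can be parsed with a few lines of Python. This is a rough sketch of that format only; `RedisHost` and `parse_redis_hosts` are hypothetical names, and Conductor's own parser may differ in edge cases.

```python
from typing import List, NamedTuple, Optional


class RedisHost(NamedTuple):
    host: str
    port: int
    rack: str
    password: Optional[str] = None


def parse_redis_hosts(value: str) -> List[RedisHost]:
    """Parse 'host:port:rack[:password]' entries separated by semicolons."""
    hosts = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        # Split at most 3 times so a password may itself contain ':'.
        parts = entry.split(":", 3)
        if len(parts) < 3:
            raise ValueError(f"expected host:port:rack[:password], got {entry!r}")
        password = parts[3] if len(parts) == 4 else None
        hosts.append(RedisHost(parts[0], int(parts[1]), parts[2], password))
    return hosts
```

Calling `parse_redis_hosts("redis-host:6379:us-east-1c")` yields a single entry with no password; a malformed entry raises `ValueError` instead of failing later at server startup.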

Queue

The queue backend manages task scheduling: it tracks which tasks are pending, delayed, or ready for execution. The sweeper and the system task workers both depend on it.

conductor.queue.type=postgres

Supported queue backends:

| Backend | Property value | When to use |
| --- | --- | --- |
| PostgreSQL | postgres | Use when database is also PostgreSQL. Simplest stack. |
| Redis | redis_standalone | Use when database is Redis or MySQL. Fast, low-latency. |
| SQLite | sqlite | Local development only. |

Match your queue backend to your database

PostgreSQL database + PostgreSQL queue is the simplest production stack — one fewer dependency. If you use MySQL for the database, pair it with Redis for the queue.
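
The pairing advice above can be turned into a small pre-deployment check. The sketch below encodes only the combinations this guide recommends; `RECOMMENDED_QUEUE` and `check_queue_pairing` are hypothetical names, and a mismatch is reported as advice rather than an error because other combinations may still work.

```python
from typing import Optional

# Database type -> queue type recommended in this guide.
RECOMMENDED_QUEUE = {
    "postgres": "postgres",
    "mysql": "redis_standalone",
    "redis_standalone": "redis_standalone",
    "sqlite": "sqlite",
}


def check_queue_pairing(db_type: str, queue_type: str) -> Optional[str]:
    """Return an advisory message, or None if the pairing matches the guide."""
    recommended = RECOMMENDED_QUEUE.get(db_type)
    if recommended is None:
        return f"this guide gives no queue recommendation for db type {db_type!r}"
    if queue_type != recommended:
        return (
            f"db type {db_type!r} is usually paired with queue type "
            f"{recommended!r}, got {queue_type!r}"
        )
    return None
```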


Indexing

The indexing backend powers workflow and task search in the UI and via the /api/workflow/search and /api/tasks/search endpoints.

conductor.indexing.enabled=true
conductor.indexing.type=postgres

Supported indexing backends:

| Backend | Property value | When to use | Notes |
| --- | --- | --- | --- |
| PostgreSQL | postgres | Simplest stack when database is also PostgreSQL. | Set conductor.elasticsearch.version=0 to disable ES client. |
| Elasticsearch 7 | elasticsearch | Best search performance at scale. Full-text search. | Set conductor.elasticsearch.version=7. |
| OpenSearch 2 | opensearch2 | Open-source ES alternative. Compatible with ES 7 queries. | |
| OpenSearch 3 | opensearch3 | Latest OpenSearch. | |
| SQLite | sqlite | Local development only. | |
| Disabled | N/A | | Set conductor.indexing.enabled=false. UI search won't work. |

PostgreSQL indexing

conductor.indexing.enabled=true
conductor.indexing.type=postgres
# Disable Elasticsearch client
conductor.elasticsearch.version=0

Elasticsearch 7

conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow

# Performance tuning
conductor.elasticsearch.indexBatchSize=1
conductor.elasticsearch.asyncMaxPoolSize=12
conductor.elasticsearch.asyncWorkerQueueSize=100
conductor.elasticsearch.asyncBufferFlushTimeout=10s
conductor.elasticsearch.indexShardCount=5
conductor.elasticsearch.indexReplicasCount=1

# Auth (if using security)
conductor.elasticsearch.username=elastic
conductor.elasticsearch.password=<password>

OpenSearch

conductor.indexing.enabled=true
# Use opensearch2 or opensearch3 (comments must be on their own line in .properties files)
conductor.indexing.type=opensearch2
conductor.opensearch.url=http://os-host:9200
conductor.opensearch.indexPrefix=conductor
conductor.opensearch.clusterHealthColor=yellow
conductor.opensearch.indexReplicasCount=0

Async indexing

For high-throughput deployments, enable async indexing to decouple the indexing path from the workflow execution path:

conductor.app.asyncIndexingEnabled=true
conductor.app.asyncUpdateShortRunningWorkflowDuration=30s
conductor.app.asyncUpdateDelay=60s

Indexing toggles

Control what gets indexed:

conductor.app.taskIndexingEnabled=true
conductor.app.taskExecLogIndexingEnabled=true
conductor.app.eventMessageIndexingEnabled=true
conductor.app.eventExecutionIndexingEnabled=true

Locking

Required for production

Distributed locking prevents race conditions when multiple server instances evaluate the same workflow concurrently. Always enable locking in production with a distributed lock provider (Redis or Zookeeper).

conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true

Supported lock providers:

| Provider | Property value | When to use |
| --- | --- | --- |
| Redis | redis | Recommended. Use when Redis is already in the stack. |
| Zookeeper | zookeeper | Use when Zookeeper is available (e.g. Kafka deployments). |
| Local | local_only | Single-instance development only. Not safe for multi-instance. |

Redis lock

conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
# Maximum time a lock is held, in ms (60s)
conductor.app.lockLeaseTime=60000
# Maximum time to wait when acquiring the lock, in ms
conductor.app.lockTimeToTry=500

# Server type: SINGLE, CLUSTER, or SENTINEL
conductor.redis-lock.serverType=SINGLE
conductor.redis-lock.serverAddress=redis://redis-host:6379
# conductor.redis-lock.serverPassword=<password>
# Master name (Sentinel only)
# conductor.redis-lock.serverMasterName=master
# Key prefix
# conductor.redis-lock.namespace=conductor
conductor.redis-lock.ignoreLockingExceptions=false

Zookeeper lock

conductor.workflow-execution-lock.type=zookeeper
conductor.app.workflowExecutionLockEnabled=true
conductor.app.lockLeaseTime=60000
conductor.app.lockTimeToTry=500

conductor.zookeeper-lock.connectionString=zk1:2181,zk2:2181,zk3:2181
# conductor.zookeeper-lock.sessionTimeoutMs=60000
# conductor.zookeeper-lock.connectionTimeoutMs=15000
# conductor.zookeeper-lock.namespace=conductor

Sweeper

The sweeper is a background process that monitors running workflows. It polls the queue for workflows that need evaluation and triggers the decider. Without the sweeper, long-running workflows will not make progress.

The sweeper runs automatically as part of the Conductor server. Tune the thread count based on your workflow volume:

# Number of sweeper threads (default: availableProcessors * 2)
conductor.app.sweeperThreadCount=8

# How long to wait when polling the sweep queue (default: 2000ms)
conductor.app.sweeperWorkflowPollTimeout=2000

# Batch size per sweep poll (default: 2)
conductor.app.sweeper.sweepBatchSize=2

# Queue pop timeout in ms (default: 100)
conductor.app.sweeper.queuePopTimeout=100

Sweeper sizing

Start with sweeperThreadCount = 2 * CPU cores, which matches the default. If workflows stay in the RUNNING state without progressing, increase it. If CPU usage is high while the server is idle, decrease it.
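
As a quick way to derive a starting value from the sizing rule above, the sketch below prints a property line for the machine it runs on. `suggested_sweeper_threads` is a hypothetical helper; note that `os.cpu_count()` reports the host's logical CPUs and may not reflect container CPU limits, so pass the core count explicitly when sizing for a container.

```python
import os
from typing import Optional


def suggested_sweeper_threads(cpu_cores: Optional[int] = None) -> int:
    """Return 2 * CPU cores, the starting point suggested in this guide."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    if cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cores


if __name__ == "__main__":
    print(f"conductor.app.sweeperThreadCount={suggested_sweeper_threads()}")
```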


System task workers

System task workers execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ_TRANSFORM, etc.) inside the Conductor server JVM. They poll internal queues for scheduled system tasks and execute them.

# Number of system task worker threads (default: availableProcessors * 2)
conductor.app.systemTaskWorkerThreadCount=20

# Max number of tasks to poll at once (default: same as thread count)
conductor.app.systemTaskMaxPollCount=20

# Poll interval (default: 50ms)
conductor.app.systemTaskWorkerPollInterval=50ms

# Callback duration — how often to re-check async system tasks (default: 30s)
conductor.app.systemTaskWorkerCallbackDuration=30s

# Queue pop timeout (default: 100ms)
conductor.app.systemTaskQueuePopTimeout=100ms

Running system task workers separately

In large deployments, you may want to run system task workers on dedicated instances, separate from the API server. Use the execution namespace to isolate which instance handles system tasks:

# On API-only instances — set a namespace that no system task worker listens on
conductor.app.systemTaskWorkerExecutionNamespace=api-only
conductor.app.systemTaskWorkerThreadCount=0

# On dedicated system task worker instances — match the namespace
conductor.app.systemTaskWorkerExecutionNamespace=worker-pool-1
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40

Isolated system task workers

For task domain isolation (routing specific tasks to specific worker groups):

# Threads per isolation group (default: 1)
conductor.app.isolatedSystemTaskWorkerThreadCount=4

Postpone threshold

When a system task has been polled many times without completing (e.g. a Join waiting for branches), Conductor progressively delays re-evaluation to avoid busy-polling:

# After this many polls, begin exponential backoff (default: 200)
conductor.app.systemTaskPostponeThreshold=200

Event processing

The event processor listens to configured event buses and triggers workflows or completes tasks based on incoming events.

# Thread count for event processing (default: 2)
conductor.app.eventProcessorThreadCount=4

# Event queue polling
# Scheduler poll threads (default: number of CPU cores)
conductor.app.eventQueueSchedulerPollThreadCount=4
conductor.app.eventQueuePollInterval=100ms
conductor.app.eventQueuePollCount=10
conductor.app.eventQueueLongPollTimeout=1000ms

See the Event-driven recipes for configuring Kafka, NATS, AMQP, and SQS event queues.


Payload size limits

Conductor enforces payload size limits to prevent oversized data from degrading performance. When a payload exceeds the threshold, it is automatically stored in external payload storage (S3, PostgreSQL, or Azure Blob).

# Workflow input/output — threshold to move to external storage (default: 5120 KB)
conductor.app.workflowInputPayloadSizeThreshold=5120KB
conductor.app.workflowOutputPayloadSizeThreshold=5120KB

# Workflow input/output — hard limit, fails the workflow (default: 10240 KB)
conductor.app.maxWorkflowInputPayloadSizeThreshold=10240KB
conductor.app.maxWorkflowOutputPayloadSizeThreshold=10240KB

# Task input/output — threshold to move to external storage (default: 3072 KB)
conductor.app.taskInputPayloadSizeThreshold=3072KB
conductor.app.taskOutputPayloadSizeThreshold=3072KB

# Task input/output — hard limit, fails the task (default: 10240 KB)
conductor.app.maxTaskInputPayloadSizeThreshold=10240KB
conductor.app.maxTaskOutputPayloadSizeThreshold=10240KB

# Workflow variables — hard limit (default: 256 KB)
conductor.app.maxWorkflowVariablesPayloadSizeThreshold=256KB

For external payload storage configuration, see External Payload Storage.
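
To see how the two thresholds interact, the sketch below classifies a payload by size: below the soft threshold it stays inline, between the soft threshold and the hard limit it goes to external storage, and above the hard limit it is rejected. This is only an illustration of the rules described above. The function names are hypothetical, "KB" is assumed to mean 1024 bytes, the behavior at the exact boundary is an assumption, and Conductor measures its own serialized form rather than Python's JSON encoding.

```python
import json

KB = 1024  # assumption: "KB" in the properties means 1024 bytes


def payload_size_bytes(payload) -> int:
    """Size of a JSON-serializable payload once encoded as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def classify_payload(size_bytes: int, threshold_kb: int, max_kb: int) -> str:
    """Classify a payload as 'inline', 'external', or 'rejected'."""
    if size_bytes > max_kb * KB:
        return "rejected"  # exceeds the hard limit
    if size_bytes > threshold_kb * KB:
        return "external"  # moved to external payload storage
    return "inline"  # stored with the workflow or task


if __name__ == "__main__":
    task_input = {"orderId": "12345", "items": ["a", "b", "c"]}
    size = payload_size_bytes(task_input)
    # Task input defaults from this guide: 3072 KB threshold, 10240 KB hard limit.
    print(size, classify_payload(size, threshold_kb=3072, max_kb=10240))
```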


Workflow monitoring and observability

Conductor exposes Prometheus-compatible metrics for workflow monitoring and observability. Enable the metrics endpoint with these properties:

conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus
management.metrics.web.server.request.autotime.percentiles=0.50,0.75,0.90,0.95,0.99
management.endpoint.health.show-details=always

Scrape http://<conductor-host>:8080/actuator/prometheus with Prometheus.

For details on available metrics, see Server Metrics and Client Metrics.
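
For a quick look at the endpoint without a Prometheus server, the text it returns can be read with a short script. The sketch below is a minimal parser for the Prometheus text format, assuming samples carry no trailing timestamp; `parse_metrics` and `fetch_metrics` are hypothetical helper names, and the metric names you see depend on your Conductor version and configuration.

```python
from typing import Dict
from urllib.request import urlopen


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (metric name plus labels) to its sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines that are not simple samples
    return samples


def fetch_metrics(url: str, timeout: float = 5.0) -> Dict[str, float]:
    """Download and parse metrics from an /actuator/prometheus endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

With a server running locally, `fetch_metrics("http://localhost:8080/actuator/prometheus")` returns a dictionary that you can filter by name prefix.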


PostgreSQL stack (simplest)

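One database for workflow state, queues, indexing, and external payload storage, so there are few moving parts. Only the lock provider (Redis) runs separately.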

# Database
conductor.db.type=postgres
conductor.queue.type=postgres
conductor.external-payload-storage.type=postgres
spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=<password>

# Indexing (use PostgreSQL, no Elasticsearch needed)
conductor.indexing.enabled=true
conductor.indexing.type=postgres
conductor.elasticsearch.version=0

# Locking (use Redis — lightweight, fast)
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379

# Sweeper
conductor.app.sweeperThreadCount=8

# System task workers
conductor.app.systemTaskWorkerThreadCount=20
conductor.app.systemTaskMaxPollCount=20

# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus

Redis + Elasticsearch stack (high throughput)

Best search performance and lowest latency for queue operations.

# Database + Queue
conductor.db.type=redis_standalone
conductor.queue.type=redis_standalone
conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues

# Indexing
conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow
conductor.app.asyncIndexingEnabled=true

# Locking
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379

# Sweeper
conductor.app.sweeperThreadCount=16

# System task workers
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40

# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus

Running with Docker

Using Docker Compose

git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up

To use a different backend, swap the compose file:

docker compose -f docker/docker-compose-postgres.yaml up

Using the standalone image

docker run -p 8080:8080 conductoross/conductor:latest

Custom configuration via volume mount

Mount your own properties file to override the defaults without rebuilding the image:

docker run -p 8080:8080 \
  -v /path/to/my-config.properties:/app/config/config.properties \
  conductoross/conductor:latest

Accessing Conductor

| URL | Description |
| --- | --- |
| http://localhost:8080 | Conductor UI |
| http://localhost:8080/swagger-ui/index.html | REST API docs |

Shutting down

# Ctrl+C to stop, then:
docker compose down

Multi-instance deployment and horizontal scaling

For high availability and horizontal scaling, run multiple Conductor server instances behind a load balancer. All instances share the same database, queue, index, and lock backends. This architecture enables workflow engine scalability to millions of concurrent executions.

Requirements:

  • Distributed locking must be enabled (redis or zookeeper). Without it, concurrent decider evaluations on the same workflow will cause race conditions.
  • All instances must point to the same database, queue, and indexing backends.
  • The load balancer should use round-robin or least-connections routing.

Optional: separate API and worker instances:

┌──────────────────┐     ┌──────────────────┐
│  API Instance 1  │     │  API Instance 2  │   ← handle REST/gRPC, low system task threads
│  (systemTask=0)  │     │  (systemTask=0)  │
└────────┬─────────┘     └────────┬─────────┘
         │                        │
    ┌────┴────────────────────────┴────┐
    │         Load Balancer            │
    └────┬────────────────────────┬────┘
         │                        │
┌────────┴──────────┐     ┌───────┴───────────┐
│  Worker Instance  │     │  Worker Instance  │  ← high system task threads, sweeper
│  (systemTask=40)  │     │  (systemTask=40)  │
└───────────────────┘     └───────────────────┘

Troubleshooting

| Issue | Fix |
| --- | --- |
| Out of memory or slow performance | Check JVM heap usage and adjust -Xms / -Xmx as necessary. Monitor heap with jstat or the JVM memory metrics on /actuator/prometheus; /actuator/health reports component status rather than heap usage. |
| Elasticsearch stuck in yellow health | Set conductor.elasticsearch.clusterHealthColor=yellow or add more ES nodes for green. |
| Workflows stuck in RUNNING | Check that the sweeper is running and sweeperThreadCount > 0. Check that the lock provider is reachable. |
| System tasks not executing | Verify systemTaskWorkerThreadCount > 0 and the queue backend is reachable. |
| Config changes not taking effect | Default properties are baked into the Docker image at build time. Mount your properties file as a volume instead of rebuilding the image. |