Skip to content

Transactional Outbox

Process managers in Nagare emit two kinds of side effects from a single command: events to persist on their own journal, and commands to dispatch to other aggregates. The outbox makes those two arrive together — either both committed to the database, or neither — and decouples delivery from the producing transaction.

Why

Without an outbox, a process manager that "persists events then awaits dispatch" is exposed to a silent message-loss window:

  1. Events are committed to the journal.
  2. The application crashes (or the grain deactivates, or the network fails) before the dispatch foreach completes.
  3. The events are durable. The dispatches are gone.

The receiver never hears about them and the producer can't tell what was lost. This is the dual-write problem; the standard fix is to record the intent to dispatch in the same database transaction as the events, then deliver from that record asynchronously.

How it works

                        ┌──────────────────────────────┐
       Persist tx ──────▶  events table                │
       (Serializable)     nagare_outbox (rows pending) │
                        └──────────────────────────────┘
                                    │ commit

                        ┌──────────────────────────────┐
       OutboxRunner ────▶  read pending → dispatch     │
       (hosted)            mark dispatched / dead /    │
                          retry with backoff           │
                        └──────────────────────────────┘


                        ┌──────────────────────────────┐
       Receiver ────────▶  IEventMetadata.DispatchId   │
                          dedupe via aggregate LRU     │
                        └──────────────────────────────┘
  • RelationalEventStore.Append opens one transaction, inserts events, inserts outbox rows, commits. Atomic by construction.
  • OutboxRunner polls (and reacts to push wake-up) for state='pending' rows whose next_attempt_at is in the past. For each, it deserialises the command + metadata and forwards via IDispatchSink.
  • The receiving aggregate sees IEventMetadata.DispatchId and checks its bounded LRU. If already applied, returns Replies.Accepted() without re-running the handler. This makes at-least-once delivery effectively exactly-once.
  • Outbox rows are stable; per-attempt history goes into nagare_outbox_attempts (append-only, FK cascade on delete).

Cross-database support

CapabilityPostgresMySQL 8SQL ServerSQLite
IdentityBIGSERIALBIGINT AUTO_INCREMENTBIGINT IDENTITYINTEGER PRIMARY KEY AUTOINCREMENT
Idempotent insert on dispatch_idON CONFLICT DO NOTHINGINSERT IGNOREWHERE NOT EXISTSINSERT OR IGNORE
Pending indexpartial WHERE state='pending'compositefiltered WHERE state='pending'partial WHERE state='pending'
Push wake-upLISTEN/NOTIFYpollpollpoll
Leader electionpg_try_advisory_lockGET_LOCKsp_getapplock @LockOwner='Session'not needed (single-process)

The schema is otherwise identical across backends. Switching between them is a configuration change.

Wiring

csharp
services.AddNagare();

// Backend storage (pick one)
services.AddPostgresEventStore<MyEvent>("MyEvents");
services.AddPostgresOutbox(connectionString);

// Or:
// services.AddSqliteEventStore<MyEvent>("MyEvents");
// services.AddSqliteOutbox();
//
// services.AddMySqlEventStore<MyEvent>("MyEvents");
// services.AddMySqlOutbox(connectionString);
//
// services.AddSqlServerEventStore<MyEvent>("MyEvents");
// services.AddSqlServerOutbox(connectionString);

// Runner + dispatch sink
services.AddOutboxRunner(opts =>
{
    opts.MaxAttempts = 10;
    opts.BatchSize = 32;
    opts.IdleDelay = TimeSpan.FromSeconds(30);
    opts.DispatchedRetention = TimeSpan.FromDays(30);
});
services.AddProcessOutboxSink(); // wraps the existing ICommandDispatcher

// Optional: health checks tagged "ready"
services.AddOutboxHealthChecks();

Producer-side: dispatch ids are deterministic

Process managers don't pick dispatch ids; the framework computes them as a UUIDv5 over (sourceProcess, sourceVersion, dispatchIndex). Combined with the UNIQUE constraint on dispatch_id, this means a producer replay (e.g. grain reactivation after a crash) re-attempts the insert idempotently — same id, conflict, no double-dispatch.

The dispatch id is also embedded in the command's IEventMetadata.DispatchId so the receiver can dedupe.

Receiver-side: aggregate dedupe

Aggregates carry a bounded FIFO set (default 1024 entries) of recently-applied dispatch ids in their in-memory state. The set is rehydrated on Initialize from the metadata of replayed envelopes. When a command arrives with a DispatchId already in the set, the aggregate short-circuits to Replies.Accepted() and persists nothing.

Capacity must comfortably exceed the outbox dispatch retention so a retried row is never older than the window. Tune via:

csharp
public class MyAggregate : Aggregate<MyCommand, MyEvent, MyState>
{
    protected override int DispatchDedupeCapacity => 4096;
}

Behaviour change to be aware of

ProcessGrain.Ask no longer awaits downstream aggregate calls. Once events + outbox rows are durably committed, Ask returns. Delivery happens asynchronously on the runner.

Anything that depended on synchronous downstream side effects from a process command should switch to:

  • An eventually-consistent assertion (read-your-own-writes via the projection), or
  • A subscription on the target aggregate's events.

RelationalEventStore.Append throws if you pass dispatches without an IOutboxStore registered — fail loud rather than silently lose messages.

Ordering guarantees

The runner is single-leader by design. Within one process manager, dispatches arrive in the order they were produced. Across process managers there is no global ordering guarantee — concurrent producers commit independently and the drainer picks rows by id, which is a fine-grained interleaving of every committer.

Do not depend on cross-process order. If a consumer needs it, redesign the events to be self-describing (carry the version / state needed) so out-of-order delivery is tolerable. See anti-patterns.md.

Operations

ConcernWhere to look
Drainer is stalledOutboxLagHealthCheck reports Degraded/Unhealthy based on age of oldest pending row
Dead-letters accumulatingOutboxDeadLetterHealthCheck reports based on state='dead' count
Trace a specific dispatchnagare.outbox.dispatch Activity, tagged with nagare.outbox.dispatch_id
Throughput / outcome breakdownnagare.outbox.attempts counter, tagged by outcome (success/rejected/transient/poison)
Per-attempt latencynagare.outbox.dispatch.duration_ms histogram
Storage growthretention sweep deletes state='dispatched' rows older than DispatchedRetention (default 30d); see nagare.outbox.retention.deleted counter

Failure modes

ScenarioWhat happensWhat you should do
Persist tx failsEvents + outbox rows roll back together. No dispatch sent.Producer retries the command; framework handles.
Drainer crashes mid-tickRow stays pending; next leader (or restarted runner) re-claims and re-dispatches. Receiver dedupes.Nothing — designed for this.
Receiver throws transientRow stays pending with attempts++ and next_attempt_at advanced by backoff.Investigate target service health.
Receiver returns rejectionRow marked dead.Triage via nagare_outbox + nagare_outbox_attempts history.
Retry budget exhaustedRow marked dead with outcome='poison'.Same as above; triage and either re-enable or accept the loss.
Type can't be loaded by drainerRow marked dead immediately (outcome='poison').Drainer needs the same assemblies as the producer. Common in microservice splits.
Connection dropsLeader-election lock auto-released; another node picks up next tick.Nothing — designed for this.

Limits and explicit non-goals

  • One leader at a time. Multi-drainer mode (SELECT … FOR UPDATE SKIP LOCKED claim per source_process) is a future option for higher throughput; defer until single-leader is the bottleneck.
  • No cross-process ordering. As above.
  • No exactly-once on the wire. At-least-once delivery + receiver dedupe gives exactly-once effects, which is what callers actually need.
  • Per-backend leader-election connections need session-pooling-friendly drivers. PgBouncer in transaction-pooling mode breaks session-level advisory locks; use session pooling or talk to Postgres directly for the leader connection.

流れ — flow.