Transactional Outbox
Process managers in Nagare emit two kinds of side effects from a single command: events to persist on their own journal, and commands to dispatch to other aggregates. The outbox makes those two arrive together — either both committed to the database, or neither — and decouples delivery from the producing transaction.
Why
Without an outbox, a process manager that "persists events then awaits dispatch" is exposed to a silent message-loss window:
- Events are committed to the journal.
- The application crashes (or the grain deactivates, or the network fails) before the dispatch foreach completes.
- The events are durable. The dispatches are gone.
The receiver never hears about them and the producer can't tell what was lost. This is the dual-write problem; the standard fix is to record the intent to dispatch in the same database transaction as the events, then deliver from that record asynchronously.
How it works
┌──────────────────────────────┐
Persist tx ──────▶ events table │
(Serializable) nagare_outbox (rows pending) │
└──────────────────────────────┘
│ commit
▼
┌──────────────────────────────┐
OutboxRunner ────▶ read pending → dispatch │
(hosted) mark dispatched / dead / │
retry with backoff │
└──────────────────────────────┘
│
▼
┌──────────────────────────────┐
Receiver ────────▶ IEventMetadata.DispatchId │
dedupe via aggregate LRU │
└──────────────────────────────┘RelationalEventStore.Appendopens one transaction, inserts events, inserts outbox rows, commits. Atomic by construction.OutboxRunnerpolls (and reacts to push wake-up) forstate='pending'rows whosenext_attempt_atis in the past. For each, it deserialises the command + metadata and forwards viaIDispatchSink.- The receiving aggregate sees
IEventMetadata.DispatchIdand checks its bounded LRU. If already applied, returnsReplies.Accepted()without re-running the handler. This makes at-least-once delivery effectively exactly-once. - Outbox rows are stable; per-attempt history goes into
nagare_outbox_attempts(append-only, FK cascade on delete).
Cross-database support
| Capability | Postgres | MySQL 8 | SQL Server | SQLite |
|---|---|---|---|---|
| Identity | BIGSERIAL | BIGINT AUTO_INCREMENT | BIGINT IDENTITY | INTEGER PRIMARY KEY AUTOINCREMENT |
Idempotent insert on dispatch_id | ON CONFLICT DO NOTHING | INSERT IGNORE | WHERE NOT EXISTS | INSERT OR IGNORE |
| Pending index | partial WHERE state='pending' | composite | filtered WHERE state='pending' | partial WHERE state='pending' |
| Push wake-up | LISTEN/NOTIFY | poll | poll | poll |
| Leader election | pg_try_advisory_lock | GET_LOCK | sp_getapplock @LockOwner='Session' | not needed (single-process) |
The schema is otherwise identical across backends. Switching between them is a configuration change.
Wiring
services.AddNagare();
// Backend storage (pick one)
services.AddPostgresEventStore<MyEvent>("MyEvents");
services.AddPostgresOutbox(connectionString);
// Or:
// services.AddSqliteEventStore<MyEvent>("MyEvents");
// services.AddSqliteOutbox();
//
// services.AddMySqlEventStore<MyEvent>("MyEvents");
// services.AddMySqlOutbox(connectionString);
//
// services.AddSqlServerEventStore<MyEvent>("MyEvents");
// services.AddSqlServerOutbox(connectionString);
// Runner + dispatch sink
services.AddOutboxRunner(opts =>
{
opts.MaxAttempts = 10;
opts.BatchSize = 32;
opts.IdleDelay = TimeSpan.FromSeconds(30);
opts.DispatchedRetention = TimeSpan.FromDays(30);
});
services.AddProcessOutboxSink(); // wraps the existing ICommandDispatcher
// Optional: health checks tagged "ready"
services.AddOutboxHealthChecks();Producer-side: dispatch ids are deterministic
Process managers don't pick dispatch ids; the framework computes them as a UUIDv5 over (sourceProcess, sourceVersion, dispatchIndex). Combined with the UNIQUE constraint on dispatch_id, this means a producer replay (e.g. grain reactivation after a crash) re-attempts the insert idempotently — same id, conflict, no double-dispatch.
The dispatch id is also embedded in the command's IEventMetadata.DispatchId so the receiver can dedupe.
Receiver-side: aggregate dedupe
Aggregates carry a bounded FIFO set (default 1024 entries) of recently-applied dispatch ids in their in-memory state. The set is rehydrated on Initialize from the metadata of replayed envelopes. When a command arrives with a DispatchId already in the set, the aggregate short-circuits to Replies.Accepted() and persists nothing.
Capacity must comfortably exceed the outbox dispatch retention so a retried row is never older than the window. Tune via:
public class MyAggregate : Aggregate<MyCommand, MyEvent, MyState>
{
protected override int DispatchDedupeCapacity => 4096;
}Behaviour change to be aware of
ProcessGrain.Ask no longer awaits downstream aggregate calls. Once events + outbox rows are durably committed, Ask returns. Delivery happens asynchronously on the runner.
Anything that depended on synchronous downstream side effects from a process command should switch to:
- An eventually-consistent assertion (read-your-own-writes via the projection), or
- A subscription on the target aggregate's events.
RelationalEventStore.Append throws if you pass dispatches without an IOutboxStore registered — fail loud rather than silently lose messages.
Ordering guarantees
The runner is single-leader by design. Within one process manager, dispatches arrive in the order they were produced. Across process managers there is no global ordering guarantee — concurrent producers commit independently and the drainer picks rows by id, which is a fine-grained interleaving of every committer.
Do not depend on cross-process order. If a consumer needs it, redesign the events to be self-describing (carry the version / state needed) so out-of-order delivery is tolerable. See anti-patterns.md.
Operations
| Concern | Where to look |
|---|---|
| Drainer is stalled | OutboxLagHealthCheck reports Degraded/Unhealthy based on age of oldest pending row |
| Dead-letters accumulating | OutboxDeadLetterHealthCheck reports based on state='dead' count |
| Trace a specific dispatch | nagare.outbox.dispatch Activity, tagged with nagare.outbox.dispatch_id |
| Throughput / outcome breakdown | nagare.outbox.attempts counter, tagged by outcome (success/rejected/transient/poison) |
| Per-attempt latency | nagare.outbox.dispatch.duration_ms histogram |
| Storage growth | retention sweep deletes state='dispatched' rows older than DispatchedRetention (default 30d); see nagare.outbox.retention.deleted counter |
Failure modes
| Scenario | What happens | What you should do |
|---|---|---|
| Persist tx fails | Events + outbox rows roll back together. No dispatch sent. | Producer retries the command; framework handles. |
| Drainer crashes mid-tick | Row stays pending; next leader (or restarted runner) re-claims and re-dispatches. Receiver dedupes. | Nothing — designed for this. |
| Receiver throws transient | Row stays pending with attempts++ and next_attempt_at advanced by backoff. | Investigate target service health. |
| Receiver returns rejection | Row marked dead. | Triage via nagare_outbox + nagare_outbox_attempts history. |
| Retry budget exhausted | Row marked dead with outcome='poison'. | Same as above; triage and either re-enable or accept the loss. |
| Type can't be loaded by drainer | Row marked dead immediately (outcome='poison'). | Drainer needs the same assemblies as the producer. Common in microservice splits. |
| Connection drops | Leader-election lock auto-released; another node picks up next tick. | Nothing — designed for this. |
Limits and explicit non-goals
- One leader at a time. Multi-drainer mode (
SELECT … FOR UPDATE SKIP LOCKEDclaim persource_process) is a future option for higher throughput; defer until single-leader is the bottleneck. - No cross-process ordering. As above.
- No exactly-once on the wire. At-least-once delivery + receiver dedupe gives exactly-once effects, which is what callers actually need.
- Per-backend leader-election connections need session-pooling-friendly drivers. PgBouncer in transaction-pooling mode breaks session-level advisory locks; use session pooling or talk to Postgres directly for the leader connection.