Cluster
Nagare runs fine on a single instance, but most production deployments will scale out. This page explains the two pieces that make multi-instance deployments work safely:
- Distributed locks stop two instances from running the same outbox runner or singleton subscription at the same time.
- The lease registry is a small visibility table the dashboard reads so you can see which node holds which lock.
The locks are part of the storage backend you already configured. The lease registry is opt-in — turn it on and the Cluster page in the dashboard fills in.
Why locks at all
A few things in Nagare are meant to run as exactly one instance:
- The outbox runner. If two runners drain the same nagare_outbox table they will fight over rows, double-dispatch, and burn retry budget.
- Singleton-flavoured subscriptions: the catch-up path of a Subscription that owns a checkpoint, and EventRouteSubscription for process managers. Two of either running side-by-side would step on the same checkpoint row and deliver each event twice.
Live taps don't need a lock — they hold no state. Per-aggregate work is already serialised by Orleans grain placement.
What you get out of the box
When you call AddPostgresOutbox, AddMySqlOutbox, or AddSqlServerOutbox, the package also registers an ILockProvider keyed to that database:
| Backend | Implementation |
|---|---|
| Postgres | PostgresDistributedLock from Medallion.Threading (pg_try_advisory_lock) |
| MySQL 8 | GET_LOCK |
| SQL Server | sp_getapplock with session scope |
| SQLite | No-op — single-process by definition |
The runners and subscriptions ask the provider for a named lock when they start. If they get a handle they own that responsibility until the handle is disposed or the connection drops; if they don't, they sleep and try again on the next tick.
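The acquire-or-retry behaviour described above can be sketched roughly as follows. The interface shape here (a `TryAcquireAsync` that returns a handle, or null when the lock is taken) is an assumption for illustration, not Nagare's actual `ILockProvider` signature, and the in-memory provider only stands in for a database-backed one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape, not Nagare's real interface: returns a handle on
// success, or null when another holder already owns the named lock.
public interface ISketchLockProvider
{
    Task<IAsyncDisposable?> TryAcquireAsync(string lockName, CancellationToken ct);
}

// In-memory stand-in for a database-backed provider (single process only).
public sealed class InMemoryLockProvider : ISketchLockProvider
{
    private readonly ConcurrentDictionary<string, byte> _held = new();

    public Task<IAsyncDisposable?> TryAcquireAsync(string lockName, CancellationToken ct)
    {
        IAsyncDisposable? handle = _held.TryAdd(lockName, 0)
            ? new Handle(() => _held.TryRemove(lockName, out _))
            : null;
        return Task.FromResult(handle);
    }

    private sealed class Handle : IAsyncDisposable
    {
        private Action? _release;
        public Handle(Action release) => _release = release;

        public ValueTask DisposeAsync()
        {
            // Release at most once, even if the handle is disposed twice.
            Interlocked.Exchange<Action?>(ref _release, null)?.Invoke();
            return ValueTask.CompletedTask;
        }
    }
}

public static class SingletonRunner
{
    // Each tick: try to take the lock, run one batch if we got it, then
    // release, sleep, and try again. Cancellation surfaces as an
    // OperationCanceledException from Task.Delay.
    public static async Task RunAsync(
        ISketchLockProvider locks, string lockName,
        Func<CancellationToken, Task> runBatch, TimeSpan tick, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var handle = await locks.TryAcquireAsync(lockName, ct);
            if (handle is not null)
            {
                await using (handle)
                {
                    await runBatch(ct);
                }
            }
            await Task.Delay(tick, ct);
        }
    }
}
```

With a real backend the handle would also be lost if the underlying connection drops, which this in-memory stand-in does not model.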
That part already works. You don't need to wire anything for safety. But unless you turn the lease registry on, the dashboard can't tell you who is holding what — the locks live entirely inside the database engine.
Turning on the lease registry
Add one line after AddNagare:

    builder.Services.AddNagare();
    builder.Services.AddRelationalLeaseRegistry();

That does three things:
- Registers RelationalLeaseRegistry, backed by a single nagare_leases table in the same DbConnection your stores already use. Same row format on Postgres, MySQL, SQL Server, and SQLite.
- Wraps your ILockProvider with RecordingLockProvider. The wrapper is transparent (it forwards everything to the inner provider), but on each successful acquire it writes a row to nagare_leases, and on dispose it stamps renewed_at.
- Adds a hosted service that creates the table on startup and runs a periodic sweep that deletes rows whose renewed_at is older than a configurable threshold (default 60 s). This handles the case where a node crashes hard and never gets to release.
There is no separate Nagare.Cluster package: the interfaces live in the Nagare.Cluster namespace, and the relational implementation ships in core Nagare. The decoration is a one-line DI call, and you can pass decorateLockProvider: false if you want to record without intercepting.
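The periodic sweep described above boils down to a cutoff comparison on renewed_at. A minimal sketch of that rule, run against an in-memory list rather than the real table, might look like this. The `LeaseRow` record and `LeaseSweep` helper are hypothetical names introduced for illustration; the actual hosted service presumably issues a DELETE against nagare_leases instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical in-memory mirror of a nagare_leases row (subset of columns).
// RenewedAt is an ISO-8601 UTC string, matching the TEXT column.
public sealed record LeaseRow(string LockName, string NodeId, string RenewedAt);

public static class LeaseSweep
{
    // Removes rows whose renewed_at is older than (now - threshold) and
    // returns how many rows were removed.
    public static int RemoveStale(List<LeaseRow> rows, DateTimeOffset now, TimeSpan threshold)
    {
        var cutoff = now - threshold;
        return rows.RemoveAll(row =>
            DateTimeOffset.Parse(
                row.RenewedAt,
                CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal) < cutoff);
    }
}
```

With the default 60 s threshold, a row last renewed 30 seconds ago survives a sweep, while one last renewed two minutes ago is removed.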
Schema

    CREATE TABLE IF NOT EXISTS nagare_leases (
        lock_name   TEXT PRIMARY KEY,
        node_id     TEXT NOT NULL,    -- "<machine>/<pid>"
        machine     TEXT NOT NULL,
        pid         INTEGER NOT NULL,
        kind        TEXT,             -- "outbox" | "subscription" | null
        acquired_at TEXT NOT NULL,    -- ISO-8601, UTC
        renewed_at  TEXT NOT NULL
    );

Lock names follow conventions: outbox locks are prefixed nagare-outbox-; subscription and event-route locks use the checkpoint key (book-catalog, event-route-InterLibraryLoan-BookEvent). The default classifier maps the prefix to the kind column. If you have your own scheme, pass a classifier to AddRelationalLeaseRegistry.
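As an illustration of what a custom classifier could look like, here is a small sketch that maps a lock name to a value for the kind column. The delegate shape (lock name in, nullable kind out), the `LeaseKinds` class, and the "jobs-" prefix are all assumptions made for this example; check the actual AddRelationalLeaseRegistry signature before wiring anything like it in, and note that this is not a description of the default classifier's rules.

```csharp
using System;

public static class LeaseKinds
{
    // Hypothetical classifier for a custom naming scheme.
    public static string? Classify(string lockName)
    {
        // Documented convention: outbox locks carry this prefix.
        if (lockName.StartsWith("nagare-outbox-", StringComparison.Ordinal))
            return "outbox";

        // Assumption: this application names its own, non-Nagare locks
        // "jobs-..."; they get no kind.
        if (lockName.StartsWith("jobs-", StringComparison.Ordinal))
            return null;

        // Assumption: everything else is a checkpoint key, so treat it as a
        // subscription or event-route lock.
        return "subscription";
    }
}
```

Under these assumptions, nagare-outbox-orders classifies as "outbox", book-catalog and event-route-InterLibraryLoan-BookEvent as "subscription", and jobs-nightly-report as null.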
What the dashboard shows
/_nagare/cluster lists three things:
- The local node: machine name, PID, uptime. This is whatever you're connected to.
- Active leases: every row in nagare_leases from any node, with the holder's machine/pid and how recently the row was renewed. The current node's holdings are tagged "this node".
- Background runners: IHostedServices registered on the local node and their state.
In a multi-instance deployment, every node sees the same lease list (because they all read the same table) but only its own runner list. That's the right shape for "is the outbox actually running somewhere, and who?"
Behaviour you should know about
- Renewal on dispose, not delete. Many runners take and release their lock dozens of times a minute (between batches). If we deleted the row on every release, the row would only flash into view on the dashboard while a batch was running. Instead, dispose stamps renewed_at and the row stays. The sweep cleans up rows whose holder genuinely went away.
- Registry failures never block correctness. Every write the wrapper does to nagare_leases is in a try/catch with a warning log. If the table is missing or the connection is broken, the inner lock still works; you just lose visibility.
- Acquire failures don't leave rows. If two nodes fight for the same lock, only the winner ever calls RecordAcquired. The loser gets null from the inner provider and bails before reaching the registry.
- No fencing tokens. This is a visibility layer, not a replacement for distributed locks. The lock primitive itself (Postgres advisory locks, etc.) gives you mutual exclusion; nagare_leases only describes who currently holds it.
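The first three behaviours above can be pictured with a small decorator sketch. Everything here is hypothetical: the `ITryLockProvider` and `ILeaseRecorder` shapes, the async method names, and the warning callback are assumptions chosen for illustration, not Nagare's RecordingLockProvider source. The sketch also assumes the stamp happens just before the inner handle is released, an ordering the text above does not specify.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes for illustration; Nagare's real interfaces may differ.
public interface ITryLockProvider
{
    Task<IAsyncDisposable?> TryAcquireAsync(string lockName, CancellationToken ct);
}

public interface ILeaseRecorder
{
    Task RecordAcquiredAsync(string lockName, CancellationToken ct);
    Task RecordRenewedAsync(string lockName, CancellationToken ct);
}

public sealed class RecordingLockProviderSketch : ITryLockProvider
{
    private readonly ITryLockProvider _inner;
    private readonly ILeaseRecorder _recorder;
    private readonly Action<Exception> _warn;

    public RecordingLockProviderSketch(
        ITryLockProvider inner, ILeaseRecorder recorder, Action<Exception> warn)
    {
        _inner = inner;
        _recorder = recorder;
        _warn = warn;
    }

    public async Task<IAsyncDisposable?> TryAcquireAsync(string lockName, CancellationToken ct)
    {
        var handle = await _inner.TryAcquireAsync(lockName, ct);
        if (handle is null) return null; // lost the race: nothing is recorded

        await SafelyAsync(() => _recorder.RecordAcquiredAsync(lockName, ct));
        return new RecordedHandle(this, handle, lockName);
    }

    // Registry writes are best-effort: failures are reported, never rethrown.
    private async Task SafelyAsync(Func<Task> write)
    {
        try { await write(); }
        catch (Exception ex) { _warn(ex); }
    }

    private sealed class RecordedHandle : IAsyncDisposable
    {
        private readonly RecordingLockProviderSketch _owner;
        private readonly IAsyncDisposable _inner;
        private readonly string _lockName;

        public RecordedHandle(
            RecordingLockProviderSketch owner, IAsyncDisposable inner, string lockName)
        {
            _owner = owner;
            _inner = inner;
            _lockName = lockName;
        }

        public async ValueTask DisposeAsync()
        {
            // Stamp renewed_at instead of deleting the row, then release the
            // real lock. The stamp cannot throw, so release always runs.
            await _owner.SafelyAsync(
                () => _owner._recorder.RecordRenewedAsync(_lockName, CancellationToken.None));
            await _inner.DisposeAsync();
        }
    }
}
```

Because mutual exclusion comes entirely from the inner provider, a recorder that throws on every call changes only what the dashboard can show, not who holds the lock.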
When you don't need this
Single-instance deployments don't need the lease registry at all — the dashboard's "this node" data is already enough. The registry exists to answer the question "which of my N nodes is the outbox running on right now?" If N is always 1, leave it off.
If you do turn it on for a single instance, the table is harmless: one or two rows, no contention, sweep does nothing.