
Cluster

Nagare runs fine on a single instance, but most production deployments will scale out. This page explains the two pieces involved in running more than one instance:

  1. Distributed locks stop two instances from running the same outbox runner or singleton subscription at the same time.
  2. The lease registry is a small visibility table the dashboard reads so you can see which node holds which lock.

The locks are part of the storage backend you already configured. The lease registry is opt-in — turn it on and the Cluster page in the dashboard fills in.

Why locks at all

A few things in Nagare are meant to run as exactly one instance:

  • The outbox runner. If two runners drain the same nagare_outbox table they will fight over rows, double-dispatch, and burn retry budget.
  • Singleton-flavoured subscriptions — the catch-up path of a Subscription that owns a checkpoint, and EventRouteSubscription for process managers. Two of either running side-by-side would step on the same checkpoint row and deliver each event twice.

Live taps don't need a lock — they hold no state. Per-aggregate work is already serialised by Orleans grain placement.

What you get out of the box

When you call AddPostgresOutbox, AddMySqlOutbox, or AddSqlServerOutbox, the package also registers an ILockProvider keyed to that database:

| Backend | Implementation |
|---|---|
| Postgres | PostgresDistributedLock from Medallion.Threading (pg_try_advisory_lock) |
| MySQL 8 | GET_LOCK |
| SQL Server | sp_getapplock with session scope |
| SQLite | No-op; single-process by definition |

The runners and subscriptions ask the provider for a named lock when they start. If they get a handle, they own that responsibility until the handle is disposed or the connection drops; if they don't, they sleep and try again on the next tick.
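The acquire-or-sleep cycle can be sketched with an in-memory dictionary standing in for a database-backed ILockProvider. Everything in the sketch is hypothetical: TryAcquire, RunTick, the release callback (used here in place of a disposable handle) and the lock name nagare-outbox-default are illustrations, not Nagare's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a lock provider: lock name -> owning node.
// Real providers enforce this inside the database (advisory lock, GET_LOCK, sp_getapplock).
var held = new Dictionary<string, string>();

// Returns a release callback if the lock was free, or null if another node holds it.
Action? TryAcquire(string lockName, string node)
{
    if (held.ContainsKey(lockName)) return null;
    held[lockName] = node;
    return () => held.Remove(lockName);
}

// One runner tick: do the work only if the lock was acquired; otherwise wait for the next tick.
bool RunTick(string lockName, string node, List<string> log)
{
    Action? release = TryAcquire(lockName, node);
    if (release is null)
    {
        log.Add($"{node}: lock busy, sleeping");
        return false;
    }
    try
    {
        log.Add($"{node}: draining {lockName}"); // the batch would run here
        return true;
    }
    finally
    {
        release(); // plays the role of disposing the lock handle
    }
}
```

In the real providers the exclusion lives in the database engine, so a node that loses its connection also loses the lock; the dictionary only models the happy path.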

That part already works. You don't need to wire anything for safety. But unless you turn the lease registry on, the dashboard can't tell you who is holding what — the locks live entirely inside the database engine.

Turning on the lease registry

Add one line after AddNagare:

```csharp
builder.Services.AddNagare();
builder.Services.AddRelationalLeaseRegistry();
```

That does three things:

  1. Registers RelationalLeaseRegistry, backed by a single nagare_leases table in the same database your stores already use, reached through the same DbConnection. The row format is the same on Postgres, MySQL, SQL Server, and SQLite.
  2. Wraps your ILockProvider with RecordingLockProvider. The wrapper is transparent — it forwards everything to the inner provider — but on each successful acquire it writes a row to nagare_leases, and on dispose it stamps renewed_at.
  3. Adds a hosted service that creates the table on startup and runs a periodic sweep that deletes rows whose renewed_at is older than a configurable threshold (default 60 s). This handles the case where a node crashes hard and never gets to release.
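The sweep rule in step 3 can be sketched over in-memory rows. This rests on an assumption the page doesn't state: that renewed_at is always written in one fixed-width UTC ISO-8601 format. Under that assumption, "older than the threshold" reduces to an ordinal string comparison against a cutoff. ToIso and Sweep are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Assumption: renewed_at is always written as UTC in the fixed-width round-trip ("O") format,
// e.g. 2024-05-01T12:00:00.0000000Z, so ordinal string order matches time order.
string ToIso(DateTimeOffset t) => t.UtcDateTime.ToString("O", CultureInfo.InvariantCulture);

// Hypothetical sweep over in-memory rows (lock_name -> renewed_at): removes and returns
// every lease whose holder has not renewed since now - threshold.
List<string> Sweep(Dictionary<string, string> rows, DateTimeOffset now, TimeSpan threshold)
{
    string cutoff = ToIso(now - threshold);
    List<string> stale = rows
        .Where(row => string.CompareOrdinal(row.Value, cutoff) < 0)
        .Select(row => row.Key)
        .ToList(); // materialise first so the dictionary isn't modified while enumerating
    foreach (string name in stale) rows.Remove(name);
    return stale;
}
```

With the default 60 s threshold, a node that crashed without releasing keeps its row for at most about a minute plus one sweep interval.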

There is no separate Nagare.Cluster package. The interfaces live in the Nagare.Cluster namespace, and the relational implementation ships in core Nagare. The decoration is a one-line DI call, and you can pass decorateLockProvider: false if you want to record leases without intercepting the lock provider.

Schema

```sql
CREATE TABLE IF NOT EXISTS nagare_leases (
    lock_name   TEXT PRIMARY KEY,
    node_id     TEXT NOT NULL,    -- "<machine>/<pid>"
    machine     TEXT NOT NULL,
    pid         INTEGER NOT NULL,
    kind        TEXT,             -- "outbox" | "subscription" | null
    acquired_at TEXT NOT NULL,    -- ISO-8601, UTC
    renewed_at  TEXT NOT NULL
);
```

Lock names follow a convention: outbox locks are prefixed nagare-outbox-, and subscription and event-route locks use the checkpoint key (book-catalog, event-route-InterLibraryLoan-BookEvent). The default classifier maps the prefix to the kind column. If you have your own scheme, pass a classifier to AddRelationalLeaseRegistry.
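The naming rule can be sketched as a classifier function. The page only says the default classifier maps the prefix to the kind column, so several things here are assumptions: treating every non-prefixed name as "subscription", the ClassifyLock name, the Func<string, string?> shape, and the acme-outbox- prefix are all hypothetical, not the signature AddRelationalLeaseRegistry actually accepts.

```csharp
using System;

// Outbox lock names carry this prefix (from the naming convention above).
const string OutboxPrefix = "nagare-outbox-";

// Hypothetical stand-in for the default classifier: lock name -> kind column value.
// Assumption: anything without the outbox prefix is a checkpoint key, so "subscription".
string? ClassifyLock(string lockName)
{
    if (string.IsNullOrEmpty(lockName)) return null; // kind is nullable in the schema
    return lockName.StartsWith(OutboxPrefix, StringComparison.Ordinal) ? "outbox" : "subscription";
}

// A custom scheme expressed as Func<string, string?> (the delegate shape is an assumption):
// treat an app-specific prefix as outbox work and fall back to the default otherwise.
Func<string, string?> customClassifier = name =>
    name.StartsWith("acme-outbox-", StringComparison.Ordinal) ? "outbox" : ClassifyLock(name);
```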

What the dashboard shows

/_nagare/cluster lists three things:

  • The local node: machine name, PID, uptime. This is the instance that served the page you're viewing.
  • Active leases: every row in nagare_leases from any node, with the holder's machine/pid and how recently the row was renewed. The current node's holdings are tagged "this node".
  • Background runners: the IHostedService instances registered on the local node, with their state.
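The "this node" tag comes down to comparing each row's node_id with the local process's identity. LocalNodeId and TagLeases are hypothetical helpers. The only thing taken from the schema is the "<machine>/<pid>" format, built here from Environment.MachineName and Environment.ProcessId (.NET 5 and later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// node_id follows the "<machine>/<pid>" convention from the nagare_leases schema.
string LocalNodeId() => $"{Environment.MachineName}/{Environment.ProcessId}";

// Hypothetical dashboard step: mark each lease row whose holder is the local node.
List<(string LockName, string NodeId, bool IsThisNode)> TagLeases(
    IEnumerable<(string LockName, string NodeId)> rows, string localNodeId) =>
    rows.Select(r => (LockName: r.LockName, NodeId: r.NodeId, IsThisNode: r.NodeId == localNodeId))
        .ToList();
```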

In a multi-instance deployment, every node sees the same lease list (because they all read the same table) but only its own runner list. That's the right shape for "is the outbox actually running somewhere, and who?"

Behaviour you should know about

  • Renewal on dispose, not delete. Many runners take and release their lock dozens of times a minute (between batches). If we deleted the row on every release, the dashboard would only see the lease in brief flashes while a batch was running. Instead, dispose stamps renewed_at and the row stays. The sweep cleans up rows whose holder genuinely went away.
  • Registry failures never block correctness. Every write the wrapper does to nagare_leases is in a try/catch with a warning log. If the table is missing or the connection is broken, the inner lock still works — you just lose visibility.
  • Acquire failures don't leave rows. If two nodes fight for the same lock, only the winner ever calls RecordAcquired. The loser gets null from the inner provider and bails before reaching the registry.
  • No fencing tokens. This is a visibility layer, not a replacement for distributed locks. The lock primitive itself (Postgres advisory locks, etc.) gives you mutual exclusion; nagare_leases only describes who currently holds it.
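The rules above can be sketched as a decorator over a lock provider. To stay self-contained, the sketch models a provider as a delegate from lock name to a release callback (null when the lock is taken) rather than Nagare's ILockProvider and handle types, whose signatures aren't shown on this page. Recording and its callback parameters are hypothetical.

```csharp
using System;

// Hypothetical decorator: a provider maps a lock name to a release callback, or null if not acquired.
Func<string, Action?> Recording(
    Func<string, Action?> inner,
    Action<string> recordAcquired,  // stands in for writing the nagare_leases row
    Action<string> stampRenewed,    // stands in for updating renewed_at
    Action<string> warn)            // stands in for a warning log
{
    return lockName =>
    {
        // The loser gets null from the inner provider and bails before touching the registry.
        if (inner(lockName) is not Action release) return null;

        // Registry failures are logged, never rethrown: the lock itself is still held.
        try { recordAcquired(lockName); }
        catch (Exception ex) { warn($"lease record failed for {lockName}: {ex.Message}"); }

        return () =>
        {
            // Renewal on dispose: stamp the row instead of deleting it, then release the real lock.
            try { stampRenewed(lockName); }
            catch (Exception ex) { warn($"lease stamp failed for {lockName}: {ex.Message}"); }
            finally { release(); }
        };
    };
}
```

The decorator never reads the registry before acquiring, so a missing or stale row can't stop a node from taking a free lock: the registry only describes, and the inner lock decides.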

When you don't need this

Single-instance deployments don't need the lease registry at all — the dashboard's "this node" data is already enough. The registry exists to answer the question "which of my N nodes is the outbox running on right now?" If N is always 1, leave it off.

If you do turn it on for a single instance, the table is harmless: one or two rows, no contention, sweep does nothing.

流れ — flow.