Observability

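Nagare instruments itself with System.Diagnostics.Activity and System.Diagnostics.Metrics, the same primitives ASP.NET Core and EF Core use. The OpenTelemetry .NET SDK picks them up once the source and meter are registered, and exports them to whatever collector or backend you already run.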

There is no Nagare.OpenTelemetry package, and there isn't going to be one. The ActivitySource and Meter are part of the core library, so one .AddSource("Nagare") call and one .AddMeter("Nagare") call are enough. Marten, Wolverine, and MassTransit follow the same pattern.

Tracing setup

csharp

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Nagare")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Nagare")
        .AddOtlpExporter());

That captures every span and metric the framework emits. The source name is also exposed as NagareActivity.SourceName if you prefer not to hard-code the string.

Traced operations

Nagare creates spans for six operations:

Span name	Kind	When
`nagare.aggregate.ask`	internal	A command is sent to an aggregate
`nagare.eventstore.append`	producer	Events are written to the store
`nagare.eventstore.read`	internal	Events are read from the store
`nagare.subscription.handle`	consumer	A subscription processes an event
`nagare.subscription.checkpoint`	internal	A subscription saves its position
`nagare.outbox.dispatch`	consumer	The outbox runner delivers a side-effect

A typical command flow produces three nested spans: aggregate.ask wraps eventstore.read (loading the aggregate) and eventstore.append (writing new events). Subscription processing produces subscription.handle with periodic subscription.checkpoint spans. Outbox delivery produces outbox.dispatch per row, linked back to the original eventstore.append via the stored traceparent.

Tags

Tag	Value
`nagare.aggregate.id`	The aggregate instance ID
`nagare.aggregate.type`	The aggregate's type name
`nagare.event.type`	The event stream's type name
`nagare.event.count`	Number of events in this operation
`nagare.command.type`	The command's type name
`nagare.subscription.id`	The subscription's identifier
`nagare.position`	Global event store position
`nagare.version`	Aggregate version number
`nagare.outbox.dispatch_id`	Stable id of the outbox row being delivered
`nagare.outbox.target`	Logical name of the destination (handler/sink)
`nagare.outbox.attempt`	Attempt counter — `1` on first try
`nagare.outbox.outcome`	`dispatched`, `transient`, or `dead`
`nagare.outbox.source_process`	Process aggregate that produced the dispatch, if any
`nagare.replay`	`true` when the handler was driven by a catch-up replay
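To check these spans and tags without an exporter, for example in an integration test, you can attach a plain ActivityListener to the source. The following is a minimal sketch that uses only System.Diagnostics. The class name NagareSpanCapture and the EmitDemoSpan method are hypothetical; EmitDemoSpan is a stand-in that imitates one framework span so the sketch runs on its own, and the tag values in it are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class NagareSpanCapture
{
    // Subscribes to the "Nagare" source and records every span when it stops.
    public static ActivityListener Start(List<Activity> sink)
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Nagare",
            // Without a positive sampling decision, StartActivity returns null.
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllDataAndRecorded,
            ActivityStopped = activity => sink.Add(activity),
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }

    // Stand-in for the framework, for demonstration only. Real spans are
    // emitted by Nagare itself; the values below are invented.
    public static void EmitDemoSpan()
    {
        using var source = new ActivitySource("Nagare");
        using var span = source.StartActivity("nagare.aggregate.ask", ActivityKind.Internal);
        span?.SetTag("nagare.aggregate.id", "book-1");
        span?.SetTag("nagare.command.type", "BorrowBook");
    }
}
```

Call Start before the code under test and dispose the returned listener afterwards. Each captured Activity exposes OperationName, Kind, and GetTagItem for assertions.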

Metrics

Counters and a histogram are exposed under the same Nagare meter:

Instrument	Type	Description
`nagare.outbox.dispatched`	counter	Rows successfully dispatched
`nagare.outbox.dead`	counter	Rows that exhausted retries and were marked `Dead`
`nagare.outbox.attempts`	counter	Dispatch attempts, tagged by outcome
`nagare.outbox.dispatch.duration_ms`	histogram	Per-row dispatch latency
`nagare.outbox.retention.deleted`	counter	Rows removed by the retention sweep

Group dashboards by the nagare.outbox.target tag to break the totals down per sink.
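The per-sink breakdown can also be reproduced in-process with a MeterListener, which is handy for tests or a quick diagnostic endpoint. This is a sketch under two assumptions that this page does not confirm: that nagare.outbox.dispatched is a Counter<long>, and that each measurement carries the nagare.outbox.target tag. OutboxMetricsProbe and EmitDemoMeasurements are hypothetical names, and EmitDemoMeasurements is a stand-in for the framework with invented values.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class OutboxMetricsProbe
{
    // Sums nagare.outbox.dispatched per nagare.outbox.target value.
    public static MeterListener Start(Dictionary<string, long> totalsByTarget)
    {
        var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter.Name == "Nagare" &&
                instrument.Name == "nagare.outbox.dispatched")
            {
                l.EnableMeasurementEvents(instrument);
            }
        };
        // Assumption: the counter's value type is long.
        listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
        {
            var target = "(untagged)";
            foreach (var tag in tags)
            {
                if (tag.Key == "nagare.outbox.target" && tag.Value is string s)
                    target = s;
            }
            lock (totalsByTarget)
            {
                totalsByTarget.TryGetValue(target, out var current);
                totalsByTarget[target] = current + value;
            }
        });
        listener.Start();
        return listener;
    }

    // Stand-in for the framework, for demonstration only.
    public static void EmitDemoMeasurements()
    {
        using var meter = new Meter("Nagare");
        var dispatched = meter.CreateCounter<long>("nagare.outbox.dispatched");
        dispatched.Add(2, new KeyValuePair<string, object?>("nagare.outbox.target", "email"));
        dispatched.Add(1, new KeyValuePair<string, object?>("nagare.outbox.target", "webhook"));
    }
}
```

In a real deployment the exporter does this aggregation for you; the listener is only useful when you want the numbers inside the process.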

Linking subscription spans to writes

Events written today might be processed by a subscription seconds later — or replayed by a projection rebuild years later. Naive parent-child propagation breaks the second case: the original write trace is long gone from the backend, leaving handler spans pointing at nothing.

By default, every handler span is a new root with an ActivityLink back to the writing span. This matches the OpenTelemetry messaging convention, stays deterministic across replays, and degrades gracefully when the writer trace has been sampled out or aged out of the backend. Backends like Tempo, Jaeger, Datadog, and Honeycomb render the link as a "follows from" edge — click through from the consumer span to find the original write.

The append site emits its span as ActivityKind.Producer, and the handler emits its span as ActivityKind.Consumer. The writing process must have an ActivitySource("Nagare") listener active when Append is called: the framework reads Activity.Current at that moment, formats a W3C traceparent, and stores it in event metadata. If no Activity is current, no traceparent is stored and the handler span is a fresh root with no link.
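The capture-then-link mechanism can be illustrated with the BCL alone. The sketch below is not Nagare's implementation. It assumes the default W3C id format (.NET 5 and later) and uses a hypothetical source name, Demo.Linking, with hypothetical span names. It captures the traceparent string while a producer span is current, then later starts a consumer span as a new root that carries an ActivityLink to the writer.

```csharp
using System;
using System.Diagnostics;

public static class TraceLinkSketch
{
    // Hypothetical source name, used only by this sketch.
    private static readonly ActivitySource Source = new ActivitySource("Demo.Linking");

    // Write side: start a producer span around the append.
    public static Activity? StartWriterSpan() =>
        Source.StartActivity("demo.append", ActivityKind.Producer);

    // Write side: the current span's W3C traceparent, or null if there is none.
    public static string? CaptureTraceParent() =>
        Activity.Current is { IdFormat: ActivityIdFormat.W3C } current ? current.Id : null;

    // Handler side: start a root consumer span linked back to the writer.
    public static Activity? StartHandlerSpan(string? traceParent)
    {
        ActivityLink[]? links = null;
        if (traceParent is not null &&
            ActivityContext.TryParse(traceParent, null, out var writer))
        {
            links = new[] { new ActivityLink(writer) };
        }

        // With a default parent context, StartActivity would otherwise adopt
        // Activity.Current as the parent, so clear it to force a new root.
        Activity.Current = null;
        return Source.StartActivity(
            "demo.handle",
            ActivityKind.Consumer,
            parentContext: default(ActivityContext),
            tags: null,
            links: links);
    }
}
```

Both Start methods return null unless a listener is registered for the source, which mirrors the requirement above that a listener be active at append time. When the stored traceparent is missing, the handler span is still created as a root, just without a link.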

Optional: parent-child for live-tail debugging

If you want the single-trace UX (request → command → write → handler all in one flame graph), opt in:

csharp

new SubscriptionOptions
{
    UseParentChildForLiveTail = true,
    LiveTailWindow = TimeSpan.FromSeconds(30),
}

With this on, events written within LiveTailWindow attach as a child of the writing span; events older than the window still fall back to the link. Mixed-age batches split themselves correctly.
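The per-event rule described above can be written as a small pure function. This is an illustrative sketch, not Nagare's code: HandlerSpanMode, LiveTailSketch, and Decide are hypothetical names, and treating an event exactly at the window boundary as "within" the window is an assumption.

```csharp
using System;

public enum HandlerSpanMode { LinkOnly, ParentChild }

public static class LiveTailSketch
{
    // Decides, per event, whether the handler span attaches to the writer
    // as a child or stays a root with a link.
    public static HandlerSpanMode Decide(
        bool useParentChildForLiveTail,
        TimeSpan liveTailWindow,
        DateTimeOffset eventWrittenAt,
        DateTimeOffset now)
    {
        if (!useParentChildForLiveTail)
            return HandlerSpanMode.LinkOnly;

        var age = now - eventWrittenAt;
        // Assumption: the boundary is inclusive.
        return age <= liveTailWindow
            ? HandlerSpanMode.ParentChild
            : HandlerSpanMode.LinkOnly;
    }
}
```

Because the decision is made per event rather than per batch, a batch containing both fresh and old events yields a mix of child spans and linked roots. The dependence on the current time is also what makes the topology non-deterministic across replays.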

Trade-offs to be aware of before turning it on:

Determinism is lost. The same event replayed at different wall-clock times produces different trace topology. Snapshot-style trace assertions become brittle.
Sampling fragility. If the writer trace was head-sampled out, the handler ends up parented to a TraceID that doesn't exist in the backend. Most UIs render that as a malformed root.
Long-running handlers. A handler that takes several minutes under parent-child keeps the writer trace "open"; some backends (Tempo) flush long-running traces mid-flight.

For most users, the link default is the right call. Turn parent-child on when you specifically want the demo-friendly view and your write traces are reliably retained.

Event metadata

Beyond tracing spans, you can attach metadata to individual events. This metadata is persisted in the event store and available everywhere the event is read.

csharp

var metadata = new EventMetadata(
    CorrelationId: requestId,
    CausationId: $"http:{Request.Path}",
    ActorId: currentUser.Id,
    Headers: new Dictionary<string, string> { ["tenant"] = currentUser.TenantId },
    Timestamp: DateTimeOffset.UtcNow);

await aggregate.Ask(new BorrowBook("user-42"), metadata);

In projections, the metadata is available on the envelope:

csharp

public async Task Handle(EventEnvelope<BookEvent> envelope)
{
    var actorId = envelope.Metadata?.ActorId;
    var correlationId = envelope.Metadata?.CorrelationId;
    var tenant = envelope.Metadata?.Headers?.GetValueOrDefault("tenant");
    // ...
}

What to put in metadata

Field	Purpose	Example
`CorrelationId`	Trace a chain of events back to the original request	HTTP request ID
`CausationId`	Identify what caused this event	The command or event that triggered it
`ActorId`	Stable id of whoever asked. Pairs with `CommandSource`.	`user-42` for HTTP, a job name for the scheduler, a migration tag for system tasks. Prefer internal ids over PII — events are immutable.
`CommandType` / `CommandSource`	Auto-attribution	Filled by `Aggregate.Ask` and `ProcessGrain`; you usually don't set these.
`Headers`	Free-form `IReadOnlyDictionary<string, string>` for ride-along context	Tenant id, branch id, feature-flag bucket. Surfaces in dashboards and projections; isn't interpreted by the framework.
`Timestamp`	Custom timestamp	Override the store's default timestamp
`TraceParent`	W3C trace context captured at append time	Auto-populated; do not set by hand
`TraceState`	Vendor-specific trace propagation companion to TraceParent	Auto-populated; do not set by hand

TraceParent and TraceState are written by the framework at append time when an Activity is in scope. They drive the linking decision described above. Custom IEventMetadata types are passed through untouched, so span linking only works when you use the built-in EventMetadata record.

A middleware is a good place to attach metadata automatically:

csharp

public class CorrelationMiddleware(IHttpContextAccessor http) : ICommandMiddleware
{
    public async Task<IReply> InvokeAsync(AskContext context, AskDelegate next)
    {
        // Both values are null outside an HTTP request (background jobs, for example).
        var requestId = http.HttpContext?.TraceIdentifier;
        var actorId = http.HttpContext?.User.FindFirst("sub")?.Value;

        // Note: this replaces any metadata already on the context rather than merging with it.
        var enriched = context with
        {
            Metadata = new EventMetadata(
                CorrelationId: requestId,
                ActorId: actorId)
        };

        return await next(enriched);
    }
}

Health checks

Nagare registers three health checks that report whether the system is ready to serve traffic.

Event store readiness

EventStoreReadyHealthCheck reports healthy once the event store's database table has been created and verified. It reports unhealthy during startup while the initialization service runs CREATE TABLE IF NOT EXISTS.

Subscription readiness

SubscriptionsReadyHealthCheck tracks each subscription individually. It reports healthy only when every registered subscription has completed its initial catch-up (replayed historical events up to the current position). During startup, it lists which subscriptions are still initializing.

This is useful for Kubernetes readiness probes. A service shouldn't receive traffic until its projections have caught up. Otherwise, queries against read models return stale or empty results.

Repository storage readiness

RepositoryStorageReadyHealthCheck reports healthy once all document store tables have been created. Like the event store check, it transitions from unhealthy to healthy during startup.

Using health checks

The health checks are registered automatically when you add event stores, subscriptions, or repository stores. Wire them into ASP.NET Core's health check endpoint:

csharp

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = _ => true
});

In Kubernetes, point your readiness probe at this endpoint:

yaml

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

The service stays out of the load balancer until the event store is initialized, all subscriptions have caught up, and all document store tables exist.
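If you also run a liveness probe, it is usually worth keeping it separate from readiness so that a long subscription catch-up does not cause the orchestrator to restart the pod. The fragment below is a sketch using the standard ASP.NET Core health check API. The /health/live path is an arbitrary choice, and the Nagare registrations that add the three checks are omitted because they depend on your setup.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// Nagare's checks are added by its own registrations (not shown here).
builder.Services.AddHealthChecks();

var app = builder.Build();

// Readiness: run every registered check, including Nagare's.
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = _ => true
});

// Liveness: run no checks, so the probe only confirms the process responds.
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false
});

app.Run();
```

With this split, the readiness endpoint reports unhealthy while the event store, subscriptions, or document tables are still initializing, and the liveness endpoint keeps returning healthy as long as the process can serve the request.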

Connecting the pieces

A production observability setup ties these together:

Tracing shows you what happened: which command was issued, what events it produced, how long each step took
Metadata shows you why: who issued the command, what request triggered it, what earlier event caused it
Health checks show you readiness: is the system caught up and safe to serve traffic

The tracing spans and metadata flow into your existing observability stack (Datadog, Jaeger, Grafana Tempo, Azure Monitor). The health checks integrate with your existing orchestrator. Nagare doesn't impose its own monitoring layer. It fits into whatever you already run.
