Observability

Nagare instruments itself with System.Diagnostics.Activity and System.Diagnostics.Metrics, the same primitives ASP.NET Core and EF Core use. Any OpenTelemetry-compatible pipeline can collect them once the source and meter are registered.

There is no Nagare.OpenTelemetry package, and there isn't going to be one. The ActivitySource and Meter are part of the core library, so one .AddSource("Nagare") call and one .AddMeter("Nagare") call are enough. Marten, Wolverine, and MassTransit follow the same pattern.

Tracing setup

Register Nagare's activity source and meter with your OpenTelemetry configuration:

```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Nagare")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Nagare")
        .AddOtlpExporter());
```

That captures every span and metric the framework emits. The source name is also exposed as NagareActivity.SourceName if you prefer not to hard-code the string.

Traced operations

Nagare creates spans for six operations:

| Span name | Kind | When |
| --- | --- | --- |
| nagare.aggregate.ask | internal | A command is sent to an aggregate |
| nagare.eventstore.append | producer | Events are written to the store |
| nagare.eventstore.read | internal | Events are read from the store |
| nagare.subscription.handle | consumer | A subscription processes an event |
| nagare.subscription.checkpoint | internal | A subscription saves its position |
| nagare.outbox.dispatch | consumer | The outbox runner delivers a side-effect |

A typical command flow produces three nested spans: aggregate.ask wraps eventstore.read (loading the aggregate) and eventstore.append (writing new events). Subscription processing produces subscription.handle with periodic subscription.checkpoint spans. Outbox delivery produces outbox.dispatch per row, linked back to the original eventstore.append via the stored traceparent.

Tags

Each span carries attributes that let you filter and group traces:

| Tag | Value |
| --- | --- |
| nagare.aggregate.id | The aggregate instance ID |
| nagare.aggregate.type | The aggregate's type name |
| nagare.event.type | The event stream's type name |
| nagare.event.count | Number of events in this operation |
| nagare.command.type | The command's type name |
| nagare.subscription.id | The subscription's identifier |
| nagare.position | Global event store position |
| nagare.version | Aggregate version number |
| nagare.outbox.dispatch_id | Stable id of the outbox row being delivered |
| nagare.outbox.target | Logical name of the destination (handler/sink) |
| nagare.outbox.attempt | Attempt counter; 1 on first try |
| nagare.outbox.outcome | dispatched, transient, or dead |
| nagare.outbox.source_process | Process aggregate that produced the dispatch, if any |
| nagare.replay | true when the handler was driven by a catch-up replay |

You can find every command handled by a given aggregate, trace from the HTTP request through the command to the events it produced, and follow those events through subscriptions and outbox deliveries.

Metrics

Counters and a histogram are exposed under the same Nagare meter:

| Instrument | Type | Description |
| --- | --- | --- |
| nagare.outbox.dispatched | counter | Rows successfully dispatched |
| nagare.outbox.dead | counter | Rows that exhausted retries and were marked Dead |
| nagare.outbox.attempts | counter | Dispatch attempts, tagged by outcome |
| nagare.outbox.dispatch.duration_ms | histogram | Per-row dispatch latency |
| nagare.outbox.retention.deleted | counter | Rows removed by the retention sweep |

Group your dashboards by the nagare.outbox.target tag to break the totals down per sink.
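To check the per-sink breakdown locally before wiring up a backend, you can attach a MeterListener and sum measurements by tag. The sketch below is self-contained and uses a stand-in meter named Demo.Nagare with a counter named like the one in the table. Point the name check at "Nagare" to observe the real instruments. It assumes the outbox counters carry nagare.outbox.target and nagare.outbox.outcome as metric tags, which you should verify against your version.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Stand-in meter and counter so the sketch runs on its own (assumption:
// the real meter is "Nagare" and tags its measurements the same way).
using var meter = new Meter("Demo.Nagare");
var attempts = meter.CreateCounter<long>("nagare.outbox.attempts");

// Running totals keyed by "instrument|target".
var totals = new Dictionary<string, long>();

using var meterListener = new MeterListener();
meterListener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to instruments from the meter we care about.
    if (instrument.Meter.Name == "Demo.Nagare")
        l.EnableMeasurementEvents(instrument);
};
meterListener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    var target = "(none)";
    foreach (var tag in tags)
    {
        if (tag.Key == "nagare.outbox.target")
            target = tag.Value?.ToString() ?? "(none)";
    }

    var key = $"{instrument.Name}|{target}";
    totals[key] = totals.GetValueOrDefault(key) + value;
});
meterListener.Start();

// Simulated dispatch attempts against two sinks.
attempts.Add(1,
    new KeyValuePair<string, object?>("nagare.outbox.target", "email"),
    new KeyValuePair<string, object?>("nagare.outbox.outcome", "transient"));
attempts.Add(1,
    new KeyValuePair<string, object?>("nagare.outbox.target", "email"),
    new KeyValuePair<string, object?>("nagare.outbox.outcome", "dispatched"));
attempts.Add(1,
    new KeyValuePair<string, object?>("nagare.outbox.target", "webhook"),
    new KeyValuePair<string, object?>("nagare.outbox.outcome", "dispatched"));

foreach (var (key, total) in totals)
    Console.WriteLine($"{key} = {total}");
```

Counter measurements are delivered to the callback synchronously, so the totals are complete as soon as the Add calls return.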

Linking subscription spans to writes

Events written today might be processed by a subscription seconds later — or replayed by a projection rebuild years later. Naive parent-child propagation breaks the second case: the original write trace is long gone from the backend, leaving handler spans pointing at nothing.

By default, every handler span is a new root with an ActivityLink back to the writing span. This matches the OpenTelemetry messaging convention, stays deterministic across replays, and degrades gracefully when the writer trace has been sampled out or aged out of the backend. Backends like Tempo, Jaeger, Datadog, and Honeycomb render the link as a "follows from" edge — click through from the consumer span to find the original write.

The append site emits its span as ActivityKind.Producer, and the handler emits its span as ActivityKind.Consumer. The writing process must have a listener registered for the "Nagare" ActivitySource when Append is called: the framework reads Activity.Current at that moment, formats a W3C traceparent, and stores it in event metadata. If no Activity is current, no traceparent is stored and the handler span is a fresh root with no link.
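The capture-and-link mechanism can be sketched with System.Diagnostics alone. This is an illustration of the technique, not Nagare's implementation: the source name Demo.Nagare and the helper StartHandlerSpan are hypothetical, and the sketch assumes .NET 5 or later, where ActivityContext.TryParse is available.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Make sure ids are formatted as W3C traceparent strings.
Activity.DefaultIdFormat = ActivityIdFormat.W3C;
Activity.ForceDefaultIdFormat = true;

// Stand-in source for the demo; Nagare's real source is named "Nagare".
using var source = new ActivitySource("Demo.Nagare");

// Without a listener, StartActivity returns null and nothing can be captured.
using var activityListener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Nagare",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(activityListener);

// Write side: capture the current span as a traceparent string, as if it
// were being stored in event metadata.
string? storedTraceParent;
using (var append = source.StartActivity("nagare.eventstore.append", ActivityKind.Producer))
{
    storedTraceParent = Activity.Current?.Id; // "00-<trace-id>-<span-id>-<flags>"
}

// Read side, possibly much later: new root span linked to the writer.
Activity? handle = StartHandlerSpan(source, storedTraceParent);

Console.WriteLine($"stored traceparent: {storedTraceParent}");
Console.WriteLine($"handler is a root: {handle?.ParentId is null}");
Console.WriteLine($"handler link count: {handle?.Links.Count()}");
handle?.Dispose();

// Hypothetical helper: starts a Consumer span with no parent and, when a
// traceparent was stored, an ActivityLink pointing at the writing span.
static Activity? StartHandlerSpan(ActivitySource src, string? traceParent)
{
    var links = ActivityContext.TryParse(traceParent, null, out var writer)
        ? new[] { new ActivityLink(writer) }
        : Array.Empty<ActivityLink>();

    // Clear the ambient activity so the new span cannot inherit a parent.
    Activity.Current = null;

    return src.StartActivity(
        "nagare.subscription.handle",
        ActivityKind.Consumer,
        default(ActivityContext),
        tags: null,
        links: links);
}
```

If the listener is removed, StartActivity returns null on the write side, storedTraceParent stays null, and the handler span would carry no link, which mirrors the fallback described above.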

Optional: parent-child for live-tail debugging

If you want the single-trace UX (request → command → write → handler all in one flame graph), opt in:

```csharp
new SubscriptionOptions
{
    UseParentChildForLiveTail = true,
    LiveTailWindow = TimeSpan.FromSeconds(30),
}
```

With this on, events written within LiveTailWindow attach as a child of the writing span; events older than the window still fall back to the link. Mixed-age batches split themselves correctly.
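The per-event decision can be pictured as a simple age check. The sketch below is an assumption about the rule's shape, not Nagare's code: ShouldParentToWriter is a hypothetical helper, and whether the boundary is inclusive is a guess.

```csharp
using System;

// Hypothetical helper: true when the handler span should attach as a child
// of the writing span, false when it should be a new root with a link.
static bool ShouldParentToWriter(
    DateTimeOffset writtenAt,
    DateTimeOffset now,
    TimeSpan liveTailWindow,
    bool useParentChildForLiveTail)
    => useParentChildForLiveTail && now - writtenAt <= liveTailWindow;

var now = DateTimeOffset.UtcNow;
var window = TimeSpan.FromSeconds(30);

// A mixed-age batch: one replayed event and two recent ones.
var batch = new[] { now.AddMinutes(-10), now.AddSeconds(-20), now.AddSeconds(-1) };

foreach (var writtenAt in batch)
{
    var mode = ShouldParentToWriter(writtenAt, now, window, true)
        ? "child of writer span"
        : "new root + link";
    Console.WriteLine($"{(now - writtenAt).TotalSeconds:F0}s old -> {mode}");
}
```

Because the check depends on the current time, the same event yields a different answer when it is replayed later, which is the source of the determinism trade-off.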

Trade-offs to be aware of before turning it on:

  • Determinism is lost. The same event replayed at different wall-clock times produces different trace topology. Snapshot-style trace assertions become brittle.
  • Sampling fragility. If the writer trace was head-sampled out, the handler ends up parented to a TraceID that doesn't exist in the backend. Most UIs render that as a malformed root.
  • Long-running handlers. A handler that takes several minutes under parent-child keeps the writer trace "open"; some backends (Tempo) flush long-running traces mid-flight.

For most users, the link default is the right call. Turn parent-child on when you specifically want the demo-friendly view and your write traces are reliably retained.

Event metadata

Beyond tracing spans, you can attach metadata to individual events. This metadata is persisted in the event store and available everywhere the event is read.

```csharp
var metadata = new EventMetadata(
    CorrelationId: requestId,
    CausationId: $"http:{Request.Path}",
    ActorId: currentUser.Id,
    Headers: new Dictionary<string, string> { ["tenant"] = currentUser.TenantId },
    Timestamp: DateTimeOffset.UtcNow);

await aggregate.Ask(new BorrowBook("user-42"), metadata);
```

In projections, the metadata is available on the envelope:

```csharp
public async Task Handle(EventEnvelope<BookEvent> envelope)
{
    var actorId = envelope.Metadata?.ActorId;
    var correlationId = envelope.Metadata?.CorrelationId;
    var tenant = envelope.Metadata?.Headers?.GetValueOrDefault("tenant");
    // ...
}
```

What to put in metadata

| Field | Purpose | Example |
| --- | --- | --- |
| CorrelationId | Trace a chain of events back to the original request | HTTP request ID |
| CausationId | Identify what caused this event | The command or event that triggered it |
| ActorId | Stable id of whoever asked. Pairs with CommandSource. | user-42 for HTTP, a job name for the scheduler, a migration tag for system tasks. Prefer internal ids over PII; events are immutable. |
| CommandType / CommandSource | Auto-attribution | Filled by Aggregate.Ask and ProcessGrain; you usually don't set these. |
| Headers | Free-form IReadOnlyDictionary<string, string> for ride-along context | Tenant id, branch id, feature-flag bucket. Surfaces in dashboards and projections; isn't interpreted by the framework. |
| Timestamp | Custom timestamp | Override the store's default timestamp |
| TraceParent | W3C trace context captured at append time | Auto-populated; do not set by hand |
| TraceState | Vendor-specific trace propagation companion to TraceParent | Auto-populated; do not set by hand |

TraceParent and TraceState are written by the framework at append time when an Activity is in scope, so you don't set them yourself. They drive the linking decision described above. Custom IEventMetadata types are passed through untouched, so span linking only works when you use the built-in EventMetadata record.

A middleware is a good place to attach metadata automatically:

```csharp
public class CorrelationMiddleware(IHttpContextAccessor http) : ICommandMiddleware
{
    public async Task<IReply> InvokeAsync(AskContext context, AskDelegate next)
    {
        var requestId = http.HttpContext?.TraceIdentifier;
        var actorId = http.HttpContext?.User.FindFirst("sub")?.Value;

        var enriched = context with
        {
            Metadata = new EventMetadata(
                CorrelationId: requestId,
                ActorId: actorId)
        };

        return await next(enriched);
    }
}
```

Register it once and every command carries correlation data.

Health checks

Nagare registers three health checks that report whether the system is ready to serve traffic.

Event store readiness

EventStoreReadyHealthCheck reports healthy once the event store's database table has been created and verified. It reports unhealthy during startup while the initialization service runs CREATE TABLE IF NOT EXISTS.

Subscription readiness

SubscriptionsReadyHealthCheck tracks each subscription individually. It reports healthy only when every registered subscription has completed its initial catch-up (replayed historical events up to the current position). During startup, it lists which subscriptions are still initializing.

This is useful for Kubernetes readiness probes. A service shouldn't receive traffic until its projections have caught up. Otherwise, queries against read models return stale or empty results.
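The readiness rule for subscriptions can be illustrated in a few lines. This is a sketch of the logic only: EvaluateReadiness and the subscription ids are hypothetical, and the real check's description text may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring the rule: healthy only when every
// subscription has finished its initial catch-up; otherwise name the
// subscriptions that are still initializing.
static (bool Healthy, string Description) EvaluateReadiness(
    IReadOnlyDictionary<string, bool> caughtUpBySubscription)
{
    var pending = caughtUpBySubscription
        .Where(s => !s.Value)
        .Select(s => s.Key)
        .OrderBy(id => id, StringComparer.Ordinal)
        .ToList();

    return pending.Count == 0
        ? (true, "All subscriptions have caught up")
        : (false, "Still initializing: " + string.Join(", ", pending));
}

// Example state during startup: one projection is still replaying.
var state = new Dictionary<string, bool>
{
    ["book-catalog"] = true,
    ["loan-history"] = false,
};

var result = EvaluateReadiness(state);
Console.WriteLine($"{result.Healthy}: {result.Description}");
```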

Repository storage readiness

RepositoryStorageReadyHealthCheck reports healthy once all document store tables have been created. Like the event store check, it transitions from unhealthy to healthy during startup.

Using health checks

The health checks are registered automatically when you add event stores, subscriptions, or repository stores. Wire them into ASP.NET Core's health check endpoint:

```csharp
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = _ => true
});
```

In Kubernetes, point your readiness probe at this endpoint:

```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

The service stays out of the load balancer until the event store is initialized, all subscriptions have caught up, and all document store tables exist.

Connecting the pieces

A production observability setup ties these together:

  1. Tracing shows you what happened: which command was issued, what events it produced, how long each step took
  2. Metadata shows you why: who issued the command, what request triggered it, what earlier event caused it
  3. Health checks show you readiness: is the system caught up and safe to serve traffic

The tracing spans and metadata flow into your existing observability stack (Datadog, Jaeger, Grafana Tempo, Azure Monitor). The health checks integrate with your existing orchestrator. Nagare doesn't impose its own monitoring layer. It fits into whatever you already run.
