Observability
Nagare instruments itself with System.Diagnostics.Activity and System.Diagnostics.Metrics, the same primitives ASP.NET Core and EF Core use. The OpenTelemetry .NET SDK, or any other ActivityListener or MeterListener, picks them up.
There is no Nagare.OpenTelemetry package, and there isn't going to be one. The ActivitySource and Meter are part of the core library so a single .AddSource("Nagare") and .AddMeter("Nagare") is enough — same pattern as Marten, Wolverine, and MassTransit.
Tracing setup
Register Nagare's activity source and meter with your OpenTelemetry configuration:
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Nagare")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Nagare")
        .AddOtlpExporter());
That captures every span and counter the framework emits. The source name is also exposed as NagareActivity.SourceName if you prefer not to hard-code the string.
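For a quick local check without an exporter, the same source name can be observed with a plain ActivityListener from the base class library. The following is a minimal sketch: SpanRecorder is a hypothetical helper, not part of Nagare, and it matches the source by the literal name described above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper (not part of Nagare): records "Kind:name" for every
// finished span from one ActivitySource so a test or console app can inspect them.
public sealed class SpanRecorder : IDisposable
{
    private readonly ActivityListener _listener;
    private readonly List<string> _names = new();

    public SpanRecorder(string sourceName)
    {
        _listener = new ActivityListener
        {
            // Only listen to the requested source, for example "Nagare".
            ShouldListenTo = source => source.Name == sourceName,
            // Record everything; a production setup would usually sample.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
            // Called synchronously when a span is stopped or disposed.
            ActivityStopped = activity =>
            {
                lock (_names) _names.Add($"{activity.Kind}:{activity.OperationName}");
            },
        };
        ActivitySource.AddActivityListener(_listener);
    }

    // Returns a copy of the names recorded so far.
    public IReadOnlyList<string> Snapshot()
    {
        lock (_names) return _names.ToArray();
    }

    public void Dispose() => _listener.Dispose();
}
```

Creating `new SpanRecorder("Nagare")` before issuing a command and reading `Snapshot()` afterwards should show which of the framework's spans were emitted, assuming the spans come from a source with that name.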
Traced operations
Nagare creates spans for six operations:
| Span name | Kind | When |
|---|---|---|
| nagare.aggregate.ask | internal | A command is sent to an aggregate |
| nagare.eventstore.append | producer | Events are written to the store |
| nagare.eventstore.read | internal | Events are read from the store |
| nagare.subscription.handle | consumer | A subscription processes an event |
| nagare.subscription.checkpoint | internal | A subscription saves its position |
| nagare.outbox.dispatch | consumer | The outbox runner delivers a side-effect |
A typical command flow produces three nested spans: aggregate.ask wraps eventstore.read (loading the aggregate) and eventstore.append (writing new events). Subscription processing produces subscription.handle with periodic subscription.checkpoint spans. Outbox delivery produces outbox.dispatch per row, linked back to the original eventstore.append via the stored traceparent.
Tags
Each span carries attributes that let you filter and group traces:
| Tag | Value |
|---|---|
| nagare.aggregate.id | The aggregate instance ID |
| nagare.aggregate.type | The aggregate's type name |
| nagare.event.type | The event stream's type name |
| nagare.event.count | Number of events in this operation |
| nagare.command.type | The command's type name |
| nagare.subscription.id | The subscription's identifier |
| nagare.position | Global event store position |
| nagare.version | Aggregate version number |
| nagare.outbox.dispatch_id | Stable id of the outbox row being delivered |
| nagare.outbox.target | Logical name of the destination (handler/sink) |
| nagare.outbox.attempt | Attempt counter (1 on first try) |
| nagare.outbox.outcome | dispatched, transient, or dead |
| nagare.outbox.source_process | Process aggregate that produced the dispatch, if any |
| nagare.replay | true when the handler was driven by a catch-up replay |
You can find every command handled by a given aggregate, trace from the HTTP request through the command to the events it produced, and follow those events through subscriptions and outbox deliveries.
Metrics
Counters and a histogram are exposed under the same Nagare meter:
| Instrument | Type | Description |
|---|---|---|
| nagare.outbox.dispatched | counter | Rows successfully dispatched |
| nagare.outbox.dead | counter | Rows that exhausted retries and were marked Dead |
| nagare.outbox.attempts | counter | Dispatch attempts, tagged by outcome |
| nagare.outbox.dispatch.duration_ms | histogram | Per-row dispatch latency |
| nagare.outbox.retention.deleted | counter | Rows removed by the retention sweep |
Group your dashboards by the nagare.outbox.target tag to break the totals down per sink.
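The per-sink breakdown can also be checked in-process with a MeterListener, for example in an integration test. This is a rough sketch under two assumptions that the text above does not confirm: that the counters report long values, and that each measurement carries the grouping tag. CounterTotals is a hypothetical helper, not a Nagare type.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper (not part of Nagare): sums long-valued measurements
// from one meter, keyed by instrument name and the value of one tag.
public sealed class CounterTotals : IDisposable
{
    private readonly MeterListener _listener = new();
    private readonly Dictionary<string, long> _totals = new();

    public CounterTotals(string meterName, string groupByTag)
    {
        // Subscribe to every instrument published by the named meter.
        _listener.InstrumentPublished = (instrument, listener) =>
        {
            if (instrument.Meter.Name == meterName)
                listener.EnableMeasurementEvents(instrument);
        };
        // Assumption: measurements are long; other numeric types would need
        // their own SetMeasurementEventCallback<T> registration.
        _listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
        {
            string group = "(none)";
            foreach (var tag in tags)
                if (tag.Key == groupByTag) group = tag.Value?.ToString() ?? "(none)";
            var key = $"{instrument.Name}[{group}]";
            lock (_totals) _totals[key] = _totals.GetValueOrDefault(key) + value;
        });
        _listener.Start();
    }

    // Total recorded for one instrument and one tag value (0 if none).
    public long Total(string instrumentName, string group)
    {
        lock (_totals) return _totals.GetValueOrDefault($"{instrumentName}[{group}]");
    }

    public void Dispose() => _listener.Dispose();
}
```

With `new CounterTotals("Nagare", "nagare.outbox.target")` active, `Total("nagare.outbox.dispatched", "email")` would return the number of rows dispatched to a sink named email, if the framework tags the counter that way.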
Linking subscription spans to writes
Events written today might be processed by a subscription seconds later — or replayed by a projection rebuild years later. Naive parent-child propagation breaks the second case: the original write trace is long gone from the backend, leaving handler spans pointing at nothing.
By default, every handler span is a new root with an ActivityLink back to the writing span. This matches the OpenTelemetry messaging convention, stays deterministic across replays, and degrades gracefully when the writer trace has been sampled out or aged out of the backend. Backends like Tempo, Jaeger, Datadog, and Honeycomb render the link as a "follows from" edge — click through from the consumer span to find the original write.
The append site emits its span as ActivityKind.Producer; the handler emits its span as ActivityKind.Consumer. The writing process must have a listener registered for the "Nagare" ActivitySource when Append is called: the framework reads Activity.Current at that moment, formats a W3C traceparent, and stores it in event metadata. If no Activity is current, no traceparent is stored and the handler span is a fresh root with no link.
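The capture-then-link flow can be approximated with base class library calls alone. The sketch below is not Nagare's implementation; LinkSketch and its method names are hypothetical, and only the span name and kinds come from the tables above. It captures a W3C traceparent on the writer side, then starts the handler span as a new root that carries an ActivityLink back to the writer.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch (not Nagare's code) of capturing a traceparent at
// append time and linking the handler span to it later.
public static class LinkSketch
{
    // Writer side: returns (null, null) when no W3C Activity is current,
    // which mirrors the "no traceparent is stored" case.
    public static (string? TraceParent, string? TraceState) Capture()
    {
        var current = Activity.Current;
        if (current is null || current.IdFormat != ActivityIdFormat.W3C)
            return (null, null);
        // For W3C activities, Id is "00-<trace-id>-<span-id>-<flags>".
        return (current.Id, current.TraceStateString);
    }

    // Reader side: new root span, linked to the writer if a traceparent exists.
    public static Activity? StartHandlerSpan(
        ActivitySource source, string? traceParent, string? traceState)
    {
        ActivityLink[]? links = null;
        if (traceParent is not null &&
            ActivityContext.TryParse(traceParent, traceState, out ActivityContext writer))
        {
            links = new[] { new ActivityLink(writer) };
        }

        // A default parent context falls back to Activity.Current, so clear
        // it first to guarantee a root span.
        var previous = Activity.Current;
        Activity.Current = null;
        var span = source.StartActivity(
            "nagare.subscription.handle",
            ActivityKind.Consumer,
            default(ActivityContext),
            tags: null,
            links: links);
        // No listener means no span was created; restore the ambient Activity.
        if (span is null) Activity.Current = previous;
        return span;
    }
}
```

Because the link is derived only from the stored string, replaying the same event later yields the same topology: a root span with one link, whether or not the writer trace still exists in the backend.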
Optional: parent-child for live-tail debugging
If you want the single-trace UX (request → command → write → handler all in one flame graph), opt in:
new SubscriptionOptions
{
    UseParentChildForLiveTail = true,
    LiveTailWindow = TimeSpan.FromSeconds(30),
}
With this on, events written within LiveTailWindow attach as a child of the writing span; events older than the window still fall back to the link. Mixed-age batches split themselves correctly.
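The window rule reduces to a per-event age check. The following sketch is an illustration of that rule, not Nagare's code: the LiveTailSketch and SpanRelation names are hypothetical, and treating the window boundary as inclusive is an assumption.

```csharp
using System;

// Hypothetical names; only the rule itself comes from the description above.
public enum SpanRelation { Link, ParentChild }

public static class LiveTailSketch
{
    // Decides how a handler span relates to the writing span for one event.
    public static SpanRelation Decide(
        bool useParentChildForLiveTail,
        TimeSpan liveTailWindow,
        DateTimeOffset eventWrittenAt,
        DateTimeOffset now)
    {
        // Default behaviour: always a new root with a link.
        if (!useParentChildForLiveTail) return SpanRelation.Link;

        // Assumption: an event exactly at the window edge still counts as live.
        var age = now - eventWrittenAt;
        return age <= liveTailWindow ? SpanRelation.ParentChild : SpanRelation.Link;
    }
}
```

The result depends on `now`, so the same event can produce a different topology when it is replayed later. Evaluating each event separately is one way a mixed-age batch could end up with both kinds of span.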
Trade-offs to be aware of before turning it on:
- Determinism is lost. The same event replayed at different wall-clock times produces different trace topology. Snapshot-style trace assertions become brittle.
- Sampling fragility. If the writer trace was head-sampled out, the handler ends up parented to a TraceID that doesn't exist in the backend. Most UIs render that as a malformed root.
- Long-running handlers. A handler that takes several minutes under parent-child keeps the writer trace "open"; some backends (Tempo) flush long-running traces mid-flight.
For most users, the link default is the right call. Turn parent-child on when you specifically want the demo-friendly view and your write traces are reliably retained.
Event metadata
Beyond tracing spans, you can attach metadata to individual events. This metadata is persisted in the event store and available everywhere the event is read.
var metadata = new EventMetadata(
    CorrelationId: requestId,
    CausationId: $"http:{Request.Path}",
    ActorId: currentUser.Id,
    Headers: new Dictionary<string, string> { ["tenant"] = currentUser.TenantId },
    Timestamp: DateTimeOffset.UtcNow);
await aggregate.Ask(new BorrowBook("user-42"), metadata);
In projections, the metadata is available on the envelope:
public async Task Handle(EventEnvelope<BookEvent> envelope)
{
    var actorId = envelope.Metadata?.ActorId;
    var correlationId = envelope.Metadata?.CorrelationId;
    var tenant = envelope.Metadata?.Headers?.GetValueOrDefault("tenant");
    // ...
}
What to put in metadata
| Field | Purpose | Example |
|---|---|---|
| CorrelationId | Trace a chain of events back to the original request | HTTP request ID |
| CausationId | Identify what caused this event | The command or event that triggered it |
| ActorId | Stable id of whoever asked. Pairs with CommandSource. | user-42 for HTTP, a job name for the scheduler, a migration tag for system tasks. Prefer internal ids over PII, because events are immutable. |
| CommandType / CommandSource | Auto-attribution | Filled by Aggregate.Ask and ProcessGrain; you usually don't set these. |
| Headers | Free-form IReadOnlyDictionary<string, string> for ride-along context | Tenant id, branch id, feature-flag bucket. Surfaces in dashboards and projections; isn't interpreted by the framework. |
| Timestamp | Custom timestamp | Override the store's default timestamp |
| TraceParent | W3C trace context captured at append time | Auto-populated; do not set by hand |
| TraceState | Vendor-specific trace propagation companion to TraceParent | Auto-populated; do not set by hand |
TraceParent and TraceState are written by the framework at append time when an Activity is in scope; you don't set them. They drive the linking decision described above. Custom IEventMetadata types are passed through untouched, so span linking only works when you use the built-in EventMetadata record.
A middleware is a good place to attach metadata automatically:
public class CorrelationMiddleware(IHttpContextAccessor http) : ICommandMiddleware
{
    public async Task<IReply> InvokeAsync(AskContext context, AskDelegate next)
    {
        var requestId = http.HttpContext?.TraceIdentifier;
        var actorId = http.HttpContext?.User.FindFirst("sub")?.Value;
        var enriched = context with
        {
            Metadata = new EventMetadata(
                CorrelationId: requestId,
                ActorId: actorId)
        };
        return await next(enriched);
    }
}
Register it once and every command carries correlation data.
Health checks
Nagare registers three health checks that report whether the system is ready to serve traffic.
Event store readiness
EventStoreReadyHealthCheck reports healthy once the event store's database table has been created and verified. It reports unhealthy during startup while the initialization service runs CREATE TABLE IF NOT EXISTS.
Subscription readiness
SubscriptionsReadyHealthCheck tracks each subscription individually. It reports healthy only when every registered subscription has completed its initial catch-up (replayed historical events up to the current position). During startup, it lists which subscriptions are still initializing.
This is useful for Kubernetes readiness probes. A service shouldn't receive traffic until its projections have caught up. Otherwise, queries against read models return stale or empty results.
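The all-subscriptions-caught-up rule can be pictured with a small tracker. This is only a sketch of the idea, not how SubscriptionsReadyHealthCheck is implemented; CatchUpTracker and its members are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: tracks initial catch-up per subscription and reports
// ready only when every registered subscription has finished.
public sealed class CatchUpTracker
{
    private readonly ConcurrentDictionary<string, bool> _caughtUp = new();

    // Called when a subscription is registered; it starts as "initializing".
    public void Register(string subscriptionId) =>
        _caughtUp.TryAdd(subscriptionId, false);

    // Called once the subscription has replayed up to the current position.
    public void MarkCaughtUp(string subscriptionId) =>
        _caughtUp[subscriptionId] = true;

    // True when nothing is still initializing (including when nothing is registered).
    public bool IsReady => _caughtUp.Values.All(done => done);

    // Ids that would be listed in an unhealthy report, in stable order.
    public IReadOnlyList<string> StillInitializing() =>
        _caughtUp.Where(pair => !pair.Value)
                 .Select(pair => pair.Key)
                 .OrderBy(id => id, StringComparer.Ordinal)
                 .ToList();
}
```

A readiness check built on this shape would report unhealthy with the `StillInitializing()` list as its description until `IsReady` turns true.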
Repository storage readiness
RepositoryStorageReadyHealthCheck reports healthy once all document store tables have been created. Like the event store check, it transitions from unhealthy to healthy during startup.
Using health checks
The health checks are registered automatically when you add event stores, subscriptions, or repository stores. Wire them into ASP.NET Core's health check endpoint:
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = _ => true
});
In Kubernetes, point your readiness probe at this endpoint:
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
The service stays out of the load balancer until the event store is initialized, all subscriptions have caught up, and all document store tables exist.
Connecting the pieces
A production observability setup ties these together:
- Tracing shows you what happened: which command was issued, what events it produced, how long each step took
- Metadata shows you why: who issued the command, what request triggered it, what earlier event caused it
- Health checks show you readiness: is the system caught up and safe to serve traffic
The tracing spans and metadata flow into your existing observability stack (Datadog, Jaeger, Grafana Tempo, Azure Monitor). The health checks integrate with your existing orchestrator. Nagare doesn't impose its own monitoring layer. It fits into whatever you already run.