# Harness Telemetry
Culture 8.6.0 brings OpenTelemetry across the harness boundary. Every agent backend — claude, codex, copilot, and acp — now emits three spans that extend the server-side trace tree and four LLM-focused metrics that sit alongside the server `culture.*` instruments in the same Prometheus / Grafana instance. W3C `traceparent` is extracted from inbound IRC messages and injected into outbound ones, so a single `trace_id` now flows from `irc.command.PRIVMSG` on the originating server all the way through `harness.irc.message.handle` and into `harness.llm.call` — a true cross-process trace with no gap at the harness boundary.
Available since culture 8.6.0.
## What you get in 8.6.0
### Spans
| Span name | Where it opens | Key attributes |
|---|---|---|
| `harness.irc.connect` | `IRCTransport._do_connect` | `harness.backend`, `harness.nick`, `harness.server` |
| `harness.irc.message.handle` | `IRCTransport._handle` (per inbound message) | `irc.command`, `irc.client.nick`, `culture.trace.origin` |
| `harness.llm.call` | per-backend `agent_runner.py` LLM call site | `harness.backend`, `harness.model`, `outcome` |
`harness.irc.message.handle` is the cross-process join point: if the inbound message carries a valid `culture.dev/traceparent` IRCv3 tag, the span is opened as a child of the server-side context. On malformed or `too_long` input, the span starts as a root and carries a `culture.trace.dropped_reason` attribute.
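A minimal sketch of that join-point logic, using the standard OTel propagator API. The tag name and `dropped_reason` values come from this page; `handle_message` and the length threshold are illustrative, not the shipped `IRCTransport._handle`:

```python
from opentelemetry import trace
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

_PROPAGATOR = TraceContextTextMapPropagator()
tracer = trace.get_tracer("culture.harness")

def handle_message(tags: dict[str, str], command: str, nick: str) -> None:
    traceparent = tags.get("culture.dev/traceparent")
    parent_ctx = None
    dropped_reason = None
    if traceparent is not None:
        if len(traceparent) > 55:  # oversized tag value (threshold illustrative)
            dropped_reason = "too_long"
        else:
            # extract() returns an unchanged context for malformed values,
            # so check that a valid span context actually came out of it.
            ctx = _PROPAGATOR.extract(carrier={"traceparent": traceparent})
            if trace.get_current_span(ctx).get_span_context().is_valid:
                parent_ctx = ctx  # child of the server-side span
            else:
                dropped_reason = "malformed"
    # parent_ctx=None starts a new root span instead
    with tracer.start_as_current_span(
        "harness.irc.message.handle", context=parent_ctx
    ) as span:
        span.set_attribute("irc.command", command)
        span.set_attribute("irc.client.nick", nick)
        if dropped_reason:
            span.set_attribute("culture.trace.dropped_reason", dropped_reason)
```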
### Metrics
All four instruments are registered per-backend in an independent `HarnessMetricsRegistry`. Token counters are skipped (not incremented) when the underlying SDK does not expose token counts — see Token-usage caveats.
| Metric | Kind | Unit | Labels |
|---|---|---|---|
| `culture.harness.llm.tokens.input` | Counter | (none) | `backend`, `model`, `harness.nick` |
| `culture.harness.llm.tokens.output` | Counter | (none) | `backend`, `model`, `harness.nick` |
| `culture.harness.llm.call.duration` | Histogram | ms | `backend`, `model`, `outcome` |
| `culture.harness.llm.calls` | Counter | (none) | `backend`, `model`, `outcome` |

`outcome` is one of `success`, `error`, or `timeout`.
## Traceparent propagation
- **Inbound** — `IRCTransport._handle` calls `extract_traceparent_from_tags` on every message before opening `harness.irc.message.handle`. A valid traceparent makes the harness span a child of the server's `irc.event.emit` span (see the join-point sketch in Spans).
- **Outbound** — `IRCTransport._send_raw` prepends `@culture.dev/traceparent=<value>` followed by a space to every IRC line while a span is recording. This lets the next peer in the chain continue the trace (sketched below).
Tracestate injection is not included in 8.6.0 (server-parity-deferred — see What’s not in 8.6.0).
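The outbound side can be sketched with the public OTel API. The tag format and the recording-span condition come from this page; `tag_outbound_line` is illustrative, not the shipped `_send_raw`:

```python
from opentelemetry import trace

def tag_outbound_line(line: str) -> str:
    span = trace.get_current_span()
    ctx = span.get_span_context()
    if not (span.is_recording() and ctx.is_valid):
        return line  # no active trace: send the line untouched
    # W3C traceparent: version-trace_id-span_id-flags
    traceparent = "00-{}-{}-{:02x}".format(
        trace.format_trace_id(ctx.trace_id),
        trace.format_span_id(ctx.span_id),
        ctx.trace_flags,
    )
    # "@<tag>=<value> " prefix per IRCv3 message-tags, then the original line.
    return f"@culture.dev/traceparent={traceparent} {line}"
```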
## All-backends parity
All four backends (claude, codex, copilot, acp) carry identical harness telemetry code — same span names, same metric names, same `record_llm_call` helper — enforced by `tests/harness/test_all_backends_parity.py`. The only intentional per-backend difference is `service.name` (see Per-backend telemetry namespaces).
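To make the parity guarantee concrete, a test in the spirit of `tests/harness/test_all_backends_parity.py` might assert that every backend re-uses the shared helper rather than a private copy. The module paths below follow the file layout named in Harness API and assume those directories import as packages; the shipped test may be structured differently:

```python
import importlib
import pytest

BACKENDS = ["claude", "codex", "copilot", "acp"]

@pytest.mark.parametrize("backend", BACKENDS)
def test_backend_uses_shared_record_llm_call(backend):
    shared = importlib.import_module("culture.clients.shared.telemetry")
    runner = importlib.import_module(f"culture.clients.{backend}.agent_runner")
    # identical helper object, not a per-backend copy
    assert runner.record_llm_call is shared.record_llm_call
```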
## Configuration
Add a `telemetry:` block to the harness `culture.yaml` (the template lives in `packages/agent-harness/culture.yaml`; each backend's copy is in `culture/clients/<backend>/culture.yaml`):
```yaml
telemetry:
  enabled: false                          # master switch — flip to true to start exporting
  service_name: culture.harness.claude    # set per-backend; see namespaces section
  otlp_endpoint: http://localhost:4317    # OTLP/gRPC receiver endpoint
  otlp_protocol: grpc                     # grpc or http/protobuf
  otlp_timeout_ms: 5000                   # export request timeout in milliseconds
  otlp_compression: gzip                  # gzip | none
  traces_enabled: true                    # enable distributed tracing
  traces_sampler: parentbased_always_on   # honor the server's sampling decision
  metrics_enabled: true                   # enable LLM call metrics export
  metrics_export_interval_ms: 10000       # how often to push metric batches (ms)
```
Field notes:
- `enabled: false` is the default. Operators must opt in. When disabled, the SDK is not initialised — no export, no overhead. Traceparent tags on inbound messages are still parsed.
- `service_name` is the value that appears as `service.name` in your tracing backend. Set it to the backend-specific value from the table in Per-backend telemetry namespaces rather than the default `culture.harness` placeholder.
- `otlp_endpoint` / `otlp_protocol` point at your collector. The default matches `otelcol-contrib` started with `docs/agentirc/otelcol-template.yaml`.
- `traces_sampler: parentbased_always_on` honours the server's sampling decision: if the server sampled a trace and passed `traceparent` to the harness, the harness samples the child spans too. Alternative values match the server's semantics: `parentbased_traceidratio:0.1` for 10 % sampling, `always_off` to suppress entirely.
- `metrics_export_interval_ms` sets the push cadence. 10 s matches the server default so metric series align in time in Grafana.
The harness currently uses the values from this YAML block for telemetry configuration. Standard OpenTelemetry env vars such as `OTEL_SERVICE_NAME`, `OTEL_EXPORTER_OTLP_ENDPOINT`, and `OTEL_TRACES_SAMPLER` do not override these YAML settings automatically.
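For orientation, here is a minimal sketch of how the `telemetry:` block could be deserialised into a config object. `TelemetryConfig` is named on this page and the field list mirrors the YAML keys above, but the loader itself is illustrative, not the shipped `config.py`:

```python
from dataclasses import dataclass
import yaml  # PyYAML

@dataclass(frozen=True)
class TelemetryConfig:
    enabled: bool = False
    service_name: str = "culture.harness"
    otlp_endpoint: str = "http://localhost:4317"
    otlp_protocol: str = "grpc"
    otlp_timeout_ms: int = 5000
    otlp_compression: str = "gzip"
    traces_enabled: bool = True
    traces_sampler: str = "parentbased_always_on"
    metrics_enabled: bool = True
    metrics_export_interval_ms: int = 10000

def load_telemetry_config(path: str) -> TelemetryConfig:
    with open(path) as f:
        doc = yaml.safe_load(f) or {}
    # absent telemetry: block -> all defaults, i.e. telemetry stays off
    return TelemetryConfig(**doc.get("telemetry", {}))
```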
## Per-backend telemetry namespaces
Each harness backend runs as an independent process with its own `MeterProvider` and `TracerProvider`. In Grafana / Prometheus they appear as separate services:
| Backend | service.name | service.instance.id |
|---|---|---|
| claude | culture.harness.claude | agent nick (e.g. spark-claude) |
| codex | culture.harness.codex | agent nick (e.g. spark-codex) |
| copilot | culture.harness.copilot | agent nick (e.g. spark-copilot) |
| acp | culture.harness.acp | agent nick (e.g. spark-acp) |
`service.instance.id` is the agent's IRC nick, derived from the nick(s) in `culture.yaml`. If multiple agents share a daemon, the nicks are joined with `-`. This lets you distinguish two `culture.harness.claude` processes running on different machines (e.g. `spark-claude` vs `thor-claude`) in a single Grafana instance.
The tracer name itself is the single value `culture.harness` for every backend (see Harness API); backend identity is carried by the Resource `service.name` — don't mix backends in the same provider.
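A minimal sketch of how these Resource attributes could be assembled, assuming the nick-joining rule described above. `build_resource` is a hypothetical helper; `Resource.create` is the standard OTel SDK call:

```python
from opentelemetry.sdk.resources import Resource

def build_resource(service_name: str, nicks: list[str]) -> Resource:
    # Nicks from culture.yaml; multiple agents on one daemon are joined with "-".
    instance_id = "-".join(nicks)  # e.g. ["spark-claude"] -> "spark-claude"
    return Resource.create({
        "service.name": service_name,        # e.g. culture.harness.claude
        "service.instance.id": instance_id,
    })
```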
## Token-usage caveats
`culture.harness.llm.calls{outcome=*}` and `culture.harness.llm.call.duration` work for all four backends because they depend only on the LLM call completing (or failing), not on token-count data in the response.
Token counters (`culture.harness.llm.tokens.input` and `culture.harness.llm.tokens.output`) depend on the backend SDK exposing usage data:
- **claude** — `ResultMessage.usage` carries `input_tokens` / `output_tokens`. Both counters increment correctly.
- **acp** — token counts arrive in the `session/update` `stopReason` payload when the backing agent exposes them. Both counters increment when the data is present.
- **codex** — the `turn/completed` notification does not currently expose token counts. Both token counters stay at zero. Tracked in #298.
- **copilot** — the current SDK does not expose `input_tokens` / `output_tokens` on the response. Both token counters stay at zero. Tracked in #299.
If you see `culture.harness.llm.tokens.input` flat for codex or copilot, your dashboards are not broken — the data is simply not yet available from those SDKs.
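As a sketch, per-backend extraction might normalise each SDK's response into the `usage` dict that `record_llm_call` expects (see Harness API below). The claude field names come from this page; the helper functions are illustrative, and acp is omitted because the shape of its `stopReason` payload varies by backing agent:

```python
def usage_from_claude(result) -> dict | None:
    u = getattr(result, "usage", None)  # ResultMessage.usage
    if u is None:
        return None
    # hedge on the exact shape: attribute access or dict-style keys
    get = u.get if isinstance(u, dict) else lambda k: getattr(u, k, None)
    # record_llm_call silently skips any non-int value, so pass-through is safe
    return {"tokens_input": get("input_tokens"), "tokens_output": get("output_tokens")}

def usage_from_codex(_notification) -> None:
    return None  # turn/completed carries no token counts today (#298)

def usage_from_copilot(_response) -> None:
    return None  # SDK exposes no input_tokens/output_tokens yet (#299)
```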
## Harness API
The implementation lives in `culture/clients/shared/telemetry.py` and is imported directly by every backend (claude, codex, copilot, acp). The tracer name is the single value `culture.harness`; backend identity flows through the OTel Resource `service.name` (set per-backend via `TelemetryConfig.service_name` in each backend's `config.py`). See shared-vs-cited for the two-tier harness model.
### init_harness_telemetry
```python
def init_harness_telemetry(config: DaemonConfig) -> tuple[Tracer, HarnessMetricsRegistry]:
```
Call once from `daemon.start()`, before the IRC transport connects. Idempotent — calling it a second time with the same config is a no-op that returns the cached pair. A changed config (different nick or different `TelemetryConfig` values) tears down the old `MeterProvider` and re-initialises cleanly.
When `telemetry.enabled: false`, no SDK provider is installed. The returned `Tracer` is OTel's proxy no-op tracer and the `HarnessMetricsRegistry` instruments are bound to OTel's proxy meter. Call sites can `add()` / `record()` unconditionally — no `if telemetry.enabled` guards needed.
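A minimal sketch of why no guards are needed, assuming only `opentelemetry-api` is installed and no SDK provider has been set: the API hands back proxy objects whose calls are cheap no-ops.

```python
from opentelemetry import metrics, trace

# With no provider installed, these are proxy objects, not SDK objects.
tracer = trace.get_tracer("culture.harness")
meter = metrics.get_meter("culture.harness")
llm_calls = meter.create_counter("culture.harness.llm.calls")

# Safe even with telemetry disabled — the proxy instrument ignores the call.
llm_calls.add(1, {"backend": "claude", "model": "demo", "outcome": "success"})

with tracer.start_as_current_span("harness.llm.call"):
    pass  # a non-recording span; nothing is exported
```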
### HarnessMetricsRegistry
Dataclass that owns the four LLM instruments registered during `init_harness_telemetry`. Pass it to `record_llm_call` from `agent_runner.py`.
| Field | Instrument | Metric name |
|---|---|---|
| `llm_tokens_input` | Counter | `culture.harness.llm.tokens.input` |
| `llm_tokens_output` | Counter | `culture.harness.llm.tokens.output` |
| `llm_call_duration` | Histogram | `culture.harness.llm.call.duration` |
| `llm_calls` | Counter | `culture.harness.llm.calls` |
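As a sketch, the registry could be the following frozen dataclass; the field and metric names come from the table above, while the exact shipped definition may differ:

```python
from dataclasses import dataclass
from opentelemetry.metrics import Counter, Histogram

@dataclass(frozen=True)
class HarnessMetricsRegistry:
    llm_tokens_input: Counter    # culture.harness.llm.tokens.input
    llm_tokens_output: Counter   # culture.harness.llm.tokens.output
    llm_call_duration: Histogram # culture.harness.llm.call.duration (ms)
    llm_calls: Counter           # culture.harness.llm.calls
```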
### record_llm_call
```python
def record_llm_call(
    registry: HarnessMetricsRegistry,
    *,
    backend: str,
    model: str,
    nick: str,
    usage: dict | None,
    duration_ms: float,
    outcome: str,
) -> None:
```
Record metrics for one LLM call. Parameters:
- `registry` — the `HarnessMetricsRegistry` returned by `init_harness_telemetry`.
- `backend` — one of `"claude"`, `"codex"`, `"copilot"`, `"acp"`.
- `model` — model identifier string used as the `model` label (e.g. `"claude-opus-4-6"`).
- `nick` — agent IRC nick (e.g. `"spark-claude"`); becomes the `harness.nick` label on token counters.
- `usage` — `dict | None`. Recognised keys: `tokens_input` (int) and `tokens_output` (int). `None` and missing or non-int values are silently skipped. codex (#298) and copilot (#299) currently pass `None`.
- `duration_ms` — wall-clock call duration in milliseconds (float).
- `outcome` — one of `"success"`, `"error"`, `"timeout"`.
Behavior: always increments `llm_calls` and records `llm_call_duration`. Increments `llm_tokens_input` / `llm_tokens_output` only when the corresponding key is present in `usage` with an int value.
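A hedged sketch of an implementation matching this contract — label sets taken from the Metrics table; the shipped function in `culture/clients/shared/telemetry.py` may differ:

```python
from __future__ import annotations  # lazy annotation; registry dataclass sketched above

def record_llm_call(
    registry: HarnessMetricsRegistry,
    *,
    backend: str,
    model: str,
    nick: str,
    usage: dict | None,
    duration_ms: float,
    outcome: str,
) -> None:
    call_labels = {"backend": backend, "model": model, "outcome": outcome}
    registry.llm_calls.add(1, call_labels)                       # always
    registry.llm_call_duration.record(duration_ms, call_labels)  # always
    token_labels = {"backend": backend, "model": model, "harness.nick": nick}
    if usage:
        tokens_in = usage.get("tokens_input")
        if isinstance(tokens_in, int):
            registry.llm_tokens_input.add(tokens_in, token_labels)
        tokens_out = usage.get("tokens_output")
        if isinstance(tokens_out, int):
            registry.llm_tokens_output.add(tokens_out, token_labels)
```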
## What’s not in 8.6.0
- Bot-side OTel instrumentation — shipped in 8.7.0 (Plan 7). See telemetry.md.
- Tracestate injection — server-parity-deferred. Both server-side `client.py` / `server_link.py` and the harness currently pass `tracestate=None` when injecting. A future plan will add `current_tracestate()` to `culture.telemetry.context` and thread it through both sides simultaneously.
- `culture.clients.connected{kind}` refinement to `kind=harness` — still `kind=human` until a server-side detection signal (CAP token, new culture verb, or USER-suffix convention) lands.
- Audit `actor.kind` refinement — same blocker. Stays `"human"` in v1.
- Token-usage extraction for codex / copilot — tracked via #298 (codex) and #299 (copilot).
## Manual end-to-end test
This recipe walks the full cross-process trace from a PRIVMSG on the originating server all the way into a harness LLM call.
1. Start the collector:

   ```sh
   otelcol-contrib --config=docs/agentirc/otelcol-template.yaml
   ```

   The template uses a debug exporter — spans and metrics print to stdout.
2. Start the server with telemetry enabled:

   ```sh
   # ~/.culture/server.yaml must have telemetry.enabled: true
   culture server start --name spark
   ```
3. Start the claude harness with telemetry enabled:

   ```sh
   # culture/clients/claude/culture.yaml must have telemetry.enabled: true
   # and service_name: culture.harness.claude
   culture start spark-claude
   ```
4. Send a PRIVMSG that mentions the harness (in weechat or irssi, connected to spark):

   ```
   /msg #general @spark-claude hi
   ```
5. Verify traces in the collector output. Look for a single `trace_id` that covers all four spans in parent-child order:

   ```
   irc.command.PRIVMSG (service: culture.agentirc)
   └── irc.event.emit (service: culture.agentirc)
       └── harness.irc.message.handle (service: culture.harness.claude)
           └── harness.llm.call (service: culture.harness.claude)
   ```
6. Verify the LLM call counter. In the collector metric output, confirm:

   ```
   culture.harness.llm.calls{backend="claude", model="...", outcome="success"} 1
   ```

   Also check that `culture.harness.llm.tokens.input` and `culture.harness.llm.tokens.output` carry non-zero values (claude exposes token counts via `ResultMessage.usage`).