Coverage baseline
This doc tracks culture’s coverage floor and the per-domain growth path through Phase 0a (cultureagent extraction).
Baseline (locked 2026-05-09)
- Project-wide: 56.86% (7556/13289 lines)
- Integration-only baseline (
tests/test_integration_layer5.py): 15.95% - Harness-module-only: 56% of 5304 lines
The baseline was measured by uv run pytest -n auto --cov=culture --cov-report=xml:coverage.xml against commit 4595fc7. The full per-behavior audit (mapping each harness unit test file to its production-code coverage delta) lands in the follow-up audit-doc PR; this baseline doc is intentionally self-contained so the rationale for fail_under = 56 is on main immediately.
Locks
- pytest CI gate:
[tool.coverage.report] fail_under = 56inpyproject.toml. Drops below 56% fail the PR locally / in CI before SonarCloud reports back. - SonarCloud Quality Gate:
sonar.qualitygate.wait=trueinsonar-project.propertiesblocks CI on the gate decision. Gate threshold managed in SonarCloud’s project settings (out-of-tree). - CI scanner:
.github/workflows/tests.ymlrunsSonarSource/sonarqube-scan-actionafter pytest, uploadingcoverage.xmlto the SonarCloud project (agentculture_culture). Fork PRs withoutSONAR_TOKENskip the scan cleanly.
Per-domain growth path (Phase 0a)
| Domain | Baseline | Phase 0a (projected) | Phase 0a (measured) | Gating done by |
|---|---|---|---|---|
culture/clients/shared/ | ~58% | ~95% | 80% | Tasks 2, 3, 4, 5, 6 |
culture/clients/claude/ | ~85% | ~90% | 72% | Tasks 6, 8 |
culture/clients/codex/ | ~25% | ~75% | ~43% | (Task 8 narrowed to claude) |
culture/clients/copilot/ | ~25% | ~75% | ~43% | (Task 8 narrowed to claude) |
culture/clients/acp/ | ~25% | ~70% | ~43% | (Task 8 narrowed to claude) |
| Project-wide | 56.86% | ~73% | 56.96% | (see below) |
The project-wide number barely moved because the harness unit tests in tests/harness/test_*.py still cover the same code paths the new integration tests exercise — once Phase 1 deletes those unit tests, the integration tests become the sole coverage source and the project-wide number reflects what’s actually testable through the daemon → IRC → harness chain. Phase 0a’s real delivery is the +22pp on culture/clients/shared/ (58% → 80%): the shared-by-import tier that all four backends use is now covered through real agentirc.IRCd instances, not just through harness unit tests.
The non-claude backends (codex, copilot, acp) didn’t move because Task 7 (supervisor) was dropped during the audit revision and Task 8 (agent_runner timeout) narrowed to claude-only — each non-claude backend has a distinct SDK injection point that warrants its own integration-test PR. Cross-backend integration coverage is a follow-up.
Why “Coverage on New Code” alone isn’t enough here
SonarCloud’s default gate is “Coverage on New Code ≥ 80%” — protects new lines from being added without tests. That stays on as a regression preventer, but it doesn’t lock today’s overall floor: a delete that removes tested code can raise overall percentage while leaving real coverage gaps. The pytest fail_under = 56 is the tighter floor.
Closeout — Phase 0a complete (2026-05-09)
Phase 0a’s six integration-test PRs all merged:
- #364 — Task 2 — attention behaviors
- #365 — Task 3 — message buffer overflow
- #366 — Task 4 — IRC transport traceparent propagation
- #367 — Task 5 — webhook HTTP fanout
- #368 — Task 6 — harness telemetry (connect span + transition counter)
- #369 — Task 8 — claude agent_runner timeout
fail_under stays at 56 (matching floor(56.96)); see the comment in pyproject.toml. SonarCloud gate choice: Path B (in-tree only — no Sonar UI changes; rely on the in-tree pytest floor for project-wide and on Sonar’s default “Coverage on New Code ≥ 80%” for regression prevention).
Known follow-ups (each will be its own small PR, not part of Phase 0a):
- Task 8.5 — skill_client integration test.
culture/clients/claude/skill/irc_client.pyshows 54% under integration-only tests; needs a smalltests/test_integration_skill_client.pybefore Phase 1 deletestests/test_skill_client.py, otherwise the Phase 1 delete drops the floor. - Cross-backend integration timeout tests for
codex,copilot,acp(Task 8 narrowing). - Product fix —
AgentDaemon.stop()should cancel/await_background_tasks(PR #369 review #2 follow-up).
Coverage ratchet to 90% — phase log (started 2026-05-13)
Plan: /home/spark/.claude/plans/we-re-at-60-coverage-refactored-lark.md — phased PR plan to raise the floor from 56 to 90.
Phase 1 — Coverage plumbing + console removal (2026-05-13)
Measured: 60.99% (6260 statements, 2442 missing) → fail_under = 60.
Changes:
- Parallel-coverage merging fixed.
[tool.coverage.run]now setsparallel = true,concurrency = ["thread", "multiprocessing"],sigterm = true. The test runner (.claude/skills/run-tests/scripts/test.sh) now runscoverage combinebetween pytest-xdist and the final report. Before this, xdist worker.coverage.*files were merged unsafely and the reported number drifted from reality. culture/console/deleted. The Textual TUI is superseded by sibling repoirc-lens;culture consolealready passthrough-launches it viaculture/cli/console.py(kept). Removed:culture/console/package, eighttests/test_console_*.pyfiles,textual>=1.0frompyproject.toml.exclude_linesextended withraise NotImplementedError,...(literal ellipsis), andif sys.platform == "win32":to drop legitimately-untestable lines from the denominator.
Notable findings (deferred to later phases):
culture/transport/client.py(569 stmts, 25% covered) is largely dead post-extraction.agentirc.ircd.IRCduses agentirc’s ownClientclass (.venv/lib/python3.12/site-packages/agentirc/client.py), not culture’s. Only four telemetry tests still importculture.transport.client.Clientdirectly. Phase 5 should reconsider: delete the unused code rather than backfill tests for it.culture/protocol/commands.py(35 stmts, 0% covered) — addressed in Phase 2 as planned.
Phase 2 — cli/shared/process.py + protocol/commands.py (2026-05-13)
Measured: 62.91% (6260 statements, 2322 missing) → fail_under = 62. Both target files moved from low/zero coverage to 100%:
culture/cli/shared/process.py— 12% → 100% (97 statements, 0 missing).tests/test_cli_shared_process.py(24 tests) covers all four functions (stop_agent,_try_ipc_shutdown,_try_pid_shutdown,server_stop_by_name) including every branch: IPC success, IPC failure, IPC raises, missing socket, corrupt/stale/non-culture PID, SIGTERM success, SIGTERMProcessLookupError, SIGKILL escalation, and the “aborts kill when PID no longer culture” guard. All OS primitives monkeypatched — no realos.kill/os.fork/ sockets.culture/protocol/commands.py— 0% → 100% (35 statements, 0 missing).tests/test_protocol_commands.py(3 tests) discovery-style: iteratesvars(commands)for uppercase string constants and assertsvalue == name. Adding a new verb does not require a test edit; deletion is caught by the RFC-baseline smoke test.
Phase 3a — cli/shared/mesh.py + cli/bot.py + cli/channel.py (2026-05-13)
Measured: 67.89% (6260 statements, 2010 missing) → fail_under = 67. All three target files moved to near-100% coverage:
culture/cli/shared/mesh.py— 25% → 100% (48 statements, 0 missing).tests/test_cli_shared_mesh.py(14 tests) covers all four functions (parse_link,resolve_links_from_mesh,generate_mesh_from_agents,build_server_start_cmd) including malformed-spec rejection, password-with-colons preservation, peer-without-credential skip with warning, and the missing-DEFAULT_CONFIG fallback path.culture/cli/bot.py— 33% → 99% (199 statements, 2 missing).tests/test_cli_bot.py(35 tests) covers all seven_bot_*handlers + the_should_include_bot/_load_and_filter_botshelpers. Filesystem isolated via tmp_path-backedBOTS_DIR;load_config_or_defaultpatched. The 2 missing lines are in a small_bot_listempty-state branch.culture/cli/channel.py— 44% → 99.6% (260 statements, 1 missing). 21 new tests appended totests/test_channel_cli.pycovering every_cmd_*handler (list/read/message/who/join/part/ask/topic/compact/clear),_interpret_escapes,_is_connection_error, the dispatch wrapper for OSError, and the_topic_readdaemon-unreachable exit path. IPC is stubbed at the_try_ipc/_require_ipcboundary (the_cmd_*handlers callasyncio.runinternally, so a test-owned event loop would dead-lock the real Unix-socket-mock pattern).
Phase 3b — cli/mesh.py + cli/server.py (2026-05-13)
Measured: 73.40% (6260 statements, 1665 missing) → fail_under = 73. Two more CLI handlers covered:
culture/cli/mesh.py— 53% → 99.3% (282 statements, 2 missing).tests/test_cli_mesh.py(54 tests) covers every dispatched handler (_cmd_overview,_cmd_setup,_cmd_update,_cmd_console) plus every reachable helper:_collect_mesh_dataexit paths,_find_upgrade_tool,_upgrade_timeout_hint,_run_upgrade(timeout + non-zero-exit),_upgrade_culture_package(skip / dry-run / no-tool / exec-reach),_wait_for_server_port(accept / refuse / non-culture PID),_restart_single_service(service path + subprocess fallback + timeout swallow),_restart_mesh_services,_resolve_mesh_for_server,_restart_running_servers,_restart_from_config._install_mesh_servicesis excluded (systemd; Linux + root only);os.execvpre-exec is reached by the tests but its body is excluded as an end-of-life branch.culture/cli/server.py— 21% → 76% (396 statements, 96 missing).tests/test_cli_server.py(54 tests) covers_resolve_server_name,dispatch(including theagentirc.cli.dispatchpassthrough for forwarded verbs),_cmd_default,_server_rename(every branch),_check_server_archived,_check_already_running,_resolve_server_links,_wait_for_graceful_stop,_force_kill(POSIX SIGKILL + win32 SIGTERM +ProcessLookupErrorswallow),_server_stop,_server_status,_validate_config_name,_update_single_bot_archive,_set_bots_archive_state,_server_archive,_server_unarchive, plus the_server_startdispatcher. The 96 missing lines are concentrated in three deliberately-skipped integration-territory functions:_run_foreground,_daemonize_server,_run_server— exercised bytests/test_integration_irc_transport.pyagainst a real IRCd, not unit tests.
Phase 3c — cli/agent.py (2026-05-13)
Measured: 77.30% (6260 statements, 1421 missing) → fail_under = 77. The last CLI module is now covered:
culture/cli/agent.py— 47% → 88.8% (579 statements, 65 missing).tests/test_cli_agent.py(104 tests, heavily parametrized) coversdispatch, the four backend_create_*_configfactories (parametrized across claude/codex/copilot/acp),_parse_acp_command(all branches incl. JSON parse, fallback split, rejection),_check_existing_agent(clean / archived / active duplicate),_to_manifest_agent,_save_agent_to_directory,_cmd_create(per-backend + collision),_cmd_join, every resolution helper (_get_active_agents,_resolve_by_nick,_resolve_auto,_resolve_agents_to_start,_resolve_agents_to_stop),_cmd_startdispatcher routing, the backend daemon factories (_create_codex_daemon,_create_copilot_daemon,_create_claude_daemon),_coerce_to_acp_agent,_make_backend_config,_cmd_stop,_cmd_status(all branches),_cmd_rename/_cmd_assign(every branch), the IPC dispatcher chain (_resolve_ipc_targets,_argparse_error,_send_ipc,_ipc_to_agents), the IPC verb wrappers (_cmd_sleep,_cmd_wake),_cmd_learn(nick / no-nick / cwd-match / no-match),_cmd_message(empty target / empty text / send),_cmd_read,_cmd_archive/_cmd_unarchive/_cmd_delete,_cmd_unregister. The 65 missing lines are concentrated in 5 deliberately-skipped integration-territory functions (_probe_server_connection,_start_foreground,_start_background,_run_single_agent,_run_multi_agents) — exercised bytests/test_integration_agent_runner.pyagainst a real IRCd.
Phase 4a — persistence.py + config.py + telemetry/audit.py (2026-05-13)
Measured: 79.68% (6260 statements, 1272 missing) → fail_under = 79. The pure config / OS-services / audit layer is now covered:
culture/persistence.py— 54% → 97.95% (146 statements, 3 missing).tests/test_persistence.pyextended with 26 new tests covering_run_cmd(success / timeout),install_service(macOS plist + Windows .bat + unsupported-platform raise),uninstall_service(every platform),list_services(macOS / Windows / unsupported-empty), andrestart_service(every platform + missing-unit + Windows-query-timeout). The 3 missing lines are trivial path-builders (_systemd_user_dir/_launchd_dir/_windows_service_dir) that don’t exercise platform-specific branches under test.culture/config.py— 75% → 90.11% (445 statements, 44 missing).tests/test_manifest_config.pyextended with 9 new tests coveringarchive_manifest_server(happy path + skip-already-archived + missing-culture-yaml + extra-unrelated-agent-in-dir),unarchive_manifest_server(round-trip + empty + missing-yaml), and_load_legacy_config(full body + empty-agents fallback). The 44 remaining missing lines are scattered tiny error paths in YAML I/O helpers and rename validators.culture/telemetry/audit.py— 82% → 91.89% (185 statements, 15 missing). Newtests/telemetry/test_audit_rotation.py(25 tests) covers_write_all(full / short-write retry / zero-bytes-raise),_target_for(every branch incl. non-dict data),build_audit_record(federated origin, extra_tags, actor-kind/remote_addr propagation),AuditSink.submit(before start + disabled),start()idempotency,shutdown()no-op paths + fd-close OSError swallow,_pick_rotation_path(suffix increment + stat-error fallback),_maybe_rotateopen-failure-keeps-old-fd, andinit_audit/reset_for_testswarning paths.
Note: this PR landed 0.3pp short of the original 80% target. The gap is absorbed by the existing Phase 7 sweep.
Phase 4b — observer.py + overview/collector.py + bots/* (2026-05-13)
Measured: 83.93% (6260 statements, 1006 missing) → fail_under = 83. The four IRCd-integrated modules picked up substantial coverage; the project floor jumps 4pp in a single PR and hits the original Phase 4b target (83) exactly, while soaking up most of the Phase 4a 0.3pp shortfall as a side effect:
culture/observer.py— 28% → 92% (183 statements, 14 missing). Newtests/test_observer.py(28 tests) covers the pure parsers (_parse_history_line/_parse_who_line/_parse_list_linetable-driven), the registration/query state machines (_process_registration_line,_process_query_line,_drain_query_bufferexercised with a_RecordingWriter+ canned bytes), and the full public API (list_channels/who/read_channel/send_message) round-tripped against the realserverfixture. The CRLF-in-target injection test pins the strip behavior — the smuggledJOINcollapses into the channel name rather than starting a second protocol line. The remaining 14 missing lines are the_disconnectwriter-close swallow paths and a recv-empty-line fall-through that fires only on socket close.culture/overview/collector.py— 71% → 98% (255 statements, 6 missing). 23 new tests appended totests/test_overview_collector.pycovering_inject_stopped_agents(manifest archived skip + already-present pass-through + non-list channels fallback),_handle_registration_line(PING / 001 / 433-retry / unknown),_recv_until(stop-command termination + PING absorption + blank-line skip + timeout),_query_roommeta(every key parsed:room_id/owner/purpose/tagsCSV /persistenttruthy + ERR_UNKNOWNCOMMAND short-circuit),_query_tags(parsed list + ERR_NOSUCHNICK empty),_collect_bots(missing dir / missing yaml / malformed yaml swallow), and_enrich_via_ipcagainst a real tmp_path Unix socket — happy status,circuit_open/pausedstatus flips, stranger-socket skip, foreign-server skip, malformed-not-a-socket connection-refused swallow, andok: Falseresponse no-mutation. End-to-endcollect_mesh_stateinjects stopped manifest agents.culture/bots/bot.py— 62% → 98% (155 statements, 3 missing). 21 new tests appended totests/test_bot.pycovering_DynamicEventType.__str__,_check_rate(burst + release-after-window with pinnedtime.monotonic),_render_data_values(templated strings + non-string passthrough + sandbox-rejected dunder fall-through),start()idempotency,handle()empty-message short-circuit,_resolve_channels(dynamic-from-event + empty fallback + non-dict ctx),_deliverre-join branch when the channel was dropped, the full_maybe_fire_eventmatrix (happy enum-typed emit / dynamic-typed emit / no-spec early-return / invalid type regex skip / rate-limited skip / emit-raises log-and-swallow), and_run_custom_handleragainst a realimportlib.utilload (handler-on-disk happy / nohandle()fallback / handler raises fallback / handler returnsNone→ empty). The 3 missing lines are inside therelative_to-fails security guard that can’t be reached without forging an out-of-tree path.culture/bots/bot_manager.py— 65% → 99% (175 statements, 1 missing). 24 new tests appended totests/test_bot_manager.pycoveringstart()listener wiring + crash-safe teardown whenload_botsraises +OSErrorlistener-bind swallow,stop()exception-tolerant listener teardown,load_bots(missing dir / dir-without-yaml / archived skip / invalid filter / malformed yaml exception swallow),register_botfilter compile-error raise,_try_start_bot(mid-start re-entrancy guard + start-fails log-and-return-false),_matches_event(non-event triggers / no compiled filter / evaluate raises),_dispatch_to_bot(try-start-fails short-circuit + handle-raises span error),on_eventevent matching,start_botload-from-disk fallback + unknown-name raise,stop_allper-bot exception swallow, andload_system_bots(collision skip + register raises + server-config forwarding). The one remaining missing line is a single guard branch inside_dispatch_to_bot’s span context.
Phase 5 — Delete culture/transport/ (2026-05-13)
Measured: 89.13% (5669 statements, 616 missing) → fail_under = 89. Pivoted from backfill to deletion after auditing the file: culture.transport.client.Client was never instantiated in production (agentirc.ircd.IRCd._handle_connection imports agentirc.client.Client, not culture’s copy), and the four tests/telemetry/test_*.py files that imported it were all isolation-style unit tests against the dead code — none of them exercised the production IRCd connection path.
Pre-deletion audit confirmed agentirc.client.Client (in the installed agentirc-cli package) carries the same instrumentation hooks the deleted culture tests covered: _submit_parse_error_audit (line 182 upstream), the irc.parse_error span event (163), and traceparent injection on outbound send (95–116). The instrumentation is alive in production, just on the upstream side.
culture/transport/remote_client.py was also deleted — RemoteClient is now agentirc.remote_client.RemoteClient and culture’s copy was unused. The whole culture/transport/ package is removed in this PR.
Deletions:
culture/transport/client.py(569 statements, dead in production)culture/transport/remote_client.py(18 statements, superseded byagentirc.remote_client)culture/transport/__init__.py(4 statements)tests/telemetry/test_parse_error.py(1 test)tests/telemetry/test_audit_parse_error.py(4 tests)tests/telemetry/test_dispatch_span.py(3 tests)tests/telemetry/test_outbound_inject.py(4 tests)
Net effect: project drops by 591 statements (6260 → 5669) and 12 tests (1292 → 1280); coverage jumps +5.2pp because the deleted slice was at 25% (424 missing / 569 stmts on client.py alone). The Phase 5 → 6 step is now a single +5.2pp move instead of two +3pp moves; Phase 6 picks up only the final +0.87pp to reach 90.
Phase 6 — Final sweep to 90 (2026-05-13)
Measured: 90.28% (5669 statements, 551 missing) → fail_under = 90. The 60→90 ratchet is complete. 64 new tests across the long-tail candidates lifted the project floor the last 1.15pp.
Per-file:
culture/credentials.py— 69% → 100% (32 statements, 0 missing). 16 new tests intests/test_credentials.pyparametrize the FileNotFoundError tool-name lookup across all 4 platform branches (linux/darwin/win32/freebsd) and coverstore_credential/lookup_credential/delete_credentialon each platform (security on darwin; powershell + PS-script interpolation on win32; secret-tool with stdin-piped password on linux).culture/pidfile.py— 65% → 98% (104 statements, 2 missing). 17 new tests intests/test_pidfile.pycover port files (write_port/read_port/remove_port + invalid-int swallow), read_pid invalid-int branch, is_process_alive PermissionError, default_server file (read/write + missing/empty/OSError), rename_pid (both files / pid-only / missing / OSError swallow), and list_servers (empty dir / missing dir / running culture servers / dead PIDs / non-culture / port fallback / unreadable PID). The 2 missing lines are insideis_process_alive’s ProcessLookupError/PermissionError catch which can’t be safely exercised without forging an OS state.culture/cli/shared/ipc.py— 74% → 98% (42 statements, 1 missing). Newtests/test_cli_shared_ipc.py(10 tests) coversipc_requestagainst a realasyncio.start_unix_serverin tmp_path (happy response + missing-socket + server-drops swallow),ipc_shutdown(ok / not-ok / missing-socket),get_observerconfig-driven factory (with / without / emptyCULTURE_NICK), andagent_socket_pathruntime-dir routing. The 1 missing line is the deadline-exhausted return-None path on line 40 (hard to force deterministically without monkeypatching the event loop clock).culture/cli/shared/formatting.py— 0% → 100% (2 statements). Trivial re-export test intests/test_cli_afi_devex.pyconfirmsculture.cli.shared.formatting.relative_time is culture.formatting.relative_time.culture/cli/afi.py— 73% → 93% (15 statements, 1 missing). 5 tests intests/test_cli_afi_devex.pycoverdispatch(forwards argv + handles None args),_entry(raises SystemExit(2) when afi-cli not installed),register(adds REMAINDER-typed subparser), and the NAME/register/dispatch protocol attributes. The 1 missing line is the# pragma: no coverbranch already excluded by Phase 1’s exclude_lines.culture/cli/devex.py— 73% → 100% (15 statements). Same shape as afi.py — 5 tests cover all paths.
Phase target table — 60 → 90 ratchet COMPLETE
| Phase | Floor | Measured | PR | Status |
|---|---|---|---|---|
| 1 | 60 | 60.99% | #383 | ✅ merged |
| 2 | 62 | 62.91% | #384 | ✅ merged |
| 3a | 67 | 67.89% | #385 | ✅ merged |
| 3b | 73 | 73.40% | #386 | ✅ merged |
| 3c | 77 | 77.30% | #387 | ✅ merged |
| 4a | 79 | 79.68% | #388 | ✅ merged |
| 4b | 83 | 83.93% | #389 | ✅ merged |
| 5 | 89 | 89.13% | #390 | ✅ merged |
| 6 | 90 | 90.28% | (this PR) | in flight |
The plan started 2026-05-13 with the project at 60.01% and fail_under = 56; today’s PR closes it at 90.28% with fail_under = 90. From here, the SonarCloud “Coverage on New Code ≥ 80%” gate continues to prevent regressions on new code, and the pytest floor at 90 prevents wholesale backsliding.