System Cognition

From observability to autonomous action — guided by conventions.

The gap

Observability was built for humans. Dashboards, alerts, log search — all designed for a person to look at a screen, interpret patterns, and make decisions. Agents cannot squint at Grafana. They cannot intuit from a spike on a chart that a deployment is failing. They need structured data, explicit contracts, and declared behavior.

The shift

The progression: observability (humans watch systems) → situational awareness (agents read structured system state) → system cognition (agents understand what they're looking at well enough to act) → autonomous action (agents take safe, bounded action based on what they understand).

Each step requires more structure, not more intelligence.

The practical test

Is this output sufficient for another system to safely take action? If yes, the tool participates in system cognition. If no — if a human must interpret, translate, or decide — the tool is observability, not cognition.

What cognition means

An agent with system cognition can answer: what is running (inventory), is it healthy (doctor), who owns it (provenance), what depends on what (handoffs), what can I control (commands), how do I route work (handoff contracts), what am I not allowed to touch (scope boundaries).

ANCC as enabler

ANCC provides the structural layer: SKILL.md declares capabilities, --format json provides machine-readable output, exit codes signal outcomes, doctor reports runtime readiness. These are not features — they are the minimum contract for an agent to understand a tool.
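A minimal sketch of what that contract buys an agent: the exit code decides whether the payload is trusted, and the payload is parsed rather than interpreted. The `doctor` invocation and its payload fields here are illustrative assumptions, not a real tool's schema.

```python
import json

def read_tool_output(exit_code: int, stdout: str) -> dict:
    """Interpret one CLI invocation under the minimum contract:
    the exit code signals the outcome, stdout carries machine-readable JSON."""
    if exit_code != 0:
        # Non-zero exit means the tool itself reports failure; do not trust stdout.
        return {"ok": False, "exit_code": exit_code}
    return {"ok": True, "exit_code": 0, "payload": json.loads(stdout)}

# Hypothetical `doctor --format json` run reporting runtime readiness.
result = read_tool_output(0, '{"status": "ready", "checks_passed": 12}')
```

No screen, no squinting: the agent branches on two machine-checkable facts.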

Self-evolving tooling

The loop: agent uses tool → discovers gap (missing signal, wrong output, stale template) → creates work order describing the gap → another agent (or human) resolves the WO → tool improves → agent uses improved tool. This is first-order self-improvement: the system fixes its own infrastructure. When the gap cannot be covered by any existing tool, the loop extends into tool genesis — creation from zero.

Work order format

Work orders (WOs) are the second contract in the system alongside SKILL.md. Every self-improvement loop — first-order, meta-cognition, governance — produces a WO. The minimum fields:

```json
{
  "wo_version": "1",
  "id": "wo-<unique-id>",
  "kind": "incident | tooling_improvement | binding_directive",
  "target": {
    "tool": "tool-name",
    "component": "commands | templates | config",
    "scope": "/path/or/domain"
  },
  "observations": [
    {
      "type": "descriptive_type",
      "severity": "low | medium | high | critical",
      "detail": "what was observed",
      "data": {}
    }
  ],
  "improvement_class": "A | B | C",
  "constraints": {
    "allow_paths": [],
    "max_steps": 5
  }
}
```

The format is intentionally minimal. WOs carry enough structure for agents to route, prioritize, and scope work — not enough to become a workflow engine.
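A sketch of checking those minimum fields before routing a WO. The field names and enumerations follow the format above; the function itself and its problem strings are illustrative, not part of the contract.

```python
REQUIRED_FIELDS = {"wo_version", "id", "kind", "target", "observations",
                   "improvement_class", "constraints"}
VALID_KINDS = {"incident", "tooling_improvement", "binding_directive"}
VALID_CLASSES = {"A", "B", "C"}

def validate_wo(wo: dict) -> list[str]:
    """Return a list of problems; an empty list means the WO is routable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - wo.keys())]
    if wo.get("kind") not in VALID_KINDS:
        problems.append(f"unknown kind: {wo.get('kind')}")
    if wo.get("improvement_class") not in VALID_CLASSES:
        problems.append(f"unknown improvement_class: {wo.get('improvement_class')}")
    if not wo.get("observations"):
        problems.append("no observations")
    return problems
```

Deliberately a flat check, not a schema engine: enough structure to route, prioritize, and scope, nothing more.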

Guided evolution

Self-improvement without constraints is cancer. The conventions exist to prevent uncontrolled growth: scope boundaries define what a tool must never become, the extend-vs-new rubric prevents feature absorption, handoff contracts enforce composition over expansion, deprecation conventions enable subtraction. These conventions also reduce the cost of agent decisions — every structural constraint is a decision that was already made, for free. See Resource Governance.

Meta-cognition

The system can observe its own observation quality. A quality inspector analyzes WO output: are fields consistently missing? Are the same gaps recurring across WOs? When a pattern appears in 3+ cases, the inspector creates a tooling_improvement WO — not a one-off fix, but a systemic improvement to the observation layer.
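The 3+ threshold can be sketched as a simple frequency check over gap labels extracted from WO streams. The gap labels and the extraction step are assumptions; only the threshold comes from the text.

```python
from collections import Counter

def recurring_gaps(observed_gaps: list[str], threshold: int = 3) -> list[str]:
    """Flag gap labels seen in `threshold` or more WOs. A flagged gap should
    become a tooling_improvement WO, not another one-off fix."""
    counts = Counter(observed_gaps)
    return [gap for gap, n in counts.items() if n >= threshold]
```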

Improvement classes

Not all improvements are equal. Type A (template changes) — adding detection logic, expanding coverage. Low risk. Type B (core logic) — changing how the tool reasons. High risk. Type C (contract changes) — modifying output schema, changing field semantics. Critical risk — downstream consumers break. Governance escalates proportionally.
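One way to make "governance escalates proportionally" concrete is a static escalation table keyed by class. The A/B/C classes and risk levels come from the text; the specific approval steps are assumptions for illustration.

```python
# Illustrative escalation table; approval steps are assumed, not specified.
ESCALATION = {
    "A": {"risk": "low", "requires": ["automated tests"]},
    "B": {"risk": "high", "requires": ["automated tests", "human review"]},
    "C": {"risk": "critical",
          "requires": ["automated tests", "human review", "consumer sign-off"]},
}

def approvals_for(improvement_class: str) -> list[str]:
    """Look up the approval chain a given improvement class must clear."""
    return ESCALATION[improvement_class]["requires"]
```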

Three loops

First-order: fix infrastructure. Deployscope detects degraded deployment → WO → agent writes PR → human approves → deployment fixed.

Second-order: improve observers. Inspector detects that nullbot misses replication lag → improvement WO → template updated → nullbot now detects replication lag.


Third-order: governance enforces ecosystem fitness. Governance agent runs ancc validate → detects scope pressure → creates binding directive → agent must resolve (implement, reject with reason, or defer with plan) → human arbitrates.

Binding directives

"Must resolve" not "must implement." A binding directive gives the assigned agent three options: implement the change, reject with structured reasoning and evidence (escalates to human), or defer with a plan and new deadline (requires human approval). The system is safe because agents can always push back — but they cannot ignore.
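The three options, and the fact that each escape hatch has a cost, can be sketched as a router. The routing labels and field names (`evidence`, `plan`, `new_deadline`) are illustrative assumptions.

```python
def route_resolution(resolution: dict) -> str:
    """Route a binding-directive resolution: the agent may implement, reject,
    or defer, but silence is not a valid state."""
    action = resolution.get("action")
    if action == "implement":
        return "proceed"
    if action == "reject":
        # Pushback is allowed, but only with structured reasoning and evidence.
        if not resolution.get("evidence"):
            raise ValueError("reject requires structured reasoning and evidence")
        return "escalate_to_human"
    if action == "defer":
        # Deferral needs a plan and a new deadline, and a human must approve it.
        if not (resolution.get("plan") and resolution.get("new_deadline")):
            raise ValueError("defer requires a plan and a new deadline")
        return "await_human_approval"
    raise ValueError(f"not an allowed option: {action}")
```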

The governance triangle

Governance agent (prosecutor): enforces rules, detects violations, creates mandates. Never invents features or architecture. Evidence-based only.

Execution agents (workers): resolve directives. CAN push back with structured evidence. Must respond within deadline.

Humans (judges): approve code changes, override directives, final arbitration. The system must be safe assuming humans rubber-stamp 80% of approvals.

Trust hierarchy

Governance (read-only mandates) > Execution (write under mandate) > Runtime enforcement (gates all execution). The governance agent cannot write code. Execution agents cannot create mandates. Runtime enforcement (chainwatch) gates everything. Separation of powers — not because agents are malicious, but because concentrated authority makes mistakes easy.
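The separation of powers reduces to a capability table that runtime enforcement can check before any action. The role and action names below are illustrative, not chainwatch's actual vocabulary.

```python
# Illustrative capability table: each role holds only its own powers.
CAPABILITIES = {
    "governance": {"read", "create_mandate"},          # prosecutor: mandates only
    "execution": {"read", "write_under_mandate"},      # worker: writes, no mandates
    "human": {"read", "approve", "override", "arbitrate"},  # judge
}

def allowed(role: str, action: str) -> bool:
    """Gate an action the way a runtime enforcer would: deny by default."""
    return action in CAPABILITIES.get(role, set())
```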

Positioning

This is not AIOps (ML anomaly detection on metrics). Not agent observability (tracing LLM calls). Not a platform (no runtime, no dependency). It is the structural layer that makes deterministic CLI tools composable by autonomous agents — and keeps them composable as they evolve.

Design principles