Glasspane — agent state supervision
What this is
Glasspane is Colibri's agent observation layer. It watches agent subprocesses
via their JSONL stdout, folds the stream into a semantic state machine
(Idle → Working → Done), and exposes a snapshot API for dashboards and
daemon coordination. Every spawned agent — Pi, zot, or a local sample — feeds
through the same ingestor and ends up in the same taxonomy.
Decisions
Agent state as a state machine, not raw event log
Glasspane doesn't just relay raw agent events. It ingests JSONL lines and
transitions a named pane through a finite set of states:
Idle → Working → Blocked → Done
↳ Error
The AgentState enum (Idle, Working, Blocked, Done, Error) is deliberately
small. It captures what a supervisor needs to know — "is the agent working?
blocked? finished?" — without encoding agent-specific semantics. Events that
don't change the state (e.g. a usage report from zot) are recorded in the pane's
metadata but don't affect the state machine.
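A minimal sketch of the fold described above. It assumes a simplified Pane that holds only its current state and a string metadata map. PaneEvent, Pane::new, and apply are hypothetical names, not the crate's actual API. Only transition events move the state; everything else is recorded in metadata.

```rust
use std::collections::HashMap;

/// The five semantic states from the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

/// Hypothetical harness-neutral event, produced after a raw JSONL line
/// has been classified. Only `Transition` moves the state machine.
pub enum PaneEvent {
    Transition(AgentState),
    /// Non-state information (e.g. a zot usage report): stored, state unchanged.
    Info { key: String, value: String },
}

/// Hypothetical, simplified pane: current state plus free-form metadata.
pub struct Pane {
    pub state: AgentState,
    pub metadata: HashMap<String, String>,
}

impl Pane {
    pub fn new() -> Self {
        Pane { state: AgentState::Idle, metadata: HashMap::new() }
    }

    /// Fold one event into the pane.
    pub fn apply(&mut self, event: PaneEvent) {
        match event {
            PaneEvent::Transition(next) => self.state = next,
            PaneEvent::Info { key, value } => {
                self.metadata.insert(key, value);
            }
        }
    }
}
```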
Stalled is not a sixth variant but a derived flag: a pane is stalled
when no event has arrived within DEFAULT_STALL_AFTER (4 hours). Derived
attention (Error / Blocked / Stalled) is covered by
operator-attention.
Why not just tail the log: raw event logs are agent-specific and change over
time (zot adds new event types). The state machine is a stable contract that the
daemon, TUI, and client CLI can all rely on.
→ crates/colibri-glasspane/src/lib.rs
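The stalled flag and the attention predicate can be sketched as pure functions of the last event time and the current state. DEFAULT_STALL_AFTER comes from the text. is_stalled and needs_attention are hypothetical helpers. The shipped predicate lives in operator-attention, so its exact rules may differ from this sketch (for example, on whether a Done pane can count as stalled).

```rust
use std::time::{Duration, Instant};

/// Stall threshold from the text: 4 hours.
pub const DEFAULT_STALL_AFTER: Duration = Duration::from_secs(4 * 60 * 60);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

/// Hypothetical helper: stalled is derived, never stored.
/// True when more than `stall_after` has passed since the last event.
pub fn is_stalled(last_event: Instant, now: Instant, stall_after: Duration) -> bool {
    now.saturating_duration_since(last_event) > stall_after
}

/// Hypothetical attention predicate: Error, Blocked, or Stalled.
pub fn needs_attention(state: AgentState, stalled: bool) -> bool {
    stalled || matches!(state, AgentState::Error | AgentState::Blocked)
}
```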
JSONL streaming (one line = one event)
Agents emit structured events as newline-delimited JSON on stdout. Glasspane
reads line-by-line with BufReader, deserializes each line, and feeds it into
the PiJsonlIngestor (the name is legacy — it handles zot events too).
The reader runs in a single background task per pane (pane_reader_loop).
It never blocks the daemon's main loop: the ingestor is a synchronous fold
that updates the pane's in-memory state, and the snapshot API reads from an
Arc<RwLock<…>>. The reader task holds the write lock only while folding one
event, so snapshot readers see little contention from the reader hot path.
Malformed lines are skipped with a counter increment, not an error —
dropouts in an agent's JSONL shouldn't crash the observer.
Why JSONL, not a socket or gRPC: the agent is a subprocess, not a service.
stdout is the universal interface: every language, every harness, zero setup.
JSONL is trivial to write from bash, Go, Python, Rust. A socket or gRPC transport
would add a dependency and a handshake to every agent.
→ crates/colibri-glasspane/src/lib.rs
(PiJsonlIngestor, pane_reader_loop)
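A synchronous, std-only sketch of the line-by-line loop with a malformed-line counter. The real pane_reader_loop runs as a background task and deserializes each line with a JSON library. Here a deliberately tiny, hypothetical parse_line stands in so the example compiles without dependencies; it accepts only lines shaped like {"type":"..."}. RawEvent and read_jsonl are also illustrative names.

```rust
use std::io::BufRead;

/// Hypothetical stand-in for a deserialized JSONL event.
#[derive(Debug, PartialEq)]
pub struct RawEvent {
    pub event_type: String,
}

/// Toy parser standing in for a real JSON deserializer (hypothetical).
/// Returns None for anything not shaped like {"type":"..."}.
pub fn parse_line(line: &str) -> Option<RawEvent> {
    let rest = line.trim().strip_prefix("{\"type\":\"")?;
    let value = rest.strip_suffix("\"}")?;
    if value.contains('"') {
        return None;
    }
    Some(RawEvent { event_type: value.to_string() })
}

/// Read line by line, hand each parsed event to `on_event`, and count
/// malformed lines instead of failing. Blank lines are ignored.
/// Underlying I/O errors (including invalid UTF-8) still end the loop.
pub fn read_jsonl<R: BufRead>(
    reader: R,
    mut on_event: impl FnMut(RawEvent),
) -> std::io::Result<u64> {
    let mut malformed = 0;
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        match parse_line(&line) {
            Some(event) => on_event(event),
            None => malformed += 1, // skip and count, do not crash the observer
        }
    }
    Ok(malformed)
}
```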
AgentRuntime { Pi, Zot, Local } — one taxonomy for two harnesses
Pi and zot emit different raw event types: Pi uses agent_start /
turn_end, zot uses turn_start / done. Glasspane maps both into the same
AgentState transitions via zot_event_type(). The AgentRuntime enum tags
each pane with its harness so the mapping function knows which event vocabulary
to parse.
The Pane struct's session_id field uses #[serde(alias = "pi_session_id")]
for backward compatibility with pre-neutrality serialized snapshots.
Why not have two separate state machines: the TUI, daemon scheduler, and
client CLI all need to ask "what state is this agent in?" and don't care
whether it's zot or Pi. One taxonomy, one API. The mapping is a ~50-line
function, not a subsystem.
→ crates/colibri-glasspane/src/lib.rs
(zot_event_type, AgentRuntime)
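A sketch of the runtime-tagged mapping. It assumes a hypothetical map_event(runtime, event_type) shape; the real work happens in zot_event_type(), whose signature is not shown here. The raw event names come from the text above. The target states (for example, Pi's turn_end mapping to Idle) are assumptions, and the Local runtime is left unmapped.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentRuntime {
    Pi,
    Zot,
    Local,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

/// Hypothetical mapping: the runtime tag selects which event vocabulary applies.
/// Event names are from the doc; the resulting states are assumptions.
pub fn map_event(runtime: AgentRuntime, event_type: &str) -> Option<AgentState> {
    match (runtime, event_type) {
        (AgentRuntime::Pi, "agent_start") => Some(AgentState::Working),
        (AgentRuntime::Pi, "turn_end") => Some(AgentState::Idle),
        (AgentRuntime::Zot, "turn_start") => Some(AgentState::Working),
        (AgentRuntime::Zot, "done") => Some(AgentState::Done),
        // Unknown or non-state events (e.g. usage reports) leave the state untouched.
        _ => None,
    }
}
```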
Snapshot API (read-heavy, not write-heavy)
Glasspane exposes a snapshot object (the full set of panes with their current
state, session ID, timestamp, and metadata) through an Arc<RwLock<…>>. The
daemon serves this over its Unix socket to client readers. Writes happen once
per event; reads are frequent (TUI polls, CLI status checks).
Why RwLock, not channels: the write path is low-frequency (agent JSONL at
human-reading speed), and reads take a shared lock that is uncontended in the
common case. A
channel-based design would add buffering and delivery semantics for a problem
that's fundamentally about current state, not event delivery.
→ crates/colibri-glasspane/src/lib.rs
(Supervisor, snapshot)
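A minimal sketch of the read-heavy snapshot pattern using std's Arc<RwLock<…>>. It assumes a hypothetical Supervisor that stores only a map from pane name to state; set_state and this snapshot signature are illustrative. Each ingested event takes the write lock once. snapshot() clones the map under a shared read lock, which many readers can hold at the same time.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

/// Hypothetical supervisor: cloning it clones the Arc, so the reader task
/// and the socket handler share the same state.
#[derive(Clone, Default)]
pub struct Supervisor {
    panes: Arc<RwLock<BTreeMap<String, AgentState>>>,
}

impl Supervisor {
    /// Write path: one short exclusive lock per ingested event.
    pub fn set_state(&self, pane: &str, state: AgentState) {
        let mut panes = self.panes.write().expect("lock poisoned");
        panes.insert(pane.to_string(), state);
    }

    /// Read path: copy the current state under a shared read lock.
    pub fn snapshot(&self) -> BTreeMap<String, AgentState> {
        self.panes.read().expect("lock poisoned").clone()
    }
}
```

Returning a clone keeps the read lock short in this sketch. A caller can serialize the snapshot for the socket after the lock has already been released.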
Usability roadmap (TODO)
The attention half of this roadmap shipped: the derived attention
predicate, the TUI attention bar / jump keys / filter / row highlight, and
edge-triggered terminal-capture alerts. See
operator-attention for the shipped system. What remains here is the genuinely
unbuilt direction.
Push notifications outbound, not just on-screen
The operator supervises headless hosts over Tailscale, not by staring at the
TUI. When a pane raises attention (or hits Done), push it out: a desktop
notification on the live image (XFCE) and a Telegram message (the token is
already provisioned). An explicit colibri notify-style path — or a glasspane
event type that a zot/Pi hook fires — lets an agent say "I'm blocked" rather than
relying only on inferred state. Highest real-world impact item.
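This item is unbuilt, so the following is only a sketch of one plausible shape. It is an edge-triggered hook that pushes a notification when a pane enters Blocked, Error, or Done, and stays quiet while the pane remains there. The Notifier trait and on_transition are hypothetical; a desktop-notification or Telegram sink would implement send. Stalled is not handled because it is derived from elapsed time, not from a transition.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Working,
    Blocked,
    Done,
    Error,
}

/// Hypothetical: states worth pushing to the operator.
fn is_notifiable(state: AgentState) -> bool {
    matches!(state, AgentState::Blocked | AgentState::Error | AgentState::Done)
}

/// Hypothetical outbound sink (desktop notification, Telegram message, ...).
pub trait Notifier {
    fn send(&mut self, pane: &str, state: AgentState);
}

/// Edge-triggered: fire only when a pane enters a notifiable state,
/// so a pane that stays Blocked does not repeat the alert.
/// Returns true if a notification was sent.
pub fn on_transition(
    notifier: &mut impl Notifier,
    pane: &str,
    prev: AgentState,
    next: AgentState,
) -> bool {
    if prev != next && is_notifiable(next) {
        notifier.send(pane, next);
        true
    } else {
        false
    }
}
```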
Richer pane rows (context at a glance)
Glasspane already stashes non-state events in pane metadata. Surface that in the
TUI row: current repo/branch, last line / task summary, the jail the
agent runs in, optionally listening ports. Turns "Working" into "Working on
fix/x in jail cms, last: running tests".
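A sketch of the richer row. It assumes the pane metadata is a string map with the hypothetical keys branch, jail, and last; Glasspane's actual metadata keys are not specified here. row_summary is an illustrative name. With those keys filled in, it reproduces the example row from the text.

```rust
use std::collections::HashMap;

/// Hypothetical row builder: state label plus whatever context the metadata
/// holds. Missing keys are simply omitted from the row.
pub fn row_summary(state: &str, meta: &HashMap<String, String>) -> String {
    let mut row = state.to_string();
    if let Some(branch) = meta.get("branch") {
        row.push_str(&format!(" on {branch}"));
    }
    if let Some(jail) = meta.get("jail") {
        row.push_str(&format!(" in jail {jail}"));
    }
    if let Some(last) = meta.get("last") {
        row.push_str(&format!(", last: {last}"));
    }
    row
}
```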
Persist pane history across daemon restarts
The supervisor is in-memory (Arc<RwLock<…>>); a daemon restart loses the
timeline. Persist pane transitions/history so returning after hours (or a
reboot) preserves "what happened while I was away". Lightweight durability, not a
new subsystem.
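One lightweight option, sketched under assumptions, is an append-only log with one transition per line. The log is replayed at startup and the last line for each pane wins, while the file itself keeps the full timeline. The tab-separated record format, append_transition, and replay are hypothetical. A real version would also need to decide when to flush or fsync, and how to escape pane names.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufRead, Write};

/// Hypothetical record format: "unix_secs<TAB>pane<TAB>state", one per line.
/// Assumes pane names contain no tabs or newlines.
pub fn append_transition<W: Write>(
    log: &mut W,
    unix_secs: u64,
    pane: &str,
    state: &str,
) -> io::Result<()> {
    writeln!(log, "{unix_secs}\t{pane}\t{state}")
}

/// Replay the log in file order; the last record per pane wins.
/// Unparseable lines are skipped, mirroring the ingestor's tolerance.
pub fn replay<R: BufRead>(log: R) -> io::Result<BTreeMap<String, (u64, String)>> {
    let mut panes = BTreeMap::new();
    for line in log.lines() {
        let line = line?;
        let mut parts = line.splitn(3, '\t');
        let (Some(ts), Some(pane), Some(state)) = (parts.next(), parts.next(), parts.next()) else {
            continue;
        };
        let Ok(ts) = ts.parse::<u64>() else {
            continue;
        };
        panes.insert(pane.to_string(), (ts, state.to_string()));
    }
    Ok(panes)
}
```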
Answer a blocked agent from the dashboard (bigger lift)
The snapshot API is read-heavy by design. A future write path — "send input to
pane N" over the daemon socket — would let the operator respond to a blocked
agent from colibri-tui, not just observe/spawn/kill. This is direction, not a
quick win; it changes the socket from read-only supervision to interactive
control and needs its own design pass.
See also
- agent-harness — the zot/Colibri split that Glasspane observes
- operator-attention — the shipped attention/alert layer over this state machine, including terminal capture + signature triage
- naming-decisions — pi_session_id → session_id, pi_type → event_type