Headroom sidecar

Colibri can optionally ask a local headroom-ai sidecar to compress tool results
before they are rendered through the session read path.

This is off by default. The daemon does not start or supervise Headroom; it
only connects to an already-running Unix socket when explicitly enabled.

Configuration

COLIBRI_HEADROOM_ENABLED=1

COLIBRI_HEADROOM_SOCKET=/var/run/colibri/headroom.sock # optional; this is the default

Start the sidecar with a Python environment that can import headroom:

python3 scripts/headroom-sidecar.py --socket /var/run/colibri/headroom.sock

Then start colibri-daemon with the environment above. Check connection state
with colibri status; the status JSON includes:

{
  "headroom": {
    "enabled": true,
    "socket": "/var/run/colibri/headroom.sock",
    "connected": true
  }
}
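A health check can be scripted against that fragment. The sketch below is illustrative only: the function name headroom_connected is hypothetical, it assumes the status JSON has already been captured as text from colibri status, and it assumes nothing about the status output beyond the "headroom" object shown above.

```python
import json


def headroom_connected(status_json: str) -> bool:
    """Return True only if the status reports Headroom as enabled AND connected."""
    status = json.loads(status_json)
    # Treat a missing or null "headroom" object as "not connected".
    headroom = status.get("headroom") or {}
    return bool(headroom.get("enabled")) and bool(headroom.get("connected"))


# Sample input copied from the status fragment above.
sample = (
    '{"headroom": {"enabled": true, '
    '"socket": "/var/run/colibri/headroom.sock", "connected": true}}'
)
print(headroom_connected(sample))
```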

The daemon keeps one Unix-socket connection open and sends newline-delimited JSON
requests. The sidecar must support multiple requests on that same accepted
connection.
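The multi-request requirement amounts to looping over lines until the peer closes the stream, rather than answering once and hanging up. The following is a minimal sketch of such a loop, not the code in scripts/headroom-sidecar.py. The serve_connection name and the compress callback are hypothetical, the request and response fields are the ones documented in this file, and the token counts use whitespace splitting as a stand-in for a real tokenizer.

```python
import io
import json
from typing import BinaryIO, Callable


def serve_connection(rfile: BinaryIO, wfile: BinaryIO,
                     compress: Callable[[str], str]) -> int:
    """Answer every newline-delimited JSON request on one accepted connection.

    Returns the number of requests handled. The loop ends only when the peer
    closes the stream, so a single connection can carry many requests.
    """
    handled = 0
    for line in rfile:  # one JSON object per line
        if not line.strip():
            continue  # ignore blank lines between requests
        request = json.loads(line)
        raw = request["raw"]
        compressed = compress(raw)
        response = {
            "id": request["id"],
            "compressed": compressed,
            # Placeholder counts: whitespace-separated words, NOT real tokens.
            "tokens_before": len(raw.split()),
            "tokens_after": len(compressed.split()),
        }
        wfile.write(json.dumps(response).encode("utf-8") + b"\n")
        wfile.flush()  # reply before blocking on the next request
        handled += 1
    return handled


# In-memory demonstration: two requests arrive on the same "connection".
incoming = io.BytesIO(
    b'{"id":"a","raw":"one two three four","role":"tool"}\n'
    b'{"id":"b","raw":"five six","role":"tool"}\n'
)
outgoing = io.BytesIO()
count = serve_connection(incoming, outgoing, lambda text: text.split()[0])
replies = [json.loads(reply) for reply in outgoing.getvalue().splitlines()]
```

With Python's socketserver.UnixStreamServer and a StreamRequestHandler subclass, the handler's self.rfile and self.wfile could be passed to a function like this one.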

Request:

{"id":"tool-name","raw":"tool output text","role":"tool"}

Response:

{"id":"tool-name","compressed":"shorter text","tokens_before":100,"tokens_after":25}
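For anyone writing a test client or an alternative sidecar, the two message shapes above can be framed and parsed as follows. This is a sketch under assumptions: encode_request, decode_response, and CompressResponse are hypothetical names, and the field types (strings and integers) are inferred from the examples rather than from a published schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressResponse:
    id: str
    compressed: str
    tokens_before: int
    tokens_after: int


def encode_request(request_id: str, raw: str, role: str = "tool") -> bytes:
    """Serialize one request as a single newline-terminated JSON line."""
    payload = {"id": request_id, "raw": raw, "role": role}
    # json.dumps escapes embedded newlines, so the frame stays on one line.
    return json.dumps(payload, separators=(",", ":")).encode("utf-8") + b"\n"


def decode_response(line: bytes) -> CompressResponse:
    """Parse one response line; raises KeyError or ValueError if malformed."""
    data = json.loads(line)
    return CompressResponse(
        id=str(data["id"]),
        compressed=str(data["compressed"]),
        tokens_before=int(data["tokens_before"]),
        tokens_after=int(data["tokens_after"]),
    )
```

Multi-line tool output is safe to send this way because the only literal newline in each frame is the terminator.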

If tokens_after >= tokens_before, Colibri keeps the original tool result. If
the sidecar errors, disconnects, or times out, Colibri degrades to the original
uncompressed content.
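The fallback rule can be expressed as one pure function. This is an illustration of the behavior described above, not the daemon's code: select_result is a hypothetical name, a response of None stands for any sidecar error, disconnect, or timeout, and treating a malformed reply as an error is an assumption.

```python
from typing import Any, Mapping, Optional


def select_result(original: str, response: Optional[Mapping[str, Any]]) -> str:
    """Pick the text to render, keeping the original whenever in doubt."""
    if response is None:  # sidecar error, disconnect, or timeout
        return original
    try:
        before = int(response["tokens_before"])
        after = int(response["tokens_after"])
        compressed = response["compressed"]
    except (KeyError, TypeError, ValueError):
        return original  # assumption: malformed replies count as errors
    if not isinstance(compressed, str) or after >= before:
        return original  # no savings, so keep the original tool result
    return compressed
```

Keeping the decision free of I/O makes the rule easy to test independently of the socket handling.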

Current daemon timeout: 5 seconds per sidecar request.
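To exercise a sidecar by hand under the same time budget, a small client along these lines may help. It is a sketch, not the daemon's implementation: request_once is a hypothetical name, it opens a fresh connection per call for simplicity (the daemon keeps one open), and socket.settimeout bounds each socket operation rather than the request as a whole.

```python
import json
import socket
from typing import Optional

REQUEST_TIMEOUT_SECONDS = 5.0  # mirrors the per-request timeout stated above


def request_once(sock_path: str, request_id: str, raw: str,
                 timeout: float = REQUEST_TIMEOUT_SECONDS) -> Optional[dict]:
    """Send one request and return the parsed reply, or None on any failure."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)  # applies to connect, send, and recv
            sock.connect(sock_path)
            line = json.dumps({"id": request_id, "raw": raw, "role": "tool"})
            sock.sendall(line.encode("utf-8") + b"\n")
            with sock.makefile("rb") as reader:
                reply = reader.readline()
        if not reply:
            return None  # peer closed the connection without answering
        return json.loads(reply)
    except (OSError, ValueError):
        # OSError covers timeouts, refused connections, and missing sockets;
        # ValueError covers replies that are not valid JSON.
        return None
```

A None return corresponds to the cases where the original uncompressed content should be used.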

a HeadroomSidecar; it is not a global daemon-wide replacement for existing
cost-mode compaction.

staged and validated on the target image.

FreeBSD: ONNX/ORT-backed extras need local packaging; use a known-good Python
environment and validate scripts/headroom-sidecar.py directly before enabling
the daemon flag.