ADR: Isolation Hardening (2026-05-13)

Status: Accepted. Context: Agent symlink incident in scitex-stats-auditor proof of concept; sac’s positioning vs. Clew reproducibility verification.

Problem

On 2026-05-13 the first per-package auditor (scitex-stats-auditor) ran under Apptainer with sac-base.sif + overlay and reported: “the project venv targets /opt/python3.12/bin/python3.12, which is missing on this host; I repaired it by symlinking it to /usr/bin/python3.12.” The agent’s mental model was that it had patched the host. Investigation showed the symlink only landed in the container’s overlay — the host was untouched. But two systemic gaps were exposed:

The agent thought it had host write access. Apptainer’s defaults make the container/host boundary porous enough that an agent can’t distinguish “I patched the container’s view of /opt” from “I patched the host.”
Operator-side prompts can’t enforce isolation. The prompt told the agent to be read-only; the agent ignored it. Prompt-level guardrails are not a security mechanism.

The deeper issue: sac’s stated positioning is reproducible-by-default, but its actual default behavior inherited Apptainer’s HPC-convenience defaults (auto-bind $HOME, /tmp, /proc, /sys, /dev; inherit all env vars; share host namespaces). Convenience-first defaults are upside-down for an agent runtime where the container is supposed to be the security boundary.

Decisions

D1. Hardened isolation by default; `relaxed: true` to opt out.

spec.apptainer.relaxed: false is the default. sac auto-prepends --containall (contains the file systems, plus the PID and IPC namespaces and the environment), --cleanenv (environment isolation), and --writable-tmpfs (when no overlay is declared) to the apptainer argv.

relaxed: true is an explicit opt-out for HPC-style convenience use cases. Agents started with relaxed: true are outside the Clew verification chain — their runs cannot be attested as reproducible.

Rationale. sac’s differentiation against LangGraph / CrewAI / AutoGen is “spec.yaml declares isolation; mechanism enforces it; external verifier can attest it.” Default-strict supports that thesis directly. HPC users can opt in to the legacy behavior with one line, and they pay the cost of falling out of the verification chain (which they typically don’t need anyway).
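The flag selection in D1 can be sketched as follows. This is a minimal illustration, not the code in runtimes/_apptainer_runtime.py: the ApptainerSpec here is a stand-in with assumed field names (relaxed, overlay), and isolation_flags is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApptainerSpec:
    """Stand-in for the real spec type; field names are assumptions."""
    relaxed: bool = False           # hardened unless the operator opts out
    overlay: Optional[str] = None   # path to an overlay image, if declared


def isolation_flags(spec: ApptainerSpec) -> list:
    """Return the flags to prepend to the apptainer argv for this spec."""
    if spec.relaxed:
        # Explicit opt-out: no isolation flags are added.
        return []
    flags = ["--containall", "--cleanenv"]
    if spec.overlay is None:
        # Without an overlay, writes go to a throwaway tmpfs layer.
        flags.append("--writable-tmpfs")
    return flags


print(isolation_flags(ApptainerSpec()))
print(isolation_flags(ApptainerSpec(overlay="agent.img")))
print(isolation_flags(ApptainerSpec(relaxed=True)))
```

The point of the sketch is that the hardened path needs no operator input: an empty spec already yields the strict flag set, and only an explicit relaxed: true removes it.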

D2. Universal preflight via `$HOME`-visibility check, not per-path enumeration.

The preflight that sac auto-injects before user startup_commands is:

# Braces, not parentheses: exit inside ( ... ) would only end the subshell.
test "$(id -u)" != "0" || { echo 'ERROR: running as root' >&2; exit 1; }
test ! -d "$HOME" || { echo 'ERROR: host $HOME visible, isolation breach' >&2; exit 1; }

Rationale. Per-path enumeration (test ! -e $HOME/.gitconfig, test ! -e $HOME/.ssh, …) has unbounded false-negative risk: every new credential store added in the next decade (.kube/config, .docker/config.json, .netrc, .npmrc, .pypirc, .gnupg/, ~/.config/anthropic/, ~/.bash_history with embedded secrets, …) requires a new line. The $HOME-visibility check covers all of them at once.

Under --containall, $HOME is NOT auto-bound — it should be invisible inside the container. If the check fails, either --containall isn’t in effect or an operator-declared bind brought it in. Either way, the agent shouldn’t start.
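A sketch of how the preflight could be injected ahead of the agent command, assuming the runtime wraps the inner command as bash -c "<preflight>\nexec <inner>". The helper name wrap_with_preflight and the PREFLIGHT constant are hypothetical. The sketch uses { ...; } groups so that exit terminates the wrapping shell itself and the exec line is never reached after a failed check.

```python
import shlex

# Preflight checks, one per line; a failed check exits before `exec` runs.
PREFLIGHT = "\n".join([
    'test "$(id -u)" != "0" || { echo "ERROR: running as root" >&2; exit 1; }',
    'test ! -d "$HOME" || { echo "ERROR: host HOME visible" >&2; exit 1; }',
])


def wrap_with_preflight(inner: list) -> list:
    """Build a `bash -c` argv that runs the preflight, then execs the inner command."""
    # shlex.join quotes each argument so spaces and metacharacters survive the shell.
    script = PREFLIGHT + "\nexec " + shlex.join(inner)
    return ["bash", "-c", script]


argv = wrap_with_preflight(["python", "-m", "agent", "--name", "stats auditor"])
print(argv[2])
```

Using exec means the agent replaces the wrapper shell, so no extra process sits between the container entrypoint and the agent.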

Operator opt-out for paths that legitimately need to be visible:

spec:
  apptainer:
    preflight_allow:
      - "$HOME/.gitconfig"   # acknowledged: agent needs read-only gitconfig

The opt-out is declared per-path, not as a blanket “disable preflight.”

D3. AgentCard exposes structured isolation block, not a flat enum.

Instead of isolation_level: hardened | relaxed | custom:

"x-scitex-agent-container": {
  "isolation": {
    "level": "hardened",
    "containall": true,
    "cleanenv": true,
    "writable_tmpfs": false,
    "preflight_passed": ["uid-nonzero", "no-host-home"],
    "preflight_allowed": [],
    "binds_count": 3,
    "binds_writable_count": 0
  }
}

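level: hardened is the human shorthand for “containall and cleanenv both true + preflight_allowed: []”. writable_tmpfs is true only when no overlay is declared (D1), which is why it can be false on a hardened card. External verifiers (Clew, orochi attestation) read the structured booleans to attest specific properties.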

Rationale. A flat custom label hides what’s custom about it. A run with preflight_allow: [$HOME/.ssh] and a run with preflight_allow: [$HOME/.aws] are both custom under the enum but have very different security profiles. Clew’s verification chain wants to attest specific properties: “did this run set containall? did it allow any preflight bypasses?” The structured block answers those directly.
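To illustrate what the structured block buys a verifier, here is a minimal sketch of an external check. The functions attest and label_is_consistent and the keys of the returned dict are hypothetical; only the card fields come from the example above.

```python
import json

CARD = json.loads("""
{
  "x-scitex-agent-container": {
    "isolation": {
      "level": "hardened",
      "containall": true,
      "cleanenv": true,
      "writable_tmpfs": false,
      "preflight_passed": ["uid-nonzero", "no-host-home"],
      "preflight_allowed": [],
      "binds_count": 3,
      "binds_writable_count": 0
    }
  }
}
""")


def attest(card: dict) -> dict:
    """Answer specific isolation questions from the structured block alone."""
    iso = card["x-scitex-agent-container"]["isolation"]
    return {
        "containall": iso["containall"] is True,
        "cleanenv": iso["cleanenv"] is True,
        "no_preflight_bypass": iso["preflight_allowed"] == [],
        "no_writable_binds": iso["binds_writable_count"] == 0,
    }


def label_is_consistent(card: dict) -> bool:
    """A 'hardened' label must not contradict the booleans it summarizes."""
    iso = card["x-scitex-agent-container"]["isolation"]
    if iso["level"] != "hardened":
        return True
    facts = attest(card)
    return facts["containall"] and facts["cleanenv"] and facts["no_preflight_bypass"]


print(attest(CARD))
print(label_is_consistent(CARD))
```

Two runs that differ only in preflight_allowed produce different attest results here, which is the resolution a flat enum would lose.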

Considered and rejected

Rejected: prompt-level isolation guarantees.

“Tell the agent in the prompt to be read-only.” The PoC proved this fails — the agent acknowledged the rule and then violated it. Prompts are not a security mechanism. Mechanism-level enforcement is non-negotiable for the Clew thesis.

Rejected: `test -w <path>` permission-based preflight.

test ! -w /opt would fail inside any container with a writable overlay shadowing /opt, even when no host damage is possible. The intent (catch leaks) and the test (catch any write capability) don’t align under apptainer’s overlay model. Existence-based checks on identity files ($HOME visibility) avoid this entirely.

Rejected: enumerated per-path preflight (`.gitconfig`, `.ssh`, `.aws`, …).

Forever incomplete; every new credential cache added by tooling elsewhere requires a sac patch. The $HOME-visibility check covers the same surface in one assertion.

Rejected: `isolation_level: hardened | relaxed | custom` flat enum.

Loses verifier resolution. Two custom runs with very different preflight_allow sets become indistinguishable on the card.

Implementation

Layer	Where	Status
`apptainer.relaxed: false` default	`config/_types.py::ApptainerSpec`	✅ shipped earlier today
`--containall` auto-prepended	`runtimes/_apptainer_runtime.py`	✅ shipped
`--cleanenv` auto-prepended	same	✅ shipped (D1)
`--writable-tmpfs` when no overlay	same	✅ shipped (D1)
Universal preflight injection	runtime wraps inner cmd with `bash -c "<preflight>\nexec <inner>"`	✅ shipped (D2)
`spec.apptainer.preflight_allow` field	`config/_types.py::ApptainerSpec`	⏳ deferred (out of D1–D4 scope)
AgentCard structured `isolation` block	`a2a/_card.py::project_card`	✅ shipped (D3)
`sac agents check` D4 bind-target warning	`cli_pkg/build_cmds.py::check`	✅ shipped (D4)
Regression tests	`tests/.../test__apptainer_runtime.py` + `tests/.../a2a/test__card.py` + `tests/.../cli_pkg/test_build_cmds.py` + `tests/.../runtimes/test__apptainer_isolation.py` + `tests/.../runtimes/test__apptainer_preflight.py`	✅ shipped

Consequences

Positive.

sac becomes, as far as we are aware, the only A2A-compatible agent runtime that mechanically enforces isolation by default. Clear differentiation from LangGraph / CrewAI / AutoGen (no isolation concept) and from Docker (isolation exists but not declared / not attestable).
Clew can build verification chains that attest specific isolation properties without sac-internal introspection.
The scitex-stats-auditor symlink-confusion class of incidents becomes impossible: the agent can’t “fix” the host because the host isn’t reachable.

Negative / tradeoffs.

Existing agents that worked under the old defaults will fail fast if they implicitly relied on the $HOME auto-bind. Operators must either declare the bind explicitly or set relaxed: true.
The --cleanenv default removes $PATH inheritance; sac has to inject the container’s expected $PATH itself.
HPC-style “just run my command inside the container with my home visible” now requires an explicit extra step (setting relaxed: true). This is intentional.

Addendum: D5 — canonical container HOME (2026-05-14)

Live verification of the hardened auditor exposed two issues with D2 as originally specified:

An empty $HOME false-fires the original check. Apptainer scaffolds $HOME from the inherited passwd entry regardless of bind targets (even under --containall), so $HOME is always a directory. The D2 check was relaxed to “$HOME is empty”, which is workable but forces bind targets out of $HOME, breaking operator-intuitive paths.
/srv/-style targets force script rewrites. Anything that references ~/proj/X or $HOME/proj/X inside the container breaks.

D5. Canonical container HOME = `/home/agent`.

sac auto-injects --home /home/agent (skipped only under apptainer.relaxed: true or when the operator declared --home). Inside the container:

$HOME == /home/agent, operator-independent.
Bind targets use the canonical HOME: ~/proj/X:/home/agent/proj/X.
The operator’s actual host home is never scaffolded inside the container.
sac-base.sif is rebuilt so UID-1000’s passwd entry reads agent (not ubuntu); whoami matches $HOME.

D5 preflight (replaces D2’s empty-`$HOME` check).

# uid != 0, OR /proc/self/uid_map confirms userns-fakeroot.
if [ "$(id -u)" = "0" ]; then
  awk '$1==0 && $2!=0 {found=1} END {exit !found}' /proc/self/uid_map \
    || exit 11   # real root, refuse
fi
test "$HOME" = "/home/agent" || exit 12  # canonical HOME

Two static checks, attestable by sha256. The “no host leak” property follows from --containall + canonical --home + declared binds.
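The uid_map test and the sha256 attestation can be sketched in Python as below. This mirrors the awk condition above rather than reproducing the code in runtimes/_apptainer_preflight.py. The function names are hypothetical, and the digest only matches another party's digest if both hash byte-identical script text.

```python
import hashlib

# The preflight text as it appears above; any whitespace change alters the digest.
PREFLIGHT = r"""if [ "$(id -u)" = "0" ]; then
  awk '$1==0 && $2!=0 {found=1} END {exit !found}' /proc/self/uid_map \
    || exit 11   # real root, refuse
fi
test "$HOME" = "/home/agent" || exit 12  # canonical HOME
"""


def uid_map_is_userns_root(uid_map_text: str) -> bool:
    """True if some line maps container uid 0 to a non-zero host uid."""
    for line in uid_map_text.splitlines():
        fields = line.split()
        # Columns are: uid inside the namespace, uid outside, range length.
        if len(fields) >= 2 and int(fields[0]) == 0 and int(fields[1]) != 0:
            return True
    return False


def preflight_digest(script: str) -> str:
    """Hex sha256 of the preflight text, suitable for publishing on a card."""
    return hashlib.sha256(script.encode("utf-8")).hexdigest()


print(uid_map_is_userns_root("0 1000 1\n"))        # fakeroot-style mapping
print(uid_map_is_userns_root("0 0 4294967295\n"))  # identity mapping, real root
print(preflight_digest(PREFLIGHT))
```

Because the preflight is static text with no per-run interpolation, a verifier can compare one digest instead of inspecting the script on every run.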

Rationale.

D4 (bind targets must be container-canonical paths) stays but is no longer the only path. Bind destinations may live under /home/agent/... (intuitive) OR /srv/, /work/, /opt/, /data/ (container-canonical). Both pass D5.
fakeroot opt-in (apptainer.fakeroot: true) integrates: inside the container id -u == 0, but /proc/self/uid_map proves userns-mapping; the preflight accepts this without weakening the no-real-root guarantee.
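The bind-destination rule can be illustrated with a small validator. This is a sketch under stated assumptions, not the logic in config/_parsers/_apptainer.py: bind_dest_ok is a hypothetical name, and treating a root itself (for example /srv) as acceptable is an assumption the text does not settle.

```python
from pathlib import PurePosixPath

# Accepted roots per D4/D5: the canonical HOME plus the container-canonical trees.
ALLOWED_ROOTS = [
    PurePosixPath(p) for p in ("/home/agent", "/srv", "/work", "/opt", "/data")
]


def bind_dest_ok(dest: str) -> bool:
    """Accept only absolute destinations at or under an allowed root."""
    path = PurePosixPath(dest)
    # PurePosixPath does not collapse "..", so reject it rather than guess.
    if not path.is_absolute() or ".." in path.parts:
        return False
    return any(path == root or root in path.parents for root in ALLOWED_ROOTS)


for dest in ("/home/agent/proj/X", "/srv/data", "/home/alice/proj/X", "proj/X"):
    print(dest, bind_dest_ok(dest))
```

An operator's real home such as /home/alice is rejected as a destination, while the same project under /home/agent passes, which keeps ~/proj/X references inside the container working without exposing the host layout.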

Implementation (D5).

Layer	Where	Status
`apptainer.fakeroot: bool`	`config/_types.py::ApptainerSpec`	✅
Auto-prepend `--home /home/agent`	`runtimes/_apptainer_iso_flags.py`	✅
Auto-append `--fakeroot` when opted in	same	✅
Preflight rewrite (uid-map + canonical-HOME)	`runtimes/_apptainer_preflight.py`	✅
sac-base.sif: ubuntu → agent	`containers/apptainer-base.def`	✅ recipe; SIF rebuild pending
Bind destination validation	`config/_parsers/_apptainer.py`	✅
Doc updates + AgentCard JSON example	`docs/isolation.md`	✅

Network isolation addendum (2026-05-14).

Peer review pushed on --network=bridge and surfaced the A2A + MCP-over-loopback interop constraint. The discussion converged on two points: stay with host netns for the Clew arXiv window, and plan a migration to --network=bridge + bridge-IF bind + sac-host /etc/hosts injection that preserves MCP URL stability. See docs/isolation.md §4.

References

docs/isolation.md — the 10-category leak catalog this ADR’s decisions close.
docs/spec-reference.md — spec.apptainer.relaxed, spec.apptainer.fakeroot.
The 2026-05-13 scitex-stats-auditor incident — session.jsonl at ~/.scitex/agent-container/runtime/scitex-stats-auditor/ for the full trace.
