ADR: Isolation Hardening (2026-05-13)

Status: Accepted. Context: Agent symlink incident in scitex-stats-auditor proof of concept; sac’s positioning vs. Clew reproducibility verification.

Problem

On 2026-05-13 the first per-package auditor (scitex-stats-auditor) ran under Apptainer with sac-base.sif + overlay and reported: “the project venv targets /opt/python3.12/bin/python3.12, which is missing on this host; I repaired it by symlinking it to /usr/bin/python3.12.” The agent’s mental model was that it had patched the host. Investigation showed the symlink only landed in the container’s overlay — the host was untouched. But two systemic gaps were exposed:

  1. The agent thought it had host write access. Apptainer’s defaults make the container/host boundary porous enough that an agent can’t distinguish “I patched the container’s view of /opt” from “I patched the host.”

  2. Operator-side prompts can’t enforce isolation. The prompt told the agent to be read-only; the agent ignored it. Prompt-level guardrails are not a security mechanism.

The deeper issue: sac’s stated positioning is reproducible-by-default, but its actual default behavior inherited Apptainer’s HPC-convenience defaults (auto-bind $HOME, /tmp, /proc, /sys, /dev; inherit all env vars; share host namespaces). Convenience-first defaults are upside-down for an agent runtime where the container is supposed to be the security boundary.

Decisions

D1. Hardened isolation by default; relaxed: true to opt out.

spec.apptainer.relaxed: false is the default. sac auto-prepends --containall (filesystem isolation), --cleanenv (environment isolation), and --writable-tmpfs (when no overlay is declared) to the apptainer argv.

relaxed: true is an explicit opt-out for HPC-style convenience use cases. Agents started with relaxed: true are outside the Clew verification chain — their runs cannot be attested as reproducible.

Rationale. sac’s differentiation against LangGraph / CrewAI / AutoGen is “spec.yaml declares isolation; mechanism enforces it; external verifier can attest it.” Default-strict supports that thesis directly. HPC users can opt in to the legacy behavior with one line, and they pay the cost of falling out of the verification chain (which they typically don’t need anyway).
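The flag selection in D1 can be sketched as below. This is an illustration under assumptions, not sac's actual code: the helper name `isolation_flags` and its parameters are hypothetical, and only the three flags and the relaxed/overlay rules are taken from the decision text.

```python
def isolation_flags(relaxed: bool = False, has_overlay: bool = False) -> list[str]:
    """Return the apptainer flags to prepend (hypothetical helper)."""
    if relaxed:
        # Explicit opt-out: keep Apptainer's convenience defaults untouched.
        return []
    flags = ["--containall", "--cleanenv"]
    if not has_overlay:
        # No overlay declared, so the container gets a throwaway writable layer.
        flags.append("--writable-tmpfs")
    return flags


print(isolation_flags())                  # hardened default, no overlay
print(isolation_flags(has_overlay=True))  # hardened default, overlay declared
print(isolation_flags(relaxed=True))      # opt-out
```

Keeping the decision in one pure function means the hardened default is the path taken when the operator writes nothing at all.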

D2. Universal preflight via $HOME-visibility check, not per-path enumeration.

The preflight that sac auto-injects before user startup_commands is:

# { ...; } runs in the current shell, so exit aborts the whole preflight.
test "$(id -u)" != "0" || { echo 'ERROR: running as root' >&2; exit 1; }
test ! -d "$HOME" || { echo 'ERROR: host $HOME visible, isolation breach' >&2; exit 1; }

Rationale. Per-path enumeration (test ! -e $HOME/.gitconfig, test ! -e $HOME/.ssh, …) has unbounded false-negative risk: every new credential store added in the next decade (.kube/config, .docker/config.json, .netrc, .npmrc, .pypirc, .gnupg/, ~/.config/anthropic/, ~/.bash_history with embedded secrets, …) requires a new line. The $HOME-visibility check covers all of them at once.

Under --containall, $HOME is NOT auto-bound — it should be invisible inside the container. If the check fails, either --containall isn’t in effect or an operator-declared bind brought it in. Either way, the agent shouldn’t start.

Operator opt-out for paths that legitimately need to be visible:

spec:
  apptainer:
    preflight_allow:
      - "$HOME/.gitconfig"   # acknowledged: agent needs read-only gitconfig

The opt-out is declared per-path, not as a blanket “disable preflight.”
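How a runtime might inject the D2 preflight ahead of the agent's own command is sketched below. It is a minimal sketch under assumptions: `wrap_with_preflight` is a hypothetical name, and the inner command is a placeholder. The only thing taken from this ADR is the shape "preflight, newline, exec inner" passed to `bash -c`.

```python
import shlex


def wrap_with_preflight(preflight: str, inner: list[str]) -> list[str]:
    """Build a `bash -c` argv that runs the preflight, then execs the inner command."""
    # shlex.join quotes each argument so spaces and metacharacters survive the shell.
    script = preflight.rstrip("\n") + "\nexec " + shlex.join(inner)
    return ["bash", "-c", script]


# Placeholder preflight and inner command, for illustration only.
argv = wrap_with_preflight(
    'test ! -d "$HOME" || exit 1',
    ["python3", "-m", "agent", "--name", "my agent"],
)
print(argv[2])
```

Using `exec` replaces the wrapper shell with the agent process, so a preflight that exits non-zero prevents the inner command from ever starting.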

D3. AgentCard exposes structured isolation block, not a flat enum.

Instead of isolation_level: hardened | relaxed | custom:

"x-scitex-agent-container": {
  "isolation": {
    "level": "hardened",
    "containall": true,
    "cleanenv": true,
    "writable_tmpfs": false,
    "preflight_passed": ["uid-nonzero", "no-host-home"],
    "preflight_allowed": [],
    "binds_count": 3,
    "binds_writable_count": 0
  }
}

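level: hardened is the human shorthand for “containall and cleanenv true + preflight_allowed: []”; writable_tmpfs is true only when no overlay is declared (see D1), which is why it can be false on a hardened run. External verifiers (Clew, orochi attestation) read the structured booleans to attest specific properties.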

Rationale. A flat custom label hides what’s custom about it. A run with preflight_allow: [$HOME/.ssh] and a run with preflight_allow: [$HOME/.aws] are both custom under the enum but have very different security profiles. Clew’s verification chain wants to attest specific properties: “did this run set containall? did it allow any preflight bypasses?” The structured block answers those directly.
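A verifier-side read of the structured block might look like the sketch below. The function name `isolation_findings` and the wording of the findings are hypothetical, and this is not Clew's actual logic. It only shows that the two questions above (containall set? any preflight bypasses?) can be answered from the card alone.

```python
import json

# The example card from D3, parsed as JSON.
CARD = json.loads("""
{
  "x-scitex-agent-container": {
    "isolation": {
      "level": "hardened",
      "containall": true,
      "cleanenv": true,
      "writable_tmpfs": false,
      "preflight_passed": ["uid-nonzero", "no-host-home"],
      "preflight_allowed": [],
      "binds_count": 3,
      "binds_writable_count": 0
    }
  }
}
""")


def isolation_findings(card: dict) -> list[str]:
    """List the properties a verifier could not attest (hypothetical helper)."""
    iso = card["x-scitex-agent-container"]["isolation"]
    findings = []
    for flag in ("containall", "cleanenv"):
        if iso.get(flag) is not True:
            findings.append(f"{flag} not set")
    # Each allowed path is reported separately, so .ssh and .aws stay distinguishable.
    for path in iso.get("preflight_allowed", []):
        findings.append(f"preflight bypass: {path}")
    return findings


print(isolation_findings(CARD))  # prints []
```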

Considered and rejected

Rejected: prompt-level isolation guarantees.

“Tell the agent in the prompt to be read-only.” The PoC proved this fails — the agent acknowledged the rule and then violated it. Prompts are not a security mechanism. Mechanism-level enforcement is non-negotiable for the Clew thesis.

Rejected: test -w <path> permission-based preflight.

test ! -w /opt would fail inside any container with a writable overlay shadowing /opt, even when no host damage is possible. The intent (catch leaks) and the test (catch any write capability) don’t align under apptainer’s overlay model. Existence-based checks on identity files ($HOME visibility) avoid this entirely.

Rejected: enumerated per-path preflight (.gitconfig, .ssh, .aws, …).

Forever incomplete; every new credential cache added by tooling elsewhere requires a sac patch. The $HOME-visibility check covers the same surface in one assertion.

Rejected: isolation_level: hardened | relaxed | custom flat enum.

Loses verifier resolution. Two custom runs with very different preflight_allow sets become indistinguishable on the card.

Implementation

| Layer | Where | Status |
| --- | --- | --- |
| apptainer.relaxed: false default | config/_types.py::ApptainerSpec | ✅ shipped earlier today |
| --containall auto-prepended | runtimes/_apptainer_runtime.py | ✅ shipped |
| --cleanenv auto-prepended | same | ✅ shipped (D1) |
| --writable-tmpfs when no overlay | same | ✅ shipped (D1) |
| Universal preflight injection | runtime wraps inner cmd with bash -c "<preflight>\nexec <inner>" | ✅ shipped (D2) |
| spec.apptainer.preflight_allow field | config/_types.py::ApptainerSpec | ⏳ deferred (out of D1–D4 scope) |
| AgentCard structured isolation block | a2a/_card.py::project_card | ✅ shipped (D3) |
| sac agents check D4 bind-target warning | cli_pkg/build_cmds.py::check | ✅ shipped (D4) |
| Regression tests | tests/.../test__apptainer_runtime.py + tests/.../a2a/test__card.py + tests/.../cli_pkg/test_build_cmds.py + tests/.../runtimes/test__apptainer_isolation.py + tests/.../runtimes/test__apptainer_preflight.py | ✅ shipped |

Consequences

Positive.

  • sac becomes the only A2A-compatible agent runtime that mechanically enforces isolation by default. Clear differentiation from LangGraph / CrewAI / AutoGen (no isolation concept) and from Docker (isolation exists but not declared / not attestable).

  • Clew can build verification chains that attest specific isolation properties without sac-internal introspection.

  • The scitex-stats-auditor symlink-confusion class of incidents becomes impossible: the agent can’t “fix” the host because the host isn’t reachable.

Negative / tradeoffs.

  • Existing agents that worked under the old defaults will fail fast if they implicitly relied on the $HOME auto-bind. Operators must either declare the bind explicitly or opt out of hardening with relaxed: true.

  • The --cleanenv default removes $PATH inheritance; sac has to inject the container’s expected $PATH itself.

  • HPC-style “just run my command inside the container with my home visible” is now a two-step (write relaxed: true); intentional.

Addendum: D2 refinement (2026-05-13 evening)

Initial D2 design proposed test ! -d "$HOME" as a universal invariant. Pre-implementation verification against the scitex-stats-auditor spec.yaml exposed an Apptainer-specific edge case: Apptainer creates the entire directory path of every bind target as scaffolding, so a bind like /home/$USER/proj/scitex-stats:/home/$USER/proj/scitex-stats:ro causes /home/$USER (== $HOME) to exist as a directory inside the container — even under --containall, with no credential files visible. The D2 check would false-fire on every agent that mirrors host paths into the container.

Two solutions were considered:

  • Bind-aware preflight (sac generates the preflight at runtime, knowing what binds it’s about to create, and the preflight checks $HOME contents are a subset of the declared bind scaffolding). Rejected: dynamic preflight is harder to integrate into Clew’s verification chain — verifier has to attest the generator, not just the executed script.

  • Container-canonical paths: bind targets MUST use container-side conventional roots (/srv/, /work/, /opt/, /data/), never host-mirroring paths. Accepted.

D4. Bind targets MUST be container-canonical paths.

Bind targets that mirror host paths (/home/, /Users/, /root/, absolute Windows-style paths) are deprecated. Bind targets MUST live under conventional container roots: /srv/, /work/, /opt/, /data/.

Spec.yaml convention:

spec:
  apptainer:
    binds:
      - $HOME/proj/scitex-stats:/srv/sources/scitex-stats:ro
      - $HOME/proj/scitex-dev:/srv/sources/scitex-dev:ro

Inside the container nothing under /home/$USER appears, so:

  1. The D2 preflight (test ! -d "$HOME") stays static and universal.

  2. spec.yaml is operator-agnostic — the same spec runs cleanly for any user.

  3. Verification chain receives a static preflight script with a stable sha256; no meta-verification required.
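Point 3 relies on the preflight being byte-for-byte identical across runs. A minimal sketch of how a digest could be taken is shown here. The helper name `preflight_digest` and the one-line script are placeholders, not the script sac actually ships.

```python
import hashlib


def preflight_digest(script: str) -> str:
    """Hex sha256 of the exact preflight bytes (hypothetical helper)."""
    return hashlib.sha256(script.encode("utf-8")).hexdigest()


# Placeholder script: any static text hashes to the same value on every run.
STATIC_PREFLIGHT = 'test ! -d "$HOME" || exit 1\n'
digest = preflight_digest(STATIC_PREFLIGHT)
print(digest)
```

A preflight generated per run from the declared binds would change this digest from agent to agent, which is why the bind-aware alternative pushes the verifier toward attesting the generator instead.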

Rationale, beyond the technical fix. Container-canonical paths match Docker / OCI best practice and break the “works on my machine” failure mode (every agent that hardcodes /home/ywatanabe/proj/... is operator-bound). The shared-path convention was an HPC convenience artifact; Clew’s reproducibility context inverts it.

sac-side enforcement (planned). sac agents check <name> will emit a warning when a bind target starts with /home/, /Users/, or /root/. Future strict mode (sac.audit.strict_binds: true) makes it an error.
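The warning described above could be computed roughly as follows. This is a sketch under assumptions: `bind_target` and `bind_warnings` are hypothetical names and the message text is invented for illustration. The `src[:dest[:opts]]` bind syntax and the three prefixes come from this section.

```python
HOST_MIRRORING_PREFIXES = ("/home/", "/Users/", "/root/")


def bind_target(bind: str) -> str:
    """Extract the destination from a `src[:dest[:opts]]` bind string."""
    parts = bind.split(":")
    # With no explicit destination, Apptainer mounts the source at the same path.
    return parts[1] if len(parts) > 1 else parts[0]


def bind_warnings(binds: list[str]) -> list[str]:
    """Return one warning per bind whose target mirrors a host path."""
    return [
        f"bind target {bind_target(b)} mirrors a host path"
        for b in binds
        if bind_target(b).startswith(HOST_MIRRORING_PREFIXES)
    ]


warnings = bind_warnings([
    "$HOME/proj/scitex-stats:/srv/sources/scitex-stats:ro",
    "/home/alice/proj/x:/home/alice/proj/x:ro",
])
print(warnings)
```

The D5 addendum later accepts destinations under /home/agent/..., so a real check would need to exempt that prefix.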

Convenience. The runtime sets $SAC_WORKDIR=/srv/sources inside the container (when any bind targets land there), so startup_prompts and operator scripts can use cd $SAC_WORKDIR/<pkg> without hardcoding paths.

Order of execution

  1. ADR addendum (this section) — done.

  2. scitex-stats-auditor spec.yaml — bind targets translated to /srv/sources/...; startup_prompts updated to reference the container paths.

  3. Implementation: D1 + D2 (static check) + D3 + D4 (CLI validator).

  4. Restart scitex-stats-auditor against hardened sac; verify the static preflight passes end-to-end.

  5. Preserve session.jsonl + preflight result for Clew supplementary.

Addendum: D5 — canonical container HOME (2026-05-14)

Live verification of the hardened auditor exposed two issues with D2 as originally specified:

  1. Empty-$HOME false-fires. Apptainer scaffolds $HOME from the inherited passwd entry regardless of bind targets (even under --containall), so $HOME is always a directory. The D2 check was relaxed to “$HOME is empty” — workable, but it forces bind targets out of $HOME, breaking operator-intuitive paths.

  2. /srv/-style targets force script rewrites. Anything that references ~/proj/X or $HOME/proj/X inside the container breaks.

D5. Canonical container HOME = /home/agent.

sac auto-injects --home /home/agent (skipped only under apptainer.relaxed: true or when the operator declared --home). Inside the container:

  • $HOME == /home/agent, operator-independent.

  • Bind targets use the canonical HOME: ~/proj/X:/home/agent/proj/X.

  • The operator’s actual host home is never scaffolded inside the container.

  • sac-base.sif is rebuilt so UID-1000’s passwd entry reads agent (not ubuntu); whoami matches $HOME.
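The injection rule for the canonical HOME can be sketched as below. The helper name `home_flags` is hypothetical, and detecting an operator-declared home by scanning for `--home`, `--home=...`, or the short form `-H` is an assumption about how such a check might be written, not a description of sac's parser.

```python
CANONICAL_HOME = "/home/agent"


def home_flags(relaxed: bool, operator_args: list[str]) -> list[str]:
    """Return the --home flags to inject, or nothing when injection is skipped."""
    declared = any(
        arg in ("--home", "-H") or arg.startswith("--home=")
        for arg in operator_args
    )
    if relaxed or declared:
        # Skipped under relaxed: true or when the operator chose a home already.
        return []
    return ["--home", CANONICAL_HOME]


print(home_flags(False, ["--bind", "/data:/data:ro"]))
print(home_flags(False, ["--home", "/home/custom"]))
```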

D5 preflight (replaces D2’s empty-$HOME check).

# uid != 0, OR /proc/self/uid_map confirms userns-fakeroot.
if [ "$(id -u)" = "0" ]; then
  awk '$1==0 && $2!=0 {found=1} END {exit !found}' /proc/self/uid_map \
    || exit 11   # real root, refuse
fi
test "$HOME" = "/home/agent" || exit 12  # canonical HOME

Two static checks, attestable by sha256. The “no host leak” property falls out of --containall + canonical --home + declared binds.

Rationale.

  • D4 stays but is no longer the only path. Bind destinations may live under /home/agent/... (intuitive) OR /srv/, /work/, /opt/, /data/ (container-canonical). Both pass D5.

  • fakeroot opt-in (apptainer.fakeroot: true) integrates: inside the container id -u == 0, but /proc/self/uid_map proves userns-mapping; the preflight accepts this without weakening the no-real-root guarantee.
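The uid_map test that separates real root from userns-fakeroot can be mirrored in Python for unit testing. This is a sketch equivalent to the awk one-liner in the D5 preflight, with a hypothetical function name. Each uid_map line has three columns: the uid inside the namespace, the uid outside it, and the range length.

```python
def is_userns_root(uid_map_text: str) -> bool:
    """True when in-container uid 0 maps to a non-zero outside uid."""
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and int(fields[0]) == 0 and int(fields[1]) != 0:
            return True
    return False


# Sample uid_map contents for a fakeroot user namespace and for real root.
FAKEROOT_MAP = "         0       1000          1\n"
REAL_ROOT_MAP = "         0          0 4294967295\n"

print(is_userns_root(FAKEROOT_MAP))   # prints True
print(is_userns_root(REAL_ROOT_MAP))  # prints False
```

In the shell preflight, a False result for uid 0 corresponds to exit 11.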

Implementation (D5).

| Layer | Where | Status |
| --- | --- | --- |
| apptainer.fakeroot: bool | config/_types.py::ApptainerSpec | |
| Auto-prepend --home /home/agent | runtimes/_apptainer_iso_flags.py | |
| Auto-append --fakeroot when opted in | same | |
| Preflight rewrite (uid-map + canonical-HOME) | runtimes/_apptainer_preflight.py | |
| sac-base.sif: ubuntu → agent | containers/apptainer-base.def | ✅ recipe; SIF rebuild pending |
| Bind destination validation | config/_parsers/_apptainer.py | |
| Doc updates + AgentCard JSON example | docs/isolation.md | |

Network isolation addendum (2026-05-14).

Peer review pushed on --network=bridge, surfaced the A2A + MCP-over-loopback interop constraint, and converged on: stay with host netns for the Clew arXiv window; plan a --network=bridge + bridge-IF bind + sac-host /etc/hosts injection migration that preserves MCP URL stability. See docs/isolation.md §4.

References

  • docs/isolation.md — the 10-category leak catalog this ADR’s decisions close.

  • docs/spec-reference.md: spec.apptainer.relaxed, spec.apptainer.fakeroot.

  • The 2026-05-13 scitex-stats-auditor incident — session.jsonl at ~/.scitex/agent-container/runtime/scitex-stats-auditor/ for the full trace.