Agent Worktree Isolation — Runbook
Status: active — 2026-04-22.
Owner: devops-engineer.
Related: core#852, client#107, client#110, client#111, core#827.
Why this exists
Multiple agents (backend-sme, frontend-sme, qa-engineer,
principal-architect, etc.) are frequently dispatched in parallel on
the same repo. When they all operate on the shared main checkout
(/opt/upsquad/<repo>/), their git checkout, git stash, and
git clean operations silently clobber each other's uncommitted work.
Direct incidents:
| Incident | Cost | Note |
|---|---|---|
| client#107 (Clerk webhook) | multiple rebuilds | "checkouts got clobbered twice mid-build" |
| client#110 (dev-bypass removal) | ~90 minutes | untracked files dropped + tracked edits reverted when parallel branches flipped |
| client#111 (stub regen) | setup tax per agent | "concurrent bot sessions so explicit branch checkout + stash-by-id is required" |
| core#827 (conflict resolution) | forced workaround | agent created an ad-hoc worktree at /tmp/upsquad-761-merge |
Convention
Every agent task works in a dedicated git worktree at
/opt/upsquad-worktrees/<repo>/<agent>-<issue>/, branched from a
freshly fetched origin/main.
| Slot | Value |
|---|---|
| Root | /opt/upsquad-worktrees/<repo>/ — group-owned by upsquad-devs, mode 2775 (setgid) |
| Worktree dir | <agent>-<issue> |
| Branch | <prefix>/<issue>-<agent> (e.g. fix/844-devops-engineer) |
| Prefix | one of fix, feat, chore, docs, refactor, test |
| Base | origin/main at dispatch time |
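A minimal sketch of how the naming slots compose, assuming the caller already knows the repo, agent, issue, and prefix. worktree_names and the WORKTREE_ROOT override are hypothetical names used only for illustration. They are not part of scripts/agent-worktree.sh.

```bash
# Hypothetical helper (not part of scripts/agent-worktree.sh): print the
# worktree directory and branch name that the convention table prescribes.
# Usage: worktree_names <repo> <agent> <issue> <prefix>
worktree_names() {
  local repo=$1 agent=$2 issue=$3 prefix=$4
  local root=${WORKTREE_ROOT:-/opt/upsquad-worktrees}

  # Only the six prefixes listed in the table are accepted.
  case $prefix in
    fix|feat|chore|docs|refactor|test) ;;
    *) echo "invalid prefix: $prefix" >&2; return 1 ;;
  esac
  # Assumption: issue identifiers are plain numbers (e.g. 844, 1234).
  if [[ ! $issue =~ ^[0-9]+$ ]]; then
    echo "issue must be numeric: $issue" >&2
    return 1
  fi

  printf '%s/%s/%s-%s\n' "$root" "$repo" "$agent" "$issue"  # worktree dir
  printf '%s/%s-%s\n' "$prefix" "$issue" "$agent"          # branch name
}
```

For example, worktree_names upsquad-core devops-engineer 844 fix prints /opt/upsquad-worktrees/upsquad-core/devops-engineer-844 and then fix/844-devops-engineer, which matches the branch example in the table.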
One-time bootstrap (per dev box / CI runner)
```bash
sudo mkdir -p /opt/upsquad-worktrees/upsquad-core \
  /opt/upsquad-worktrees/upsquad-client \
  /opt/upsquad-worktrees/upsquad-admin \
  /opt/upsquad-worktrees/upsquad-web
sudo chown -R "$USER":upsquad-devs /opt/upsquad-worktrees
sudo chmod 2775 /opt/upsquad-worktrees
sudo chmod 2775 /opt/upsquad-worktrees/*
```
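The setgid bit on each directory makes new files and subdirectories inherit
the upsquad-devs group, and new subdirectories also inherit the setgid bit.
As a result, agents running as any upsquad-devs user can read them. Group
write access also depends on the creating process's umask leaving the
group-write bit set (e.g. umask 002). Under the common default of 022, new
files are created without group write.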
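To confirm the bootstrap took effect on a box, a check along these lines could be run. check_worktree_dirs is a hypothetical function, not an existing script. It relies on GNU stat (-c), so it assumes a Linux dev box or CI runner.

```bash
# Hypothetical bootstrap check: every directory given must be mode 2775
# (setgid + rwxrwxr-x) and group-owned by the expected group.
# Usage: check_worktree_dirs <group> <dir>...
check_worktree_dirs() {
  local group=$1; shift
  local dir mode dir_group rc=0
  for dir in "$@"; do
    mode=$(stat -c '%a' "$dir")        # GNU stat: octal mode incl. setgid
    dir_group=$(stat -c '%G' "$dir")   # GNU stat: owning group name
    if [[ $mode != 2775 || $dir_group != "$group" ]]; then
      echo "BAD  $dir (mode=$mode group=$dir_group)" >&2
      rc=1
    else
      echo "OK   $dir"
    fi
  done
  return "$rc"
}
```

Running check_worktree_dirs upsquad-devs /opt/upsquad-worktrees /opt/upsquad-worktrees/* should print one OK line per directory and return 0. If it does not, one of the chown or chmod steps did not take effect.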
Per-dispatch
```bash
# Agent prompts receive the issue number. The helper script returns the
# worktree path on stdout; all informational logs go to stderr.
WT=$(AGENT_NAME=backend-sme bash /opt/upsquad/upsquad-core/scripts/agent-worktree.sh 1234)
cd "$WT"
# work, commit, push ...
```
Script behaviour:
- Fetches origin/main fresh and branches from it.
- If the worktree already exists (agent retry), it prints the path and exits 0, so agents can call it idempotently.
- If the branch already exists locally (previous attempt), it reuses that branch and the new worktree checks it out.
- Prints only the path on stdout; every other line goes to stderr.
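A minimal sketch of those four behaviours follows. It is written as a shell function so it can be sourced and exercised. It is not the contents of scripts/agent-worktree.sh. The names agent_worktree, WORKTREE_ROOT, and PREFIX (defaulting to fix) are assumptions. The sketch also assumes it is invoked from inside the main checkout.

```bash
# HYPOTHETICAL sketch of the behaviour listed above, NOT the real
# scripts/agent-worktree.sh. Assumptions: called from inside the main
# checkout; WORKTREE_ROOT and PREFIX (default "fix") are illustrative knobs.
# Usage: WT=$(AGENT_NAME=backend-sme agent_worktree 1234)
agent_worktree() {
  local issue=${1:?usage: agent_worktree <issue>}
  local agent=${AGENT_NAME:?AGENT_NAME must be set}
  local prefix=${PREFIX:-fix}
  local top repo wt branch

  top=$(git rev-parse --show-toplevel) || return 1
  repo=$(basename "$top")
  wt="${WORKTREE_ROOT:-/opt/upsquad-worktrees}/$repo/$agent-$issue"
  branch="$prefix/$issue-$agent"

  # Agent retry: the worktree already exists, so report it and succeed.
  if [[ -d $wt ]]; then
    echo "worktree exists, reusing: $wt" >&2
    printf '%s\n' "$wt"
    return 0
  fi

  # Fresh base; all git chatter is sent to stderr.
  git -C "$top" fetch --quiet origin main >&2 || return 1
  mkdir -p "$(dirname "$wt")" || return 1

  if git -C "$top" show-ref --verify --quiet "refs/heads/$branch"; then
    # A previous attempt left the branch behind: check it out again.
    echo "reusing local branch $branch" >&2
    git -C "$top" worktree add "$wt" "$branch" >&2 || return 1
  else
    # --no-track: don't make origin/main the new branch's upstream.
    echo "creating $branch from origin/main" >&2
    git -C "$top" worktree add --no-track -b "$branch" "$wt" origin/main >&2 || return 1
  fi

  printf '%s\n' "$wt"   # the only line written to stdout
}
```

Two points keep WT=$(...) safe to capture. Every git invocation is redirected to stderr, and only the final printf writes to stdout. The --no-track flag leaves the new branch without an upstream, so a later git push -u origin HEAD sets the branch's own upstream instead of pointing it at main.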
Success criteria (from the issue)
- Two agents dispatched simultaneously on the same repo produce independent PRs with zero clobbering.
- Setup overhead under 5 seconds per dispatch.
- Documented convention in CLAUDE.md for every repo.
Measured during core#852 delivery:
```console
$ time (AGENT_NAME=test-b bash scripts/agent-worktree.sh 9999 test-b &
        AGENT_NAME=test-c bash scripts/agent-worktree.sh 10000 test-c &
        wait)
# ...
real    0m1.800s
```
That is 1.8 s wall-clock for two parallel dispatches, well under the 5 s
per-dispatch target. Writing a separate marker file (AGENT_B_FILE,
AGENT_C_FILE) into each worktree confirmed isolation: git status in each
showed only that worktree's own untracked file.
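The isolation check above can be repeated on any pair of worktrees with a sketch like this. check_isolation is a hypothetical function. It assumes both worktrees start with a clean status, and it removes its marker files afterwards.

```bash
# Hypothetical re-run of the isolation check described above (not a
# committed script). Assumes both worktrees start with a clean status.
# Usage: check_isolation <worktree-b> <worktree-c>
check_isolation() {
  local wt_b=$1 wt_c=$2 status_b status_c
  # One distinct untracked marker file per worktree.
  touch "$wt_b/AGENT_B_FILE" "$wt_c/AGENT_C_FILE" || return 1
  status_b=$(git -C "$wt_b" status --porcelain)
  status_c=$(git -C "$wt_c" status --porcelain)
  rm -f "$wt_b/AGENT_B_FILE" "$wt_c/AGENT_C_FILE"   # leave both worktrees clean

  # Isolated means each worktree sees only its own marker.
  if [[ $status_b == "?? AGENT_B_FILE" && $status_c == "?? AGENT_C_FILE" ]]; then
    echo "isolated"
  else
    printf 'NOT isolated\n[%s]\n%s\n[%s]\n%s\n' \
      "$wt_b" "$status_b" "$wt_c" "$status_c" >&2
    return 1
  fi
}
```

Pointing both arguments at the same directory is a quick way to see the failure mode. That case reproduces a shared checkout, and both markers show up in one status.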
Cleanup
Worktrees persist after the agent finishes. This is intentional: if an agent task failed, the next agent picking it up should see the last state rather than a fresh checkout.
Weekly cleanup removes worktrees whose branch no longer exists on
origin (i.e. the PR merged or was closed):
```bash
# Run from the main checkout so the self-protection guard doesn't
# skip a worktree you're currently inside.
cd /opt/upsquad/upsquad-core
bash scripts/prune-agent-worktrees.sh
```
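A minimal sketch of the prune rule for readers who want to see the logic. It is not the committed script. The names prune_agent_worktrees and WORKTREE_ROOT are assumptions. The sketch considers only branch-backed worktrees under the root and skips the one containing the current directory. It relies on git worktree remove refusing to delete a worktree that has local changes or is locked.

```bash
# HYPOTHETICAL sketch of the weekly prune rule; the real
# scripts/prune-agent-worktrees.sh may differ. Run from the main checkout.
prune_agent_worktrees() {
  local root=${WORKTREE_ROOT:-/opt/upsquad-worktrees}
  local real_root here line wt real_wt branch rc i
  local -a wts=() branches=()

  real_root=$(cd "$root" 2>/dev/null && pwd -P) || return 0  # nothing to prune
  here=$(pwd -P)

  # Snapshot branch-backed worktrees first so removals don't affect the loop.
  while IFS= read -r line; do
    case $line in
      "worktree "*) wt=${line#worktree } ;;
      "branch refs/heads/"*)
        wts+=("$wt")
        branches+=("${line#branch refs/heads/}")
        ;;
    esac
  done < <(git worktree list --porcelain)

  for i in "${!wts[@]}"; do
    wt=${wts[i]} branch=${branches[i]}
    real_wt=$(cd "$wt" 2>/dev/null && pwd -P) || continue  # dir already gone
    [[ $real_wt == "$real_root"/* ]] || continue            # agent worktrees only
    if [[ $here == "$real_wt" || $here == "$real_wt"/* ]]; then
      echo "skip $wt: current directory is inside it" >&2
      continue
    fi

    # --exit-code: 0 = ref exists, 2 = no matching ref, other = error.
    rc=0
    git ls-remote --exit-code origin "refs/heads/$branch" >/dev/null || rc=$?
    case $rc in
      0) continue ;;                                          # still on origin
      2) echo "pruning $wt ($branch no longer on origin)" >&2
         git worktree remove "$wt" || echo "kept $wt: git refused removal" >&2 ;;
      *) echo "kept $wt: could not query origin (exit $rc)" >&2 ;;
    esac
  done
  git worktree prune   # drop metadata for worktree dirs deleted by hand
}
```

Treating only exit status 2 as "branch gone" is deliberate. If origin is unreachable, ls-remote fails with a different status, and the worktree is kept rather than pruned on a network blip.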
Suggested systemd timer (to be wired by a follow-up PR, not in scope for #852):
```ini
# /etc/systemd/system/upsquad-worktree-prune.timer
[Unit]
Description=Weekly prune of merged-branch agent worktrees

[Timer]
OnCalendar=Sun *-*-* 03:30:00
Persistent=true

[Install]
WantedBy=timers.target
```
Until that timer lands, DevOps runs the prune manually on ops days.
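By default, a systemd timer activates the service unit with the same basename. The follow-up PR therefore also needs an upsquad-worktree-prune.service. One possible shape is shown below as a shell function that writes both units into a directory. The User= account is a placeholder, not a decision. The working directory is taken from the cleanup instructions above.

```bash
# HYPOTHETICAL companion for the suggested timer (follow-up PR territory).
# Writes the .service the timer would activate plus the timer itself.
# Usage: write_prune_units [dest-dir]   (default /etc/systemd/system)
write_prune_units() {
  local dest=${1:-/etc/systemd/system}
  cat >"$dest/upsquad-worktree-prune.service" <<'EOF' || return 1
[Unit]
Description=Prune merged-branch agent worktrees

[Service]
Type=oneshot
# Placeholder account; substitute a real upsquad-devs member.
User=upsquad-bot
Group=upsquad-devs
WorkingDirectory=/opt/upsquad/upsquad-core
ExecStart=/usr/bin/env bash scripts/prune-agent-worktrees.sh
EOF
  cat >"$dest/upsquad-worktree-prune.timer" <<'EOF' || return 1
[Unit]
Description=Weekly prune of merged-branch agent worktrees

[Timer]
OnCalendar=Sun *-*-* 03:30:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After running write_prune_units as root, run systemctl daemon-reload and then systemctl enable --now upsquad-worktree-prune.timer to activate the weekly schedule.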
Relationship to Claude Code Agent SDK isolation: "worktree"
The SDK's built-in isolation: "worktree" param gives subagent tool
calls their own worktree for the duration of the call. It complements,
rather than replaces, the filesystem convention documented here:
- SDK isolation covers subagent tool calls that modify the repo. Agents SHOULD pass isolation: "worktree" when spawning a subagent that might run git checkout or write files.
- Filesystem convention covers human-invoked, cron-invoked, and top-level agent dispatches. Without it, any of those contexts can still clobber an SDK-isolated subagent's parent checkout.
When in doubt, prefer the filesystem convention — it works uniformly across every invocation path.
References
- scripts/agent-worktree.sh: dispatch helper.
- scripts/prune-agent-worktrees.sh: weekly cleanup.
- CLAUDE.md → "Parallel agent dispatches (MANDATORY)" section.
- core#852: the task that produced this convention.