Runbook: Org Model v2.3 — Phase 3 Legacy Drop (Dev)
Parent tracker: #570 · HLD: #556 · LLD: #560 §10.3 · Task issue: #694 · Milestone: Wave J+K — Phase 2 Cutover + Legacy Drop (#11)
This runbook covers the destructive, one-way migration that drops
the legacy Pillar/Team tables after the Phase 2 cutover
(docs/runbooks/org-v23-phase2-cutover.md) has settled. The
founder is the operator — the runbook does not automate the
migration on the founder's behalf.
0. Prod flip authority
Prod flip authority: TBD. Revisit this section when the first prod tenant onboards. Until then, every step below is dev only. Do not copy this runbook into a prod change ticket without replacing this section with the approved prod authority.
0.1 Pre-flight blockers — MUST be closed before migration 070 runs
PR #713 review round 4 identified three live-code surfaces that still
read legacy tables dropped by migration 070. Those surfaces remain on
the tier-1 structural allowlist of
scripts/ci/check-legacy-table-refs.sh because the PR does not modify
them, but they would 500 on every call once the tables are gone.
The migration does not run in any environment until every issue in the table below is closed. Each issue carries a lint+runtime fix; once closed, that surface is either flag-gated or removed and the risk it carries is retired.
| # | Issue | Surface | Risk if skipped |
|---|---|---|---|
| 1 | #733 | internal/mcp/github/scope.go::IsRepoAllowed | Resolved 2026-04-26. The dead PolicyScopeChecker, NewPolicyScopeChecker, and ScopeDBTX types (zero production callers, only the team_governance_policies query path) were deleted. AllowAllScopeChecker retained for tests. Issue closed. |
| 2 | #734 | internal/governance/{store,policy_store}.go legacy team_policy reads | Protected architecturally by the cascade_4layer breaker; if the breaker trips post-drop, engine.Check falls back to the legacy 3-layer path and 500s. 12 refs across the two files. |
| 3 | #735 | internal/runtime/approval/policy/source_pg.go::GetTeamPolicies | Called on every approval-policy resolution. 500s post-drop with no breaker in front. Requires a schema migration (adds template / timeout_seconds / escalation_minutes_before_deadline / channels / team_kind to org_unit_governance_policies) plus a data backfill, so carries a separate migration number. Shipped by migration 079_org_unit_policy_template.up.sql + flag-gate in PGSource.GetTeamPolicies. Verify read_from_org_units=true pins FROM org_unit_governance_policies before running §5. |
Verification before §1:
# Each must be CLOSED (not just merged into a branch).
for ISSUE in 733 734 735; do
STATE=$(gh api repos/upsquad-ai/upsquad-core/issues/"$ISSUE" --jq '.state')
echo "#${ISSUE}: ${STATE}"
[[ "$STATE" == "closed" ]] || { echo "STOP: #$ISSUE is still open."; exit 1; }
done
1. What this runbook covers
Migration 070_drop_legacy.up.sql (this PR) drops the four legacy
tables that are no longer referenced by any live read or write path:
| Table | Origin | Superseded by |
|---|---|---|
pillars | migration 001 | org_units where unit_type = 'pillar' (migration 060) |
teams | migration 001 | org_units where unit_type = 'team' (migration 060) |
team_memberships | migration 058 | org_unit_memberships (migration 060) |
team_governance_policies | migration 028 | 4-layer cascade policies (migration 063) |
The drop is destructive — once migration 070 has run, those
tables' rows are gone from the live database. The only way to
recover row-accurate pre-drop state is to restore from the pg_dump
backup captured in §4 below.
2. Who runs what
- Dev: the founder runs the migration personally via §5 below. Agents never run the migration on the founder's behalf. If an agent is asked to execute the drop against a live database, the request is out of scope and must be escalated back to the founder.
- Prod: TBD — see §0.
3. Pre-flight checks (before §5)
Run all of these before executing the migration. Any failure is a hard stop — fix and re-run from the top, do not attempt partial continuation.
3.1 Phase 2 cutover complete
The Phase 2 runbook (docs/runbooks/org-v23-phase2-cutover.md) must
have landed all four flags cleanly:
| Flag | Required state | Verification |
|---|---|---|
read_from_org_units | true | SELECT value FROM platform_feature_flags WHERE key = 'orgv23.read_from_org_units'; returns true |
cascade_4layer | true | SELECT value FROM platform_feature_flags WHERE key = 'orgv23.cascade_4layer'; returns true |
new_services_enabled | true | SELECT value FROM platform_feature_flags WHERE key = 'orgv23.new_services_enabled'; returns true |
dual_write_org_units | false | SELECT value FROM platform_feature_flags WHERE key = 'orgv23.dual_write_org_units'; returns false |
Each flip must have been observed stable for at least one dev-release-cycle (founder decision 2026-04-19 — the original LLD §10.3 six-week window is waived pre-prod).
3.2 Zero active legacy writes
Run one-off counts against the legacy tables to confirm the dual- write off-switch has held:
-- Expected: 0 new rows in the observation window.
SELECT table_name,
COUNT(*) FILTER (WHERE created_at > now() - interval '24 hours') AS created_24h
FROM (
SELECT 'pillars' AS table_name, created_at FROM pillars
UNION ALL
SELECT 'teams', created_at FROM teams
UNION ALL
SELECT 'team_memberships', created_at FROM team_memberships
UNION ALL
SELECT 'team_governance_policies', created_at FROM team_governance_policies
) AS s
GROUP BY table_name;
Any non-zero created_24h column means something is still writing
to a legacy table — stop and investigate before dropping.
3.3 Reconciler drift at zero
# Port-forward to the reconciler's metrics endpoint:
kubectl -n platform port-forward deploy/dualwrite-reconciler 9119:9119
# In another shell:
curl -s localhost:9119/metrics | grep '^dualwrite_open_drift{'
# Every line's value must be 0.
If any dualwrite_open_drift gauge is non-zero, run one more heal
cycle before proceeding. A stable-zero gate (over the observation
window) is mandatory.
3.4 CI lint sanity
# From the repo root:
scripts/ci/check-legacy-table-refs.sh origin/main
# Expected: "legacy-table-refs: OK — no new legacy-table references outside the allowlist."
This confirms no PR merged between Phase 2 and Phase 3 sneaked a new reference into the non-allowlisted code paths.
4. Backup (non-negotiable)
Take a full logical backup of the database before running the migration. The backup is the ONLY recovery path once §5 has executed.
# Namespace + pod names for dev (adjust for your env).
NS=platform
PG_POD=$(kubectl -n "$NS" get pod -l app=postgres -o jsonpath='{.items[0].metadata.name}')
STAMP=$(date -u +%Y%m%dT%H%M%SZ)
BACKUP_DIR="/var/backups/upsquad/orgv23-phase3-drop-legacy"
mkdir -p "$BACKUP_DIR"
# 1. Full database dump (compressed). Takes a few minutes on dev.
kubectl -n "$NS" exec "$PG_POD" -- \
pg_dump -U upsquad -Fc -Z 9 upsquad \
> "$BACKUP_DIR/upsquad-full-${STAMP}.pgdump"
# 2. Per-table dumps (smaller, targeted restore surface).
for TBL in pillars teams team_memberships team_governance_policies; do
kubectl -n "$NS" exec "$PG_POD" -- \
pg_dump -U upsquad -Fc --data-only --table="public.${TBL}" upsquad \
> "$BACKUP_DIR/${TBL}-${STAMP}.pgdump"
done
# 3. SHA-256 manifest — record in the tracker #570 comment.
sha256sum "$BACKUP_DIR"/*.pgdump > "$BACKUP_DIR/SHA256SUMS-${STAMP}.txt"
cat "$BACKUP_DIR/SHA256SUMS-${STAMP}.txt"
Record the backup path + SHA manifest in #570 and #694 as a comment before moving to §5. Without this record there is no audit trail of the backup's existence.
4.1 Restore smoke test (optional but recommended)
Spin up a scratch database and confirm the backup is readable:
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d postgres -c \
"CREATE DATABASE upsquad_backup_smoke;"
kubectl -n "$NS" exec -i "$PG_POD" -- \
pg_restore -U upsquad -d upsquad_backup_smoke -v \
< "$BACKUP_DIR/upsquad-full-${STAMP}.pgdump" | tail -20
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d upsquad_backup_smoke -c \
"SELECT COUNT(*) FROM pillars; SELECT COUNT(*) FROM teams;
SELECT COUNT(*) FROM team_memberships;
SELECT COUNT(*) FROM team_governance_policies;"
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d postgres -c \
"DROP DATABASE upsquad_backup_smoke;"
If the counts are non-zero and the restore completed without errors, the backup is trustworthy.
5. Execution
The migration is applied by golang-migrate, same tool the rest of
the platform uses. It wraps 070_drop_legacy.up.sql in a single
transaction.
# DATABASE_URL must point at the dev database; confirm twice.
echo "$DATABASE_URL" | grep -v prod || { echo "STOP: DATABASE_URL looks like prod"; exit 1; }
# Apply just the one migration — NOT `up` (that would be a no-op at
# this point, but best to be explicit).
make migrate-up
Expected output:
70/u drop_legacy (…ms)
If the migration errors (e.g. FK constraint surprise), the transaction rolls back and the legacy tables are still present. Investigate the error, correct, and retry. Do NOT patch the migration file in-place once it has succeeded on any other database — that breaks the single-source-of-truth invariant for migrations.
6. Post-verification
6.1 Tables gone, survivors intact
-- Expected: zero rows (legacy tables are gone).
SELECT tablename FROM pg_tables
WHERE schemaname = 'public'
AND tablename IN ('pillars','teams','team_memberships','team_governance_policies');
-- Expected: non-zero rows on every survivor.
SELECT 'organizations' AS t, COUNT(*) FROM organizations
UNION ALL
SELECT 'members', COUNT(*) FROM members
UNION ALL
SELECT 'org_units', COUNT(*) FROM org_units
UNION ALL
SELECT 'org_unit_memberships', COUNT(*) FROM org_unit_memberships
UNION ALL
SELECT 'agents', COUNT(*) FROM agents;
6.2 OrgUnitService parity
Hit the running Context Engine with an authenticated request and confirm the read path returns the same team surface as before the drop. The QA engineer's parity check (issue #694 acceptance) covers this; founder runs it once before closing the tracker:
# With SMOKE_GATEWAY_URL + SMOKE_GATEWAY_TOKEN + SMOKE_GATEWAY_ORG_ID set:
make smoke-wave-j
Expected: ok on all sub-tests.
6.3 Client + Admin portal smoke
Open the Org Chart page in the client portal and the Member Admin page in the admin portal. Verify:
- The full tree renders with no empty nodes.
- Member counts match the pre-drop snapshot (from §6.1 backup).
- No console errors referencing
pillars/teams/team_memberships.
Any regression is a rollback signal — see §7.
7. Rollback
The migration is one-way at the data level. Rollback requires restoring from the §4 backup:
# 1. Take the service down to prevent writes against the dropped
# schema.
kubectl -n platform scale deploy/context-engine --replicas=0
# 2. Run migrate down — recreates the schema (empty).
make migrate-down STEPS=1
# 3. Restore the four tables from the per-table backups.
for TBL in pillars teams team_memberships team_governance_policies; do
kubectl -n "$NS" exec -i "$PG_POD" -- \
pg_restore -U upsquad -d upsquad --table="public.${TBL}" --data-only \
< "$BACKUP_DIR/${TBL}-${STAMP}.pgdump"
done
# 4. Verify counts match the pre-drop snapshot.
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d upsquad -c \
"SELECT 'pillars', COUNT(*) FROM pillars UNION ALL
SELECT 'teams', COUNT(*) FROM teams UNION ALL
SELECT 'team_memberships', COUNT(*) FROM team_memberships UNION ALL
SELECT 'team_governance_policies', COUNT(*) FROM team_governance_policies;"
# 5. Bring the service back up.
kubectl -n platform scale deploy/context-engine --replicas=1
Then revert the Phase 2 flags (read_from_org_units,
cascade_4layer, new_services_enabled → false; dual_write_org_units →
true) per the Phase 2 runbook §7 "rollback" procedure, and open a
tracking incident against #570 documenting the trigger for the
rollback.
DO NOT attempt to recover state by running migrate down alone
— that recreates empty tables with the right shape but no rows,
which looks healthy to a shallow check and breaks downstream reads.
8. Close-out
When §6 is green:
- Tick every acceptance box on #694.
- Comment on #570 with the backup path + SHA manifest from §4.
- Ping
@project-managerto close the Phase 3 milestone (#11). - QA writes the parity report and attaches it to #694.
At this point the Org Model v2.3 migration is code-complete in
dev. The only surviving references to pillars / teams /
team_memberships / team_governance_policies are:
- The migration files themselves (070's up + down, 001, 028, 058, 060 — history of the schema).
- The CI lint script (
scripts/ci/check-legacy-table-refs.sh) which stays in place to block any accidental reintroduction of the legacy identifiers. - The
.allowlistpaths in the lint script and theinternal/tenantshim +internal/member/read_switch*code paths that continue to compile but are no longer exercised at runtime. A future hygiene pass (out of scope for K.2) will remove them.