PRD: Context Engine -- Bootstrap Foundation for Token Optimization & Context Management

Status: Draft -- Awaiting Approval Version: 1.1 Author: Product Manager Agent Date: 2026-04-06 Parent PRD: UpSquad Complete PRD v1.6 Priority: [BOOTSTRAP] -- Must be built first; all other agents and platform components depend on it.

Changelog

v1.1 -- 2026-04-06

Triggered by: Internal re-verification against parent PRD (UpSquad Complete PRD v1.6) to close coverage gaps before architect handoff.

Added section 4.10 Embedding & Chunking Pipeline as an explicit first-class sub-component (CE-F48 through CE-F52). Previously implicit in RAG section.
Added CE-F53 LLM prompt-cache exploitation for stable immutable layer prefixes (L1+L2+L3+L4). Major token-reduction multiplier not in parent PRD -- flagged as upstream PRD gap in section 15.
Added CE-F54 Context Engine savings metric -- per-call delta between naive baseline and engine-assembled tokens, integrated with P2.2.8 per-call cost tracking.
Added CE-F55, CE-F56, CE-F57 mapping to P3.4.4, P3.4.5, P3.4.6 (persistent memory versioning with named snapshots, memory rollback, retention policies) -- previously claimed in appendix but not present in the requirements tables.
Added CE-F58 cross-reference to SEC.1.2 (immutable context enforcement validated before every LLM call) -- overlaps with CE-F29, made explicit.
Added section 8.1 Context Engine API Surface -- explicit interface contract between Context Engine and agent runtime, to remove ambiguity for the architect.
Strengthened quality-preservation constraint by embedding it in CE-F5, CE-F11, CE-F12 directly rather than relying only on the global constraint in section 2.
Added section 15 Upstream PRD Gaps -- items discovered during re-verification that should be added to the parent PRD by the Product Manager in a follow-up.
Expanded section 7 Dependencies with explicit provisioning prerequisites (pgvector extension, embedding model selection, compaction LLM routing).
Updated section 14 Parent PRD Cross-Reference appendix to accurately reflect which P3.4.x items are now covered and which were previously claimed-but-missing.

1. Problem Statement

UpsQuad is built by its own AI agent team. Every agent interaction consumes LLM tokens, and the cost and quality of those interactions is directly determined by how context is assembled, compressed, and delivered to the model.

Today, agents operate with naive context strategies: entire PRDs (45,000+ tokens), full conversation histories, and unfiltered tool outputs are sent to the LLM on every call. This creates three compounding problems:

Cost: Token consumption is 3-10x higher than necessary. A single PRD analysis session can consume hundreds of thousands of tokens when the agent only needs a fraction of the document.
Quality degradation: When context windows are packed with irrelevant information, LLMs lose focus. Critical details get buried in noise. The model attends to everything equally rather than focusing on what matters for the current task.
Latency: Larger prompts mean slower responses. Time-to-first-token increases linearly with input size.

The Context Engine is the foundational infrastructure that solves all three problems simultaneously. It ensures that every LLM call receives the most relevant context in the fewest tokens, preserving (and amplifying) quality while dramatically reducing cost and latency.

Why build this first? Every other component in the platform -- workflows, governance, agent runtime, the chat interface -- makes LLM calls. The Context Engine sits beneath all of them. Building it first means every subsequent component benefits from optimized context from day one. The agent team building UpsQuad will use it immediately (dogfooding), validating it under real production conditions before any customer does.

2. Goals & Success Metrics

Goal	Metric	Target (MVP)	Target (V1.1)
Reduce token consumption	Token reduction ratio (tokens sent vs. naive baseline for same task)	>= 40% reduction	>= 60% reduction
Preserve output quality	Quality score (blind comparison: Context Engine output vs. full-context output, rated by human reviewer)	>= 95% quality parity	>= 100% (quality amplification via better relevance)
Reduce latency	Time-to-first-token improvement	>= 25% faster	>= 40% faster
Context assembly speed	Time to assemble context for an LLM call (retrieval + compaction + assembly)	< 500ms (p95)	< 300ms (p95)
Cost visibility	Per-agent, per-session context efficiency tracking	Available in metrics	Dashboarded with alerts
Compaction quality	Information retention score (key facts preserved after compaction, measured via automated eval)	>= 98% fact retention	>= 99% fact retention
Semantic retrieval accuracy	Relevant chunk recall (% of relevant chunks retrieved in top-K results)	>= 85% recall@10	>= 92% recall@10

Critical constraint: Using the Context Engine must NEVER compromise the quality of agent work. If the system cannot determine relevance with confidence, it MUST err on the side of including more context, not less. Quality is non-negotiable; token savings are the optimization target only after quality is guaranteed.

3. User Stories

US-CE-1: Agent Receives Relevant Context for Current Task

As an AI agent executing a task, I want to receive only the context relevant to my current step, so that I can focus on the task at hand without being distracted by irrelevant information and without consuming unnecessary tokens.

Acceptance Criteria:

Context assembly retrieves chunks semantically relevant to the current task description
Irrelevant sections of large documents (e.g., a 45K-token PRD when the agent only needs 2 sections) are excluded
The agent's output quality is equal to or better than when given the full document
Token usage is measurably lower than the naive baseline

US-CE-2: Long Conversation Context is Compacted Without Loss

As an AI agent in a long-running workflow (50+ messages), I want older conversation context to be intelligently compacted, so that I retain the critical decisions and facts from earlier in the conversation without exceeding my context window.

Acceptance Criteria:

Sliding window compaction summarizes older messages while keeping recent messages in full
Key decisions, constraints, and facts from compacted messages are preserved as structured entries
The agent can still reference earlier decisions accurately after compaction
Compaction triggers automatically when context reaches 80% of the model's max tokens (configurable)

US-CE-3: Agent Pushes Session Knowledge to Persistent Memory

As an AI agent completing a task, I want to push the knowledge I have learned (decisions made, facts discovered, state changes) to a persistent memory store, so that future sessions can benefit from this knowledge without re-discovering it.

Acceptance Criteria:

/push-context command saves session-derived knowledge to the external memory store
Push does NOT modify the agent's role definition, guardrails, or system prompt
Push requires write permission to the agent's memory namespace
Pushed knowledge is versioned and retrievable by future sessions

US-CE-4: Agent Pulls Fresh Context Without Breaking Session

As an AI agent that needs updated context mid-session, I want to pull the latest context and create a new immutable session snapshot, so that I can operate on up-to-date information without partial state corruption.

Acceptance Criteria:

/pull-context fetches latest context from external DB
A new immutable session snapshot is created atomically
The previous snapshot is fully discarded -- no mixing of old and new state
The agent confirms the version transition (e.g., "Context pulled: snapshot v12 -> v13")

US-CE-5: Context is Versioned and Rollbackable

As a team lead (L3+), I want every context mutation to be versioned with content-addressed hashing, so that I can audit changes, compare versions, and roll back to a known-good state if an agent's context becomes corrupted.

Acceptance Criteria:

Every context mutation produces a new version with SHA-256 content hash
Named snapshots can be created manually or automatically (before critical operations)
Rollback to any previous version is possible (L3+ required)
Full audit trail: who changed what, when, linked to workflow step

US-CE-6: Platform Operator Sees Context Efficiency Metrics

As a platform operator, I want to see per-agent and per-tenant context efficiency metrics, so that I can identify agents that are consuming tokens inefficiently and tune their context strategies.

Acceptance Criteria:

Context efficiency score is calculated: (useful tokens / total tokens sent) per LLM call
Metrics are available per agent, per session, per tenant
Alerts fire when an agent consistently operates below efficiency threshold
Recommendations for better compaction/retrieval settings are surfaced (V1.1)

US-CE-7: Immutable Agent Context Cannot Be Self-Modified

As a platform security engineer, I want to ensure that no agent can modify its own role definition, system prompt, guardrails, or clearance level, so that the chain of trust is preserved and agents cannot escalate their own privileges.

Acceptance Criteria:

All self-modification attempts are blocked regardless of vector (DB write, API call, tool invocation, LLM instruction)
Self-modification attempts are logged as security events
Optional session termination on violation
Only parent agents or humans can edit an agent's context (downward edits only)

US-CE-8: Semantic Search Retrieves Relevant Knowledge Before Each LLM Call

As an AI agent about to make an LLM call, I want the system to automatically assemble the most relevant context from my memory, the conversation history, and the team's RAG knowledge bases, so that I have the best possible information for my task without manual curation.

Acceptance Criteria:

Semantic search runs against vector store with recency weighting before each LLM call
Results are ranked by relevance score and recency
Only chunks above a relevance threshold are included
Context assembly respects the model's token budget (never exceeds allocated window)
Retrieval + assembly completes in < 500ms (p95)

4. Functional Requirements

4.1 Context Ingestion & Storage

ID	Requirement	Release	Parent PRD Item
CE-F1	Context ingestion layer: ingest messages, tool outputs, workflow events into per-agent context store in database	MVP	P3.1.1
CE-F2	Per-agent, per-tenant context isolation (separate storage paths, no cross-tenant context leakage)	MVP	P3.4.3
CE-F3	Memory types: conversation context (per-workflow), learned knowledge (persistent), team preferences (persistent), work artifacts (90-day retention)	MVP	P3.4.2
CE-F4	Object storage backend with per-tenant prefix, per-agent subfolder, object versioning enabled	MVP	P3.4.1

4.2 Context Compaction

ID	Requirement	Release	Parent PRD Item
CE-F5	Sliding Window compaction (default strategy): recent N messages in full, older messages summarized via LLM. Quality guardrail: summarization prompts MUST preserve decisions, commitments, constraints, and open questions verbatim or as structured key-facts. If summarization confidence is low, keep the original message in full. Never silently drop information.	MVP	P3.1.2
CE-F6	Compaction trigger: configurable threshold, default 80% of model max tokens	MVP	P3.1.6
CE-F7	Per-agent/per-team configuration of compaction strategy and thresholds (via config DB)	MVP	P3.1.7
CE-F8	Key-Fact Extraction compaction: extract decisions, constraints, requirements as structured facts	V1.1	P3.1.3
CE-F9	Hierarchical Summary compaction: minute-to-hour-to-day-to-week summaries	V1.1	P3.1.4
CE-F10	Task-Scoped compaction: completed tasks compacted, active tasks kept in full	V1.1	P3.1.5

4.3 Smart Retrieval & Assembly

ID	Requirement	Release	Parent PRD Item
CE-F11	Smart retrieval: semantic search via vector store + recency weighting for context assembly before each LLM call. Quality guardrail: if top-K semantic recall drops below the confidence threshold for a query, the engine MUST expand the retrieval set (increase K, lower relevance floor) rather than return a thin result. Erring toward more context is always preferred over missing information.	MVP	P3.1.9
CE-F12	Relevance scoring: chunks scored by semantic similarity to current task + recency + importance weight. Quality guardrail: scoring is advisory, not exclusive -- a chunk tagged as a "pinned" constraint (decisions, commitments, immutable facts) is always included regardless of score.	MVP	New (implicit in P3.1.9)
CE-F13	Token budget management: context assembly respects model's max token limit, prioritizing highest-relevance chunks	MVP	New (operational necessity)
CE-F14	Context processing: entity extraction, reference resolution, importance scoring	V1.1	P3.1.10
CE-F15	Context overload detection and alerting (alert when context approaches limit inefficiently)	V1.1	P3.1.8

4.4 Context Push/Pull & Session Management

ID	Requirement	Release	Parent PRD Item
CE-F16	Context refresh mechanism: explicit refresh creates new immutable session snapshot, atomic operation (old or new, never partial)	MVP	P3.1.11
CE-F17	Context push operation (/push-context): saves session knowledge to external memory store. Does NOT modify role definition or guardrails.	MVP	P3.1.12
CE-F18	Context pull operation (/pull-context): fetches latest context, creates new immutable session snapshot, discards previous snapshot	MVP	P3.1.13
CE-F19	Context push/pull authorization model: push requires write permission to own memory namespace; pull always authorized for own context; neither can modify role/guardrails/system prompt	MVP	P3.1.14

4.5 Context Version Control

ID	Requirement	Release	Parent PRD Item
CE-F20	Auto-versioning on every context mutation (content-addressed SHA-256 hash, stored in context_versions table)	MVP	P3.2.1
CE-F21	Named snapshots (before critical operations, per workflow step, manually triggered by L3+)	MVP	P3.2.2
CE-F22	Context rollback (restore any previous version, L3+ required, recorded in audit log)	MVP	P3.2.4
CE-F23	Full audit trail (who changed what, when, why -- linked to workflow step)	MVP	P3.2.7
CE-F24	Context diffing (compare any two versions, show added/removed/modified entries)	V1.1	P3.2.3
CE-F25	Context branching (parallel exploration with different context)	V2	P3.2.5
CE-F26	Context merging (combine branches with conflict resolution, human review for conflicts)	V2	P3.2.6

4.6 Immutable Agent Context & Chain of Trust

ID	Requirement	Release	Parent PRD Item
CE-F27	4-layer enforcement: Platform Rules (hardcoded) > Tenant Policies (L5) > Team Guidelines (L3+) > Agent Persona (set at creation)	MVP	P3.3.1
CE-F28	Layers injected as system prompt prefix before every LLM interaction (concatenated L1+L2+L3+L4)	MVP	P3.3.2
CE-F29	Validation layer: blocks runtime override attempts via prompt injection detection	MVP	P3.3.3
CE-F30	Violation logging and agent termination on immutable context violation	MVP	P3.3.4
CE-F31	Self-context immutability: agents cannot modify own role definition, system prompt, guardrails, or clearance level	MVP	P3.3.8
CE-F32	Hierarchical context editing: only parent agent or human can edit, downward only	MVP	P3.3.9
CE-F33	Agent hierarchy definition: tree structure for chain of trust, configurable per tenant	MVP	P3.3.10
CE-F34	Chain of trust enforcement: validates all context edits against hierarchy before applying	MVP	P3.3.11
CE-F35	Guardrail definition format: structured rules with scope, condition, prohibited action, violation response	MVP	P3.3.7
CE-F36	Blacklist-based action model: default allow, guardrails define what agents CANNOT do	MVP	P3.3.6
CE-F37	Runtime guardrail enforcement at MCP transport layer [BOOTSTRAP]	MVP	P3.3.12

4.7 RAG Knowledge Bases (Context Engine Integration Points)

ID	Requirement	Release	Parent PRD Item
CE-F38	Vector store per knowledge domain (per tenant schema)	MVP	P3.5.1
CE-F39	Domain knowledge RAG: tenant uploads documents, auto-chunked, embedded, indexed	MVP	P3.5.2
CE-F40	Clearance-gated access: which agents can access which knowledge bases, per RBAC	MVP	P3.5.4
CE-F41	MCP-exposed endpoints: agents access knowledge via MCP tools (search, get_context, find_similar)	MVP	P3.5.5
CE-F42	Cross-tenant contamination prevention: RAG queries always scoped to tenant	MVP	P3.5.8
CE-F43	Update triggers: webhooks, doc updates, configurable schedule	MVP	P3.5.6

4.8 Context-Aware Tool Loading

ID	Requirement	Release	Parent PRD Item
CE-F44	Context-mode MCP sandboxing [BOOTSTRAP]: dynamically load/unload MCP tool definitions based on current task step. Only tools relevant to active operation injected into context window. Reduces idle tool definition context consumption.	MVP	P2.4.16

4.9 Context Efficiency Metrics

ID	Requirement	Release	Parent PRD Item
CE-F45	Context efficiency score: tokens used vs. context window capacity, per agent, per session	V1.1	P7.2.4
CE-F46	Flag context overload: alert when agent approaching token limits inefficiently	V1.1	P8.1.3
CE-F47	Suggest better model/compaction/learning settings based on efficiency analysis	V1.1	P8.1.2
CE-F54	Context Engine savings metric: for every LLM call, record (naive_baseline_tokens, engine_assembled_tokens, savings_ratio). Aggregated per agent, per session, per tenant. Integrated with per-call cost tracking so savings are visible in billing dashboards and Platform Owner Console.	MVP	New (integrates P2.2.8, P7.1.1)

4.10 Embedding & Chunking Pipeline

This section makes the embedding pipeline a first-class sub-component of the Context Engine. Previously implicit in the RAG section, it is called out here because it is on the critical path for semantic retrieval quality and because embedding model selection has long-term migration consequences.

ID	Requirement	Release	Parent PRD Item
CE-F48	Chunking service: token-aware chunking with configurable chunk size and overlap. Default: 512 tokens per chunk with 64-token overlap. Respects structural boundaries (headings, paragraphs, code blocks) when possible. Per-document-type strategies (markdown, PDF, code, transcript).	MVP	Implicit in P3.5.2
CE-F49	Embedding service: unified interface for producing embeddings from text. Initial MVP provider: a single approved embedding model (e.g., text-embedding-3-small or equivalent open model) selected by the architect. Per-tenant allowlisting of embedding models in V1.1.	MVP	Implicit in P3.5.2
CE-F50	Embedding cache: deduplicate embedding calls by content hash. If the same chunk content is embedded twice (same tenant or across tenants where content is non-sensitive platform data), reuse the cached embedding.	MVP	New (cost optimization)
CE-F51	Embedding model versioning and dual-index migration: when the embedding model changes, maintain both old and new indices in parallel, gradually migrate queries to the new index, validate recall@10 parity, drop the old index only after validation. No query outage during migration.	V1.1	Open Question #4 resolution
CE-F52	Reindexing job: background job that reprocesses source documents when (a) chunking strategy changes, (b) embedding model is upgraded, (c) index corruption is detected. Per-tenant scoped, rate-limited, resumable.	MVP	New (operational necessity)

4.11 Prompt Cache Exploitation (Token Optimization Multiplier)

ID	Requirement	Release	Parent PRD Item
CE-F53	LLM prompt-cache exploitation: when the LLM router supports prompt caching (Anthropic, OpenAI), the Context Engine MUST structure every assembled prompt so that the stable prefix (L1 Platform Rules + L2 Tenant Policies + L3 Team Guidelines + L4 Agent Persona + pinned knowledge base chunks) is placed at the front of the prompt in a cache-friendly order. Unstable tail (current task, retrieved chunks, recent conversation) goes after the cacheable prefix. This is a major token-cost multiplier because immutable layers are injected on every call (per CE-F28) and are highly cacheable across calls in the same session.	MVP	New -- flagged to parent PRD in section 15

4.12 Persistent Memory Versioning & Retention

These items were claimed as "covered" in the v1.0 appendix but were missing from the requirements tables. Added here for completeness.

ID	Requirement	Release	Parent PRD Item
CE-F55	Persistent memory versioning with named snapshots (before major learning cycles, configurable per memory type)	MVP	P3.4.4
CE-F56	Persistent memory rollback to any version (L4+ required, logged to audit trail)	V1.1	P3.4.5
CE-F57	Configurable memory retention policies (30/90/365 days/indefinite, per memory type, per tenant)	V1.1	P3.4.6

4.13 Immutable Context Enforcement Integration

ID	Requirement	Release	Parent PRD Item
CE-F58	Immutable context enforcement before every LLM call: the assembled prompt is validated against the 4-layer immutable context before dispatch to the LLM router. If validation fails (tampering, injection, layer missing), the call is blocked and logged as a security event. This is the Context Engine's concrete implementation of SEC.1.2, and subsumes CE-F29 (prompt injection detection).	MVP	SEC.1.2 + P3.3.3

5. Non-Functional Requirements

Category	Requirement	Target
Performance	Context assembly (retrieval + compaction + assembly)	< 500ms p95
Performance	Semantic search over 1M context entries	< 500ms p95 (V1.1 benchmark)
Performance	Compaction execution (single agent session)	< 2s for sessions up to 100K tokens
Scalability	Concurrent context assemblies	1,000 concurrent (for 1,000 tenants)
Scalability	Vector store entries per tenant	10M entries without degradation
Reliability	Context version integrity	Zero data loss on versioned context (content-addressed hashing)
Reliability	Push/pull atomicity	Operations are atomic -- never partial state
Security	Tenant isolation	Zero cross-tenant context leakage. Every query scoped to tenant_id.
Security	Immutable context enforcement	System prompt validated before every LLM call
Security	Self-modification prevention	All vectors blocked (DB, API, tool, LLM instruction)
Observability	Per-call token usage tracking	Every LLM call metered: model, input_tokens, output_tokens, context_engine_savings
Quality	Compaction information retention	>= 98% key fact retention (measured by automated eval)
Quality	Retrieval relevance	>= 85% recall@10 for relevant chunks

6. Scope

In Scope (MVP)

Context ingestion layer (messages, tool outputs, workflow events)
Sliding Window compaction with configurable thresholds
Smart retrieval with semantic search and recency weighting
Token budget management during context assembly
Context push/pull operations with authorization
Context versioning with content-addressed hashing
Named snapshots and rollback
4-layer immutable agent context enforcement
Hierarchical context editing (chain of trust)
Self-modification prevention
Guardrail enforcement at MCP transport layer
RAG vector store with tenant isolation
Document ingestion, chunking, embedding pipeline
MCP-exposed context access endpoints
Context-mode MCP sandboxing (dynamic tool loading)
Per-agent context storage isolation

In Scope (V1.1)

Key-Fact Extraction compaction
Hierarchical Summary compaction
Task-Scoped compaction
Context processing (entity extraction, reference resolution, importance scoring)
Context diffing between versions
Context overload detection and alerting
Context efficiency score dashboard
Optimization recommendations engine
Specialized RAG domains (per-tenant configurable)
Industry knowledge base templates

Out of Scope (V2+)

Context branching (parallel exploration)
Context merging with conflict resolution
Cross-agent context sharing without explicit push/pull
Training embeddings on tenant data (privacy implications -- deferred)
S3-compatible backend for on-premise

Explicitly Out of Scope (Not This PRD)

Workflow engine (P4) -- consumes context, does not produce it
Chat interface (P6) -- presentation layer for context operations
Cost engine billing aggregation (P7) -- consumes context metrics
Agent executor process (P2.1) -- integrates with context engine but is a separate component
User-facing configuration UIs -- separate frontend PRD

7. Dependencies

Dependency	Direction	Description
Agent Runtime (P2.1)	Bidirectional	Context Engine provides context to agents; agents produce context events. Agent session initialization (P2.1.16) fetches context from this engine.
Config DB (P4.6)	Inbound	Compaction strategies, thresholds, guardrail definitions stored in config DB
LLM Router (P2.2)	Outbound	Compaction uses LLM calls (summarization). Context Engine must route these via the LLM router.
MCP Framework (P2.4)	Bidirectional	Context Engine exposes MCP endpoints; MCP transport layer enforces guardrails
RBAC / Clearance Engine (P4.5)	Inbound	Authorization for push/pull/rollback operations checked against RBAC
Metering (P2.1.19)	Outbound	Context Engine emits token usage and efficiency metrics to metering pipeline
Vector Store Infrastructure (P1.2.9)	Inbound	Provisioning prerequisite: pgvector extension enabled on the primary relational DB with per-tenant schema isolation. Must exist before Context Engine deployment.
Object Storage (P1.2.8)	Inbound	Persistent agent memory stored in object storage (per-tenant prefix, per-agent subfolder, versioning enabled).
Cache (P1.2.7)	Inbound	Hot context cached for fast retrieval. Cache never source-of-truth for pull operations.
Embedding Model Selection	Inbound	Architect decision required before MVP build: which embedding model is the MVP default (text-embedding-3-small, bge-large, or equivalent). Model choice affects recall targets and migration cost.
Compaction LLM Routing (P2.2)	Outbound	Compaction uses LLM calls for summarization. Architect must decide: dedicated lightweight model vs. agent's configured model. See Open Question #3.
Security (SEC.1.1, SEC.1.2)	Inbound/Outbound	Input sanitization (SEC.1.1) runs before context assembly; immutable context enforcement (SEC.1.2) runs after assembly, before LLM dispatch. Context Engine owns SEC.1.2 implementation.
Per-call Cost Tracking (P2.2.8, P7.1.1)	Outbound	Every LLM call emits tokens + engine savings delta to metering pipeline.

8. Architecture Context

The Context Engine is a core platform service in UpsQuad's architecture:

+------------------------------------------------------------------+
| Agent Runtime: Executor | LLM Router | Tool Registry (MCP)       |
|              |                    |                    |          |
|              v                    v                    v          |
+------------------------------------------------------------------+
| >>>>>> CONTEXT ENGINE <<<<<<                                     |
|   Context Assembly | Compaction | Smart Retrieval | Versioning   |
|   Immutable Context Enforcement | Guardrail Engine               |
|   Push/Pull Manager | RAG Integration | Token Budget Manager     |
+------------------------------------------------------------------+
|              |                    |                    |          |
|              v                    v                    v          |
+------------------------------------------------------------------+
| Data Layer: Relational DB + Vector Store | Cache | Object Storage |
+------------------------------------------------------------------+

Every LLM call flows through the Context Engine. The engine sits between the agent runtime (which decides what to do) and the data layer (which stores everything). It is the "lens" that focuses the right information into each LLM call.

8.1 Context Engine API Surface (Interface Contract)

The Context Engine exposes the following logical operations to the agent runtime. This is a product-level contract; the architect owns the concrete gRPC/internal API design.

Operation	Caller	Purpose
`AssembleContext(session_id, task_description, token_budget)`	Agent runtime (before every LLM call)	Returns the assembled prompt (immutable layers + compacted conversation + retrieved chunks + current task), guaranteed to fit within token_budget and to satisfy quality guardrails.
`IngestEvent(session_id, event)`	Agent runtime (after every tool call or message)	Appends a context event to the per-agent store with auto-versioning.
`Compact(session_id, strategy?)`	Agent runtime (on threshold breach)	Applies the configured compaction strategy. Returns new version hash.
`PushContext(agent_id, knowledge)`	Agent runtime (on /push-context)	Persists session-derived knowledge to the agent's memory namespace. Authorization-checked.
`PullContext(agent_id)`	Agent runtime (on /pull-context or critical-update signal)	Returns the latest context snapshot (role definition, guardrails, config, memory). Creates a new immutable session snapshot atomically.
`RollbackContext(context_id, target_version)`	User (L3+) or platform operator	Restores a previous version. Audit-logged.
`ValidatePrompt(assembled_prompt)`	Context Engine (internal, before dispatch)	Validates the assembled prompt against the 4-layer immutable context and SEC.1.2. Blocks on violation.
`EmitMetrics(session_id, call_metrics)`	Context Engine (after every LLM call)	Emits naive_baseline_tokens, engine_assembled_tokens, savings_ratio, latency, quality-eval score to metering.
`Search(tenant_id, query, domain?, top_k?)`	Agent runtime (via MCP tool)	Semantic search against RAG knowledge bases. Tenant-scoped.
`Embed(text)`	Internal (chunking pipeline, search)	Produces an embedding using the configured model. Cached by content hash.

All operations are tenant-scoped. All operations return structured errors with stable error codes. All operations are metered.

9. Edge Cases & Failure Modes

Scenario	Expected Behavior
Compaction LLM call fails (provider down)	Fall back to truncation strategy (keep recent N messages, drop oldest). Log degraded-mode event. Never block the agent.
Semantic search returns zero results	Fall back to recency-based retrieval (most recent context entries). Log low-relevance event.
Context push fails mid-write	Atomic write -- either fully committed or fully rolled back. Agent retries on next push.
Context pull returns stale data (cache lag)	Pull always reads from primary DB, never cache. Cache is only for assembly-time reads.
Agent attempts to modify own system prompt	Blocked immediately. Logged as security event. Optional session termination per configuration.
Parent agent edits child context while child is mid-session	Child continues on its frozen snapshot. New context takes effect on next session or explicit /pull-context.
Two agents push to same memory namespace simultaneously	Optimistic concurrency with version check. Second push fails with conflict, must retry with latest version.
Compaction produces a summary that loses a critical fact	Quality eval catches this in V1.1. MVP mitigation: compaction always preserves structured key-facts alongside summary. Human review available for critical workflows.
Vector store index corruption	Automatic reindex from source documents. Alert platform operator. Serve from raw document search (degraded mode) during reindex.
Context exceeds model max tokens even after compaction	Hard truncation with priority ordering: (1) immutable layers, (2) current task context, (3) relevant retrieved chunks, (4) recent conversation, (5) older summaries. Never exceed model limit.
Cross-tenant query attempted	Rejected at query layer. Every context query requires tenant_id. Queries without tenant_id are invalid. Automated isolation tests verify this.

10. Open Questions

#	Question	Impact	Proposed Default
1	Should compaction summaries be stored alongside originals, or replace them?	Storage cost vs. audit trail	Store alongside (originals are append-only, summaries are derived views)
2	What is the right default relevance threshold for semantic search inclusion?	Quality vs. token savings	0.7 cosine similarity, tunable per agent
3	Should the Context Engine have its own dedicated LLM for compaction, or share the agent's configured model?	Cost and quality	Dedicated lightweight model for compaction (cheaper, faster); agent's model for task work
4	How should the system handle embedding model upgrades (existing vectors become incompatible)?	Migration complexity	Dual-index strategy: old + new index, gradual migration, drop old after validation
5	Should context efficiency metrics count immutable layer tokens as "overhead" or "useful"?	Metric accuracy	Count as "required overhead" -- separate from "task context efficiency"

11. Pricing Tier Mapping

Capability	Free	Pro	Enterprise
Sliding Window compaction	Yes	Yes	Yes
Smart retrieval (semantic search)	Basic (top-5)	Full (top-20, tunable)	Full + custom models
Context versioning	Last 10 versions	Last 100 versions	Unlimited
Named snapshots	3 per agent	50 per agent	Unlimited
Context rollback	No	Yes (L3+)	Yes (L3+)
RAG knowledge bases	1 domain, 1GB	10 domains, 50GB	Unlimited
Context efficiency dashboard	Basic	Full	Full + alerts + recommendations
Custom compaction strategies	No	No	Yes
Guardrail customization	Platform defaults only	Team-level	Full 4-layer

12. MVP vs V1.1 Summary

MVP (Build First -- Agents Use Immediately)

38 functional requirements. These provide the core context optimization loop:

Ingest context events (messages, tool outputs, workflow events)
Compact when approaching token limits (sliding window, with quality guardrails)
Retrieve semantically relevant context for each LLM call (with quality guardrails)
Manage token budget -- never exceed model limits, prioritize relevance
Version every mutation with content-addressed hashing
Enforce immutability -- 4-layer context, chain of trust, self-modification prevention, SEC.1.2 validation before every LLM call
Push/Pull for controlled context updates across sessions
Integrate with RAG for knowledge retrieval
Sandbox MCP tools -- only load relevant tools per task step
Embed and chunk source documents via the dedicated embedding pipeline with content-hash caching
Exploit LLM prompt caches -- structure prompts so immutable prefixes are cache-friendly, unstable tails come last
Measure savings -- emit naive-baseline vs. engine-assembled token deltas per call into metering
Version persistent memory with named snapshots (P3.4.4)

V1.1 (Enhanced Optimization)

14 functional requirements adding advanced compaction strategies, context processing, efficiency metrics, optimization recommendations, embedding model migration, memory rollback, and retention policies.

13. Dogfooding Validation Plan

The Context Engine will be validated by UpsQuad's own agent team before any customer deployment:

Validation Step	How
Token reduction measurement	Compare token usage for identical tasks (e.g., PRD analysis) with and without Context Engine
Quality parity check	Human blind review: does the agent produce equal or better output with Context Engine?
Compaction accuracy	Run 50 compaction operations, verify key-fact retention via automated eval
Semantic retrieval relevance	Query test suite: 100 queries against known-relevant documents, measure recall@10
Push/pull correctness	Agent team performs real push/pull operations during development workflow
Immutability enforcement	Red-team test: attempt self-modification via all vectors, verify all are blocked
Cross-tenant isolation	Automated test: create two tenants, verify zero data leakage across all operations
Failure mode resilience	Inject failures (LLM down, DB slow, cache miss) and verify graceful degradation

14. Appendix: Parent PRD Item Cross-Reference

Every functional requirement in this PRD traces back to the parent PRD (UpSquad Complete PRD v1.6) or is a net-new item flagged for upstream inclusion (see section 15).

Fully covered by this subset PRD:

P3.1.1 through P3.1.14 (Context Engine section) -- all items, MVP and V1.1
P3.2.1 through P3.2.7 (Context Version Control) -- all items, MVP through V2
P3.3.1 through P3.3.12 (Immutable Agent Context) -- all items
P3.4.1, P3.4.2, P3.4.3 (Persistent Agent Memory -- storage, types, isolation) -- MVP
P3.4.4 (Persistent memory versioning with named snapshots) -- MVP, added in v1.1 of this PRD (CE-F55)
P3.4.5 (Persistent memory rollback L4+) -- V1.1, added in v1.1 of this PRD (CE-F56)
P3.4.6 (Memory retention policies) -- V1.1, added in v1.1 of this PRD (CE-F57)
P3.5.1, P3.5.2, P3.5.4, P3.5.5, P3.5.6, P3.5.8, P3.5.9 (RAG Knowledge Bases, MVP items)
P2.4.16 (Context-mode MCP sandboxing)
P7.2.4 (Context efficiency score)
P8.1.2, P8.1.3 (Optimization suggestions, context overload flagging)
SEC.1.2 (Immutable context enforcement before every LLM call) -- owned here as CE-F58

Deferred to V2 (out of scope for this PRD's MVP):

P3.4.7 (S3-compatible backend for on-premise)
P3.2.5, P3.2.6 (Context branching and merging)
P3.5.3, P3.5.7, P3.5.10 (Specialized RAG domains, dedup, industry templates)

Items from the parent PRD that interact with but are not owned by this PRD:

P2.1.16 (Agent session initialization) -- consumes context engine, owned by Agent Runtime
P2.2.x (LLM Router) -- used by compaction, owned by Agent Runtime
P2.2.8 (Per-call cost tracking) -- the engine emits savings deltas into this pipeline
P4.5.x (RBAC) -- authorization provider, owned by Governance
P6.1.10 (Slash commands /push-context, /pull-context) -- UI layer, owned by Chat Interface
P7.1.x (Cost tracking) -- consumes context metrics, owned by Cost Engine
SEC.1.1 (Input sanitization) -- runs before context assembly, owned by Security Hardening

15. Upstream PRD Gaps (Flagged for Parent PRD v1.7)

During re-verification of this subset PRD against the parent PRD, the following gaps were discovered in the parent PRD itself. The Product Manager will address these in a follow-up update to UpSquad Complete PRD.

#	Gap	Proposed Parent PRD Item	Priority
1	LLM prompt-cache exploitation is a major token-reduction multiplier not explicitly called out in P3.1 or P2.2. Immutable layers L1+L2+L3+L4 (per P3.3.2) are injected on every LLM call -- they should be structured cache-friendly so that Anthropic/OpenAI/Gemini prompt caches can elide them. This can reduce effective input tokens by 70-90% for subsequent calls in a session.	New P3.1.15 -- "LLM prompt-cache exploitation: assembled prompts must place immutable prefix (L1-L4 + pinned chunks) at the front in cache-friendly order; unstable tail (task, retrieved chunks, recent conversation) follows. Router coordinates with providers that support prompt caching."	MVP [BOOTSTRAP]
2	Context Engine savings metric -- no explicit requirement in P7.2 for tracking the delta between naive-baseline tokens and engine-assembled tokens. Without this metric, we cannot prove the engine is working or justify its complexity.	New P7.2.5 -- "Context Engine savings metric: per LLM call, record (naive_baseline_tokens, engine_assembled_tokens, savings_ratio, quality_score). Aggregated and dashboarded per agent, session, tenant."	MVP
3	Embedding pipeline as first-class sub-component -- parent PRD mentions "auto-chunked, embedded, and indexed" in P3.5.2 but does not enumerate chunking strategy, embedding service, embedding cache, or embedding model migration as distinct items.	New P3.5.11 through P3.5.14 covering chunking service, embedding service, embedding cache, embedding model migration	MVP / V1.1
4	Compaction quality retention target -- no non-functional requirement specifying minimum key-fact retention for compaction. Without this target, compaction quality is unverifiable.	New P3.1.16 -- "Compaction quality: >= 98% key-fact retention measured by automated eval suite. Compaction strategies must not silently drop decisions, commitments, constraints, or open questions."	MVP
5	Pinned context -- no explicit concept of "pinned" chunks that are always included in assembly regardless of semantic score. Without this, a low-similarity but critical constraint can be excluded.	New P3.1.17 -- "Pinned context: any chunk tagged as a hard constraint (decisions, commitments, immutable facts) is always included in assembly regardless of relevance score."	MVP

These will be added to the parent PRD in a follow-up update that bumps UpSquad Complete PRD to v1.7.

Changelog​

v1.1 -- 2026-04-06​

1. Problem Statement​

2. Goals & Success Metrics​

3. User Stories​

US-CE-1: Agent Receives Relevant Context for Current Task​

US-CE-2: Long Conversation Context is Compacted Without Loss​

US-CE-3: Agent Pushes Session Knowledge to Persistent Memory​

US-CE-4: Agent Pulls Fresh Context Without Breaking Session​

US-CE-5: Context is Versioned and Rollbackable​

US-CE-6: Platform Operator Sees Context Efficiency Metrics​

US-CE-7: Immutable Agent Context Cannot Be Self-Modified​

US-CE-8: Semantic Search Retrieves Relevant Knowledge Before Each LLM Call​

4. Functional Requirements​

4.1 Context Ingestion & Storage​

4.2 Context Compaction​

4.3 Smart Retrieval & Assembly​

4.4 Context Push/Pull & Session Management​

4.5 Context Version Control​

4.6 Immutable Agent Context & Chain of Trust​

4.7 RAG Knowledge Bases (Context Engine Integration Points)​

4.8 Context-Aware Tool Loading​

4.9 Context Efficiency Metrics​

4.10 Embedding & Chunking Pipeline​

4.11 Prompt Cache Exploitation (Token Optimization Multiplier)​

4.12 Persistent Memory Versioning & Retention​

4.13 Immutable Context Enforcement Integration​

5. Non-Functional Requirements​

6. Scope​

In Scope (MVP)​

In Scope (V1.1)​

Out of Scope (V2+)​

Explicitly Out of Scope (Not This PRD)​

7. Dependencies​

8. Architecture Context​

8.1 Context Engine API Surface (Interface Contract)​

9. Edge Cases & Failure Modes​

10. Open Questions​

11. Pricing Tier Mapping​

12. MVP vs V1.1 Summary​

MVP (Build First -- Agents Use Immediately)​

V1.1 (Enhanced Optimization)​

13. Dogfooding Validation Plan​

14. Appendix: Parent PRD Item Cross-Reference​

15. Upstream PRD Gaps (Flagged for Parent PRD v1.7)​

Changelog

v1.1 -- 2026-04-06

1. Problem Statement

2. Goals & Success Metrics

3. User Stories

US-CE-1: Agent Receives Relevant Context for Current Task

US-CE-2: Long Conversation Context is Compacted Without Loss

US-CE-3: Agent Pushes Session Knowledge to Persistent Memory

US-CE-4: Agent Pulls Fresh Context Without Breaking Session

US-CE-5: Context is Versioned and Rollbackable

US-CE-6: Platform Operator Sees Context Efficiency Metrics

US-CE-7: Immutable Agent Context Cannot Be Self-Modified

US-CE-8: Semantic Search Retrieves Relevant Knowledge Before Each LLM Call

4. Functional Requirements

4.1 Context Ingestion & Storage

4.2 Context Compaction

4.3 Smart Retrieval & Assembly

4.4 Context Push/Pull & Session Management

4.5 Context Version Control

4.6 Immutable Agent Context & Chain of Trust

4.7 RAG Knowledge Bases (Context Engine Integration Points)

4.8 Context-Aware Tool Loading

4.9 Context Efficiency Metrics

4.10 Embedding & Chunking Pipeline

4.11 Prompt Cache Exploitation (Token Optimization Multiplier)

4.12 Persistent Memory Versioning & Retention

4.13 Immutable Context Enforcement Integration

5. Non-Functional Requirements

6. Scope

In Scope (MVP)

In Scope (V1.1)

Out of Scope (V2+)

Explicitly Out of Scope (Not This PRD)

7. Dependencies

8. Architecture Context

8.1 Context Engine API Surface (Interface Contract)

9. Edge Cases & Failure Modes

10. Open Questions

11. Pricing Tier Mapping

12. MVP vs V1.1 Summary

MVP (Build First -- Agents Use Immediately)

V1.1 (Enhanced Optimization)

13. Dogfooding Validation Plan

14. Appendix: Parent PRD Item Cross-Reference

15. Upstream PRD Gaps (Flagged for Parent PRD v1.7)