Canonical flow
PR review agent
Stacked semantic review for merge safety.
Abstract pattern
Stacked semantic review
The engineer-legible flagship flow. It shows how deterministic checks can sit underneath semantic judgment and keep review on the right side of the merge gate.
- Assisted: Human-led with automation support.
- HITL: Human approves each action.
- HOTL: Human samples or monitors.
- Autonomous: Automation acts within guardrails.
Task contract
- Worker: LLM + tools
- Boundary: Input is the PR diff plus context like the linked issue and test results. Output is a structured review verdict of approve, request-changes, or escalate with rationale.
- Evidence log: Diff, test results, lint results, judge verdict, human decision when gated, and whether a later merge needed a revert or caused an incident.
- Evaluator: A stacked gate that combines deterministic checks with an LLM judge.
- Promotion rule: Judge agreement with senior reviewers stays above threshold over a window and deterministic escapes stay near zero, allowing the flow to shift from blocking review to sampled monitoring.
- Demotion rule: A merged auto-approved PR causes a revert or incident, or judge-vs-human agreement drifts down, which sends the flow back to blocking review.
- Fallback: Human review stays on the merge path for substantive changes.
- Lives: HITL -> HOTL
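The boundary and evidence log above can be sketched as plain data types. This is a minimal illustration, not the flow's actual schema: every class and field name here (`ReviewInput`, `ReviewOutput`, `EvidenceRecord`, and so on) is a hypothetical choice made for the example, and only the three verdict values and the list of logged items come from the contract itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    """The three verdicts named in the boundary."""
    APPROVE = "approve"
    REQUEST_CHANGES = "request-changes"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ReviewInput:
    """What crosses the boundary into the worker (field names are assumed)."""
    diff: str
    linked_issue: Optional[str]
    test_results: str


@dataclass(frozen=True)
class ReviewOutput:
    """What crosses the boundary out of the worker: verdict plus rationale."""
    verdict: Verdict
    rationale: str


@dataclass
class EvidenceRecord:
    """One evidence-log entry per reviewed PR (field names are assumed)."""
    diff: str
    test_results: str
    lint_results: str
    judge_verdict: Verdict
    # Only present when the PR went through the human gate.
    human_decision: Optional[Verdict] = None
    # Unknown at review time; filled in later if the merge is reverted
    # or linked to an incident.
    caused_revert_or_incident: Optional[bool] = None


record = EvidenceRecord(
    diff="(diff text)",
    test_results="passed",
    lint_results="clean",
    judge_verdict=Verdict.APPROVE,
)
```

Keeping the post-merge outcome as an optional field on the same record is one way to let the demotion rule look back at which auto-approved PRs later went wrong.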
Evaluator detail
What the gate actually checks
- Target: Output
- Technique: Deterministic tests and lint run first, then an LLM judge handles the semantic call the deterministic layer cannot make.
- Oracle: Tests serve as the reference for the deterministic layer, and senior reviewer decisions serve as human-gold for judge validation.
- Position: HITL
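The ordering described in the technique, cheap deterministic checks first and the LLM judge only afterwards, can be sketched as a short function. This is an illustration under stated assumptions: `stacked_gate`, `stub_judge`, and the `Judge` signature are hypothetical names, the judge is injected as a plain callable rather than a real model call, and mapping a deterministic failure to `request-changes` (and an unrecognized judge output to `escalate`) is a choice made for this sketch, not something the flow specifies.

```python
from typing import Callable, Tuple

ALLOWED_VERDICTS = {"approve", "request-changes", "escalate"}

# Hypothetical judge interface: takes the diff, returns (verdict, rationale).
Judge = Callable[[str], Tuple[str, str]]


def stacked_gate(
    diff: str, tests_passed: bool, lint_passed: bool, judge: Judge
) -> Tuple[str, str]:
    """Run the deterministic layer first; call the judge only if it passes."""
    # Deterministic layer: a failure short-circuits, so the expensive
    # semantic judge is never invoked for a PR that fails tests or lint.
    if not tests_passed:
        return "request-changes", "deterministic layer: tests failed"
    if not lint_passed:
        return "request-changes", "deterministic layer: lint failed"

    # Semantic layer: the call the deterministic checks cannot make.
    verdict, rationale = judge(diff)

    # Assumption for this sketch: anything outside the contract's three
    # verdicts is escalated to a human rather than trusted.
    if verdict not in ALLOWED_VERDICTS:
        return "escalate", f"judge returned unrecognized verdict {verdict!r}"
    return verdict, rationale


def stub_judge(diff: str) -> Tuple[str, str]:
    """Stand-in for an LLM call, used only to make the example runnable."""
    if "TODO" in diff:
        return "request-changes", "diff contains an unresolved TODO"
    return "approve", "stub found no semantic concerns"


print(stacked_gate("fix off-by-one in pager", True, True, stub_judge))
print(stacked_gate("fix off-by-one in pager", False, True, stub_judge))
```

Because the gate position is HITL, the returned verdict would still be held for a human reviewer; the function only decides what recommendation reaches them.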
Teaching point
What this flow proves
Cheap deterministic checks can stack under an expensive semantic judge, and the artifact boundary itself is the thing being reviewed.
Six questions
How this flow governs autonomy
- Without PAA: Merges rely on whoever happens to review the PR; semantic quality is informal and inconsistent; the merge gate has no systematic bar and no audit trail when something ships that should not have.
- What gets gated: The merge decision. The agent's verdict is held at the gate until a human reviewer clears it; at HOTL, only flagged or high-risk PRs re-enter blocking review.
- What is logged: Diff, test results, lint results, judge verdict, human decision when gated, and whether the merged PR later triggered a revert or incident.
- Earns promotion: Judge agreement with senior reviewers stays above threshold over a window and deterministic escapes stay near zero.
- Triggers demotion: A merged auto-approved PR causes a revert or incident, or judge-vs-human agreement drifts below threshold.
- Never full-auto: Substantive semantic judgment. An LLM judge can own the first-pass review but not the final merge gate without ongoing human validation.
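The promotion and demotion conditions above amount to a small state transition between HITL and HOTL. The sketch below is one possible reading, not the flow's implementation: the function name `next_mode`, the default `threshold=0.9`, and `max_escapes=0` are illustrative assumptions, since the page gives no concrete threshold, window size, or definition of "near zero".

```python
from typing import Sequence


def next_mode(
    mode: str,
    agreements: Sequence[bool],
    deterministic_escapes: int,
    revert_or_incident: bool,
    threshold: float = 0.9,   # assumed value; the page names no number
    max_escapes: int = 0,     # assumed reading of "near zero"
) -> str:
    """Return the autonomy mode ("HITL" or "HOTL") for the next window.

    agreements holds one bool per PR in the window: True when the judge's
    verdict matched the senior reviewer's decision.
    """
    # Assumption: an empty window counts as no evidence, so the rate is 0.0
    # and the flow stays in (or returns to) blocking review.
    rate = sum(agreements) / len(agreements) if agreements else 0.0

    # Demotion: a revert or incident from a merged auto-approved PR, or
    # agreement drifting below threshold, sends the flow back to HITL.
    if revert_or_incident or rate < threshold:
        return "HITL"

    # Promotion: agreement above threshold and deterministic escapes near
    # zero allow the shift from blocking review to sampled monitoring.
    if mode == "HITL" and rate > threshold and deterministic_escapes <= max_escapes:
        return "HOTL"

    return mode


window = [True] * 19 + [False]  # 95% agreement over 20 reviewed PRs
print(next_mode("HITL", window, deterministic_escapes=0, revert_or_incident=False))
```

Note that the function never returns a fully autonomous mode, which matches the "never full-auto" constraint: the most it can grant is sampled human monitoring.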
This page is linked from the canonical card set on the flows index.