Executable contract
Spec
The contract stays tight: each task declaration names the boundary, the evidence, the evaluator, the oracle, the gate position, the promotion rule, the demotion rule, and the fallback implementation.
A task is PAA-compliant when it declares all eight fields. Four define the boundary and evidence: typed boundary, evidence log, evaluator, and oracle. Four define the governance: gate position, promotion rule, demotion rule, and fallback implementation. Together they form a complete, auditable autonomy contract.
Stability note. This is the PAA reference contract, not a product API. The field names and semantics are stable; tooling and schema formats may evolve.
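The compliance rule above (all eight fields declared) can be checked mechanically. The following is a minimal sketch, not part of the reference contract: the snake_case keys are borrowed from the YAML examples in this document, and the helper names `missing_fields` and `is_paa_compliant` are hypothetical.

```python
# Field names follow the contract; the snake_case spelling is an assumption
# taken from the YAML examples in this document.
REQUIRED_FIELDS = (
    "typed_boundary",
    "evidence_log",
    "evaluator",
    "oracle",
    "gate_position",
    "promotion_rule",
    "demotion_rule",
    "fallback_implementation",
)


def missing_fields(declaration: dict) -> list:
    """Return required fields that are absent or blank, in contract order."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(declaration.get(name) or "").strip()
    ]


def is_paa_compliant(declaration: dict) -> bool:
    """A declaration is compliant only when all eight fields are filled in."""
    return not missing_fields(declaration)
```

A check like this only confirms that each field is present. Whether the declared evaluator, oracle, or rules are adequate still needs review.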
Task contract
Eight-field task declaration
This is the route contract: it keeps every change in autonomy level explicit and easy to copy.
- Typed boundary: Typed input plus typed output.
- Evidence log: Prompt, inputs, proposal, verdict, outcome signal.
- Evaluator: Separate evaluator deciding whether the proposal can advance.
- Oracle: Ground truth used to measure whether the evaluator is correct.
- Gate position: Gate sits before the action boundary and blocks unsafe advancement.
- Promotion rule: Promote only when evidence shows stable agreement and low incident cost.
- Demotion rule: Demote when drift, failure, or outcome regressions appear.
- Fallback implementation: Human review and manual action.
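The promotion and demotion rules in the list above can be sketched as a single decision over a window of evidence. This is an illustration under stated assumptions: the `EvidenceWindow` shape, the `next_step` function, and every threshold are hypothetical, since the contract names the conditions but not the numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceWindow:
    """Summary of recent evidence-log entries for one task (hypothetical shape)."""
    decisions: int           # proposals the evaluator judged in the window
    agreements: int          # verdicts that matched the oracle
    incident_cost: float     # total incident cost, in any consistent unit
    failures: int            # hard failures observed in the window
    drift_detected: bool     # drift flagged by monitoring
    outcome_regressed: bool  # outcome signal worse than baseline


# Illustrative thresholds only; the contract does not specify values.
MIN_DECISIONS = 200
MIN_AGREEMENT = 0.98
MAX_INCIDENT_COST = 100.0


def next_step(window: EvidenceWindow) -> str:
    """Return "demote", "promote", or "hold" for the task's autonomy level."""
    # Demotion is checked first so that a regression overrides good statistics.
    if window.drift_detected or window.failures > 0 or window.outcome_regressed:
        return "demote"
    # Too little evidence cannot show *stable* agreement.
    if window.decisions < MIN_DECISIONS:
        return "hold"
    agreement = window.agreements / window.decisions
    if agreement >= MIN_AGREEMENT and window.incident_cost <= MAX_INCIDENT_COST:
        return "promote"
    return "hold"
```

Checking demotion before promotion is a design choice in this sketch: it means one drift or failure signal is enough to step autonomy back, however strong the agreement rate looks.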
Compact example
refund_approval
Short enough to scan, detailed enough to reuse when the gate is narrow and the fallback is still manual.
Task contract
refund_approval
Short companion example for fast review and reuse.
- Typed boundary: Typed refund request in, typed approval or escalation out.
- Evidence log: Request, purchase history, policy match, decision, reviewer notes.
- Evaluator: Policy gate plus escalation classifier.
- Oracle: Approved policy outcomes and historical chargeback labels.
- Gate position: Before the refund is issued.
- Promotion rule: Promote easy, low-risk approvals with stable policy matches.
- Demotion rule: Demote when the request is ambiguous, high-value, or policy-unclear.
- Fallback implementation: Human review and manual decision.
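The typed boundary and gate for refund_approval might look like the sketch below. All type names, fields, and the `HIGH_VALUE_LIMIT` threshold are assumptions made for illustration; the contract only says that a typed request goes in and a typed approval or escalation comes out, before the refund is issued.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount: float
    policy_match: Optional[str]  # matched policy clause, None if unclear
    ambiguous: bool              # hypothetical flag from the escalation classifier


@dataclass(frozen=True)
class Approval:
    order_id: str
    policy_match: str


@dataclass(frozen=True)
class Escalation:
    order_id: str
    reason: str


HIGH_VALUE_LIMIT = 100.0  # hypothetical threshold, not from the contract


def gate(request: RefundRequest) -> Union[Approval, Escalation]:
    """Run before the refund is issued; anything not clearly safe escalates."""
    if request.policy_match is None:
        return Escalation(request.order_id, "policy-unclear")
    if request.ambiguous:
        return Escalation(request.order_id, "ambiguous")
    if request.amount > HIGH_VALUE_LIMIT:
        return Escalation(request.order_id, "high-value")
    return Approval(request.order_id, request.policy_match)
```

An `Escalation` here stands for the fallback implementation: the request goes to human review and a manual decision rather than being rejected automatically.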
task: refund_approval
typed_boundary: Typed refund request in, typed approval or escalation out.
evidence_log: Request, purchase history, policy match, decision, reviewer notes.
evaluator: Policy gate plus escalation classifier.
oracle: Approved policy outcomes and historical chargeback labels.
gate_position: Before the refund is issued.
promotion_rule: Promote easy, low-risk approvals with stable policy matches.
demotion_rule: Demote when the request is ambiguous, high-value, or policy-unclear.
fallback_implementation: Human review and manual decision.
Full example
pr_review YAML
The fuller review contract carries the same field shape in a form that can be copied into a build, a spec, or an implementation note.
Task contract
pr_review
Full contract example for the review flow.
- Typed boundary: Typed pull request diff in, typed review verdict or follow-up out.
- Evidence log: Diff, context, findings, comments, verdict, follow-up state, outcome.
- Evaluator: Stacked deterministic checks plus reviewer model.
- Oracle: Merged PR outcomes, regression findings, and reviewer confirmations.
- Gate position: Before merge and before the merge queue advances.
- Promotion rule: Promote when evidence shows repeatable correctness and low false-negative risk.
- Demotion rule: Demote when review misses issues, drifts from policy, or creates regressions.
- Fallback implementation: Human review with explicit request-for-changes handling.
task: pr_review
typed_boundary: Typed pull request diff in, typed review verdict or follow-up out.
evidence_log: Diff, context, findings, comments, verdict, follow-up state, outcome.
evaluator: Stacked deterministic checks plus reviewer model.
oracle: Merged PR outcomes, regression findings, and reviewer confirmations.
gate_position: Before merge and before the merge queue advances.
promotion_rule: Promote when evidence shows repeatable correctness and low false-negative risk.
demotion_rule: Demote when review misses issues, drifts from policy, or creates regressions.
fallback_implementation: Human review with explicit request-for-changes handling.
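The pr_review promotion rule depends on "low false-negative risk", which can be estimated by comparing evaluator verdicts against the oracle. The following is a rough sketch under assumptions: the pair format `(evaluator_flagged, oracle_defect)` and the function name `false_negative_rate` are hypothetical, and the contract does not say how the risk should be measured.

```python
from typing import Iterable, Tuple


def false_negative_rate(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Share of oracle-confirmed defects that the evaluator failed to flag.

    Each pair is (evaluator_flagged, oracle_defect) for one reviewed PR.
    """
    defects = 0
    missed = 0
    for evaluator_flagged, oracle_defect in pairs:
        if oracle_defect:
            defects += 1
            if not evaluator_flagged:
                missed += 1
    if defects == 0:
        # Without confirmed defects the rate is undefined, not zero.
        raise ValueError("no oracle-confirmed defects in the sample")
    return missed / defects


# Made-up sample: four confirmed defects, one of which the evaluator missed.
sample = [
    (True, True),
    (False, True),
    (False, False),
    (True, False),
    (True, True),
    (True, True),
]
rate = false_negative_rate(sample)  # 1 missed out of 4 defects -> 0.25
```

Raising on an empty defect set, rather than returning 0.0, keeps a quiet period from being read as evidence of correctness.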