Executable contract

Spec

The contract stays tight: each task declaration names the boundary, the evidence, the evaluator, the oracle, the gate position, the promotion rule, the demotion rule, and the fallback implementation.

A task is PAA-compliant when it declares all eight fields. Four define the boundary and evidence — typed boundary, evidence log, evaluator, oracle. Four define the governance — gate position, promotion rule, demotion rule, fallback implementation. Together they form a complete, auditable autonomy contract.
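
The compliance rule can be checked mechanically. Below is a minimal sketch. It assumes a declaration is held as a plain mapping keyed by the snake_case names used in the YAML examples on this page. The helper names `missing_fields` and `is_paa_compliant` and the two grouping constants are illustrative only and are not part of the reference contract.

```python
# Hypothetical helpers; field names follow the snake_case YAML keys on this page.
BOUNDARY_AND_EVIDENCE = ("typed_boundary", "evidence_log", "evaluator", "oracle")
GOVERNANCE = ("gate_position", "promotion_rule", "demotion_rule", "fallback_implementation")
REQUIRED_FIELDS = BOUNDARY_AND_EVIDENCE + GOVERNANCE


def missing_fields(declaration: dict) -> list[str]:
    """Return required fields that are absent or blank, in contract order."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = declaration.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def is_paa_compliant(declaration: dict) -> bool:
    """A task is compliant only when all eight fields are declared."""
    return not missing_fields(declaration)


# Usage: a declaration that omits its governance half is not compliant.
partial = {
    "task": "refund_approval",
    "typed_boundary": "Typed refund request in, typed approval or escalation out.",
    "evidence_log": "Request, purchase history, policy match, decision, reviewer notes.",
    "evaluator": "Policy gate plus escalation classifier.",
    "oracle": "Approved policy outcomes and historical chargeback labels.",
}
print(is_paa_compliant(partial))  # False
print(missing_fields(partial))
# ['gate_position', 'promotion_rule', 'demotion_rule', 'fallback_implementation']
```

Treating a blank value as missing is a design choice here: a field that is present but empty says nothing an auditor can check.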

Stability note. This is the PAA reference contract, not a product API. The field names and semantics are stable; tooling and schema formats may evolve.

Task contract

Eight-field task declaration

This is the reference declaration. It keeps every change in a task's autonomy explicit and copyable.

Typed boundary
Typed input plus typed output.
Evidence log
Prompt, inputs, proposal, verdict, outcome signal.
Evaluator
Separate evaluator deciding whether the proposal can advance.
Oracle
Ground truth used to measure whether the evaluator is correct.
Gate position
Gate sits before the action boundary and blocks unsafe advancement.
Promotion rule
Promote only when evidence shows stable agreement between evaluator and oracle and a low incident cost.
Demotion rule
Demote when drift, failure, or outcome regressions appear.
Fallback implementation
Human review and manual action.
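
The fields above fit together at run time roughly as follows. This is a minimal sketch under stated assumptions. The evaluator is any callable that returns True when a proposal may advance. `promoted` is a flag maintained by the promotion and demotion rules. Every decision appends one record holding the five items the evidence log lists. `EvidenceRecord`, `run_gate`, and all the callables are hypothetical names. The evaluator runs even while the task is demoted, so agreement with the oracle keeps accumulating as evidence for a later promotion.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EvidenceRecord:
    # The five items named by the evidence log.
    prompt: str
    inputs: Any
    proposal: Any
    verdict: str                          # evaluator verdict: "approve" or "reject"
    outcome_signal: Optional[str] = None  # filled in later, once the outcome is known


def run_gate(
    prompt: str,
    inputs: Any,
    propose: Callable[[Any], Any],
    evaluator: Callable[[Any, Any], bool],
    act: Callable[[Any], Any],
    fallback: Callable[[Any, Any], Any],
    promoted: bool,
    log: list,
) -> Any:
    """Gate placed before the action boundary: act only if promoted and approved."""
    proposal = propose(inputs)
    # A separate evaluator judges the proposal; it runs even while demoted.
    approved = evaluator(inputs, proposal)
    log.append(EvidenceRecord(prompt, inputs, proposal, "approve" if approved else "reject"))
    if promoted and approved:
        return act(proposal)           # crosses the action boundary
    return fallback(inputs, proposal)  # human review and manual action


# Usage: while demoted, even an approved proposal goes to the human fallback.
log = []
result = run_gate(
    prompt="refund policy v1",
    inputs={"amount": 20},
    propose=lambda req: {"decision": "approve", "amount": req["amount"]},
    evaluator=lambda req, prop: req["amount"] <= 50,
    act=lambda prop: "issued",
    fallback=lambda req, prop: "queued for human review",
    promoted=False,
    log=log,
)
print(result)          # queued for human review
print(log[0].verdict)  # approve
```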

Compact example

refund_approval

Short enough to scan, detailed enough to reuse when the gate is narrow and the fallback is still manual.


Typed boundary
Typed refund request in, typed approval or escalation out.
Evidence log
Request, purchase history, policy match, decision, reviewer notes.
Evaluator
Policy gate plus escalation classifier.
Oracle
Approved policy outcomes and historical chargeback labels.
Gate position
Before the refund is issued.
Promotion rule
Promote easy, low-risk approvals with stable policy matches.
Demotion rule
Demote when the request is ambiguous, high-value, or policy-unclear.
Fallback implementation
Human review and manual decision.

refund_approval YAML example
task: refund_approval
typed_boundary: Typed refund request in, typed approval or escalation out.
evidence_log: Request, purchase history, policy match, decision, reviewer notes.
evaluator: Policy gate plus escalation classifier.
oracle: Approved policy outcomes and historical chargeback labels.
gate_position: Before the refund is issued.
promotion_rule: Promote easy, low-risk approvals with stable policy matches.
demotion_rule: Demote when the request is ambiguous, high-value, or policy-unclear.
fallback_implementation: Human review and manual decision.
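
The refund promotion and demotion rules can be read as a routing predicate at the gate. Below is a minimal sketch. It assumes a typed request with three hypothetical attributes: the amount, an ambiguity flag raised by the escalation classifier, and a policy-match status. It also assumes a hypothetical high-value cut-off of 100.0. None of these names or numbers come from the contract. Whether policy matches are stable is judged across many requests in the evidence log, so this sketch checks only the per-request match.

```python
from dataclasses import dataclass
from enum import Enum


class PolicyMatch(Enum):
    MATCHED = "matched"
    UNCLEAR = "unclear"
    VIOLATED = "violated"


@dataclass(frozen=True)
class RefundRequest:
    # Hypothetical typed boundary; real fields depend on the refund system.
    amount: float
    ambiguous: bool  # e.g. raised by the escalation classifier
    policy_match: PolicyMatch


@dataclass(frozen=True)
class Decision:
    outcome: str  # "approve" or "escalate", matching the typed output
    reason: str


HIGH_VALUE_THRESHOLD = 100.0  # hypothetical cut-off, to be tuned from evidence


def route_refund(req: RefundRequest) -> Decision:
    """Approve only easy, low-risk requests; escalate everything else to a human."""
    # Demotion rule: ambiguous, high-value, or policy-unclear requests escalate.
    if req.ambiguous:
        return Decision("escalate", "ambiguous request")
    if req.amount >= HIGH_VALUE_THRESHOLD:
        return Decision("escalate", "high-value request")
    if req.policy_match is not PolicyMatch.MATCHED:
        return Decision("escalate", "policy unclear or not matched")
    # Promotion rule: a low-value request with a clean policy match may be approved.
    return Decision("approve", "low-risk policy match")


print(route_refund(RefundRequest(25.0, False, PolicyMatch.MATCHED)).outcome)   # approve
print(route_refund(RefundRequest(250.0, False, PolicyMatch.MATCHED)).outcome)  # escalate
```

The typed output has only two outcomes, approval or escalation, so a policy violation also escalates rather than being rejected automatically.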

Full example

pr_review YAML

The fuller review contract carries the same field shape in a form that can be copied into a build, a spec, or an implementation note.


Typed boundary
Typed pull request diff in, typed review verdict or follow-up out.
Evidence log
Diff, context, findings, comments, verdict, follow-up state, outcome.
Evaluator
Stacked deterministic checks plus reviewer model.
Oracle
Merged PR outcomes, regression findings, and reviewer confirmations.
Gate position
Before merge and before the merge queue advances.
Promotion rule
Promote when evidence shows repeatable correctness and low false-negative risk.
Demotion rule
Demote when review misses issues, drifts from policy, or creates regressions.
Fallback implementation
Human review with explicit request-for-changes handling.

pr_review YAML example
task: pr_review
typed_boundary: Typed pull request diff in, typed review verdict or follow-up out.
evidence_log: Diff, context, findings, comments, verdict, follow-up state, outcome.
evaluator: Stacked deterministic checks plus reviewer model.
oracle: Merged PR outcomes, regression findings, and reviewer confirmations.
gate_position: Before merge and before the merge queue advances.
promotion_rule: Promote when evidence shows repeatable correctness and low false-negative risk.
demotion_rule: Demote when review misses issues, drifts from policy, or creates regressions.
fallback_implementation: Human review with explicit request-for-changes handling.
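
The pr_review promotion and demotion rules can be driven directly from the evidence log. Below is a minimal sketch. It assumes each logged review pairs the evaluator's verdict with an oracle label saying whether the change later proved to have a real issue, such as a regression finding or a reviewer-confirmed problem. A false negative is an approval that the oracle contradicts. The window, the sample minimum, and all thresholds are hypothetical placeholders. The gap between the promotion bar and the demotion bar (hysteresis) is one design choice among several; it stops a task from flapping between modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewEvidence:
    evaluator_approved: bool  # evaluator's verdict on the diff
    oracle_has_issue: bool    # oracle: regression found or issue confirmed by a reviewer


# Hypothetical policy parameters; real values should reflect incident cost.
MIN_SAMPLES = 50
PROMOTE_MIN_AGREEMENT = 0.95
PROMOTE_MAX_FN_RATE = 0.01
DEMOTE_MIN_AGREEMENT = 0.90
DEMOTE_FN_RATE = 0.03


def agreement_rate(window: list[ReviewEvidence]) -> float:
    """Share of reviews where the evaluator's approval matches the oracle."""
    if not window:
        return 0.0
    agree = sum(1 for e in window if e.evaluator_approved == (not e.oracle_has_issue))
    return agree / len(window)


def false_negative_rate(window: list[ReviewEvidence]) -> float:
    """Share of oracle-confirmed issues that the evaluator approved anyway."""
    issues = [e for e in window if e.oracle_has_issue]
    if not issues:
        return 0.0  # no observed issues means no observed misses
    return sum(1 for e in issues if e.evaluator_approved) / len(issues)


def next_state(promoted: bool, window: list[ReviewEvidence]) -> bool:
    """Return True if the task should run autonomously for the next period."""
    agreement = agreement_rate(window)
    fn_rate = false_negative_rate(window)
    if promoted:
        # Demotion rule: missed issues or policy drift send the task back to humans.
        return fn_rate <= DEMOTE_FN_RATE and agreement >= DEMOTE_MIN_AGREEMENT
    # Promotion rule: enough evidence, repeatable correctness, low false-negative risk.
    return (
        len(window) >= MIN_SAMPLES
        and agreement >= PROMOTE_MIN_AGREEMENT
        and fn_rate <= PROMOTE_MAX_FN_RATE
    )


# Usage: 60 clean approvals earn promotion; three missed issues trigger demotion.
window = [ReviewEvidence(True, False)] * 60
print(next_state(False, window))  # True
window_with_misses = window + [ReviewEvidence(True, True)] * 3
print(next_state(True, window_with_misses))  # False
```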