Prior art and grounding
The mechanism is converging. Progressive autonomy (discrete levels, evidence-gated transitions, safety boundaries) appears, under near-identical descriptions, across vendor frameworks, peer-reviewed research, and a decades-old engineering standard.
What the public material leaves under-specified is the part PAA actually contributes: a vendor-neutral, domain-general, implementable specification in which the evaluator is the first-class object. The buildable, evaluator-centric layer is the open ground.
PAA does not claim to have invented progressive autonomy, and this page is deliberate about saying so. It maps the relevant work, states exactly where PAA fits each piece, and translates every citation into a PAA construct. A citation that cannot be translated does not belong here.
The two-sided gap
The reason this layer is still open is that the two groups working on progressive autonomy each leave out the half the other supplies.
Commercial frameworks name the levels and leave the implementation contract implicit. Maturity models, the audit/assist/automate ladders, trust-benchmark phases, and draft-to-monitored progressions describe the destination but rarely specify the typed boundary, evaluator, oracle, promotion rule, demotion rule, evidence log, or fallback implementation.
Academic work specifies pieces of the mechanism rigorously, but demonstrates them inside specific domains. Osprey is real and deployed, but primarily demonstrates gated execution in facility control. Safe-SDL is precise about safety boundaries and regression logic, but is domain-bound to self-driving laboratories. Malik proposes an agentic architecture for hyperscale network operations (arXiv:2606.09122, June 2026) and independently ports the same progressive autonomy pattern with safety boundaries. Together the three show the mechanism is converging across independent domains; none ships the domain-general evaluator contract PAA defines.
PAA sits in the empty middle: neutral and buildable at once. It takes the transition logic appearing in research, strips it of any single domain, and makes the evaluator, not the level, the thing you specify, instrument, validate, promote, and demote.
The sources, and where PAA fits each
SAE J3016: the inherited ladder
The 0-5 levels of driving automation are the structural template many agent-autonomy frameworks echo: discrete levels, increasing machine authority, decreasing human involvement, each level defined by who is responsible for what. It is not about AI agents, but it is the ancestor.
Where PAA fits. PAA inherits the levels idea from this lineage, as many autonomy frameworks do. PAA's spectrum (assisted -> human-in-the-loop -> human-on-the-loop -> autonomous) is one more port of the SAE frame, and PAA says so rather than re-deriving it. The contribution is not the ladder.
This borrowing is not a weakness in the position. The research below does the same thing: Safe-SDL ports the SAE levels into its own domain. Inheriting an established frame and being honest about it is what lets the genuinely novel part stand out.
| SAE J3016 concept | PAA construct |
|---|---|
| level-based autonomy taxonomy | the autonomy spectrum (inherited frame) |
| responsibility allocation per level | the human-role column across regions |
| operational design domain | scoped autonomy state |
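As a rough illustration of the mapping above, the spectrum and a scoped autonomy state could be represented as follows. This is a minimal sketch, not PAA's specification: the names `AutonomyLevel`, `HUMAN_ROLE`, and `ScopedAutonomyState`, the role wording, and the example scope names are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The four positions on the spectrum, ordered by machine authority."""
    ASSISTED = 0
    HUMAN_IN_THE_LOOP = 1
    HUMAN_ON_THE_LOOP = 2
    AUTONOMOUS = 3


# Illustrative wording only (assumption); the page does not reproduce the role column.
HUMAN_ROLE = {
    AutonomyLevel.ASSISTED: "performs the task; automation suggests",
    AutonomyLevel.HUMAN_IN_THE_LOOP: "approves each action before it runs",
    AutonomyLevel.HUMAN_ON_THE_LOOP: "monitors and can intervene",
    AutonomyLevel.AUTONOMOUS: "reviews evidence after the fact",
}


@dataclass
class ScopedAutonomyState:
    """Autonomy is held per scope, not globally (the operational-design-domain analogue)."""
    levels: dict[str, AutonomyLevel]
    default: AutonomyLevel = AutonomyLevel.ASSISTED

    def level_for(self, scope: str) -> AutonomyLevel:
        # Scopes that have earned nothing yet fall back to the most conservative level.
        return self.levels.get(scope, self.default)


# Hypothetical scope names, for illustration only.
state = ScopedAutonomyState({"invoice-triage": AutonomyLevel.HUMAN_ON_THE_LOOP})
print(state.level_for("invoice-triage").name)   # HUMAN_ON_THE_LOOP
print(state.level_for("refund-approval").name)  # ASSISTED
```

Keying the level by scope rather than storing one global level is the design choice that mirrors the operational design domain: a task can be autonomous in one scope and fully gated in another.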
Osprey: the gated boundary, in production
Hellert, Montenegro & Sulc. "Osprey: Production-Ready Agentic AI for Safety-Critical Control Systems." arXiv:2508.15066; published in APL Machine Learning 4, 016103 (2026). Deployed as the ALS Accelerator Assistant at the Advanced Light Source synchrotron, Lawrence Berkeley National Laboratory.
Osprey is a production-ready agentic framework for safety-critical facilities. Its core safety mechanism is plan-first orchestration: the agent generates a complete, human-readable execution plan with explicit dependencies, and that plan goes for human review before any hardware is touched. Actions outside policy are rejected and require explicit operator approval. It runs against real control infrastructure with hundreds of thousands of addressable process variables, a domain where a mistake breaks hardware.
Where PAA fits. Osprey is a concrete instance of PAA's gated end of the spectrum. "Generate a plan, gate on human review before execution" is a blocking gate at HITL, with the plan as the typed artifact at the boundary. Osprey provides strong evidence that the boundary-and-gate pattern can work in production safety-critical infrastructure.
What PAA adds. Osprey gates but does not progress. It has the blocking-review mechanism; it does not formalize how a task earns its way from mandatory review to monitored autonomy, nor how the evaluator itself matures. PAA's promotion/demotion loop and evaluator maturity curve are the layer Osprey does not claim. The relationship is: Osprey is a strong production instance of the gated boundary; PAA generalizes that boundary into a promotable and demotable evaluator-governed progression.
| Osprey concept | PAA construct |
|---|---|
| plan-first orchestration | blocking gate, pre-execution |
| execution plan with dependencies | the typed artifact at the boundary |
| human review before hardware action | HITL position of the evaluator |
| out-of-policy action rejected for approval | deterministic invariant + escalation |
| reproducible artifacts / audit trail | the evidence log |
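The constructs in the table above can be sketched as a small blocking gate. This is a hedged illustration of the pattern, not Osprey's API or PAA's reference implementation: `Step`, `Plan`, `policy_violations`, `BlockingGate`, and the control-point names are hypothetical, and the reviewer is a plain callable standing in for a human.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Step:
    name: str
    target: str                      # hypothetical control-point identifier
    value: float
    depends_on: tuple[str, ...] = ()


@dataclass(frozen=True)
class Plan:
    """The typed artifact that crosses the boundary before anything runs."""
    steps: tuple[Step, ...]


def policy_violations(plan: Plan, allowed_targets: frozenset[str]) -> list[str]:
    """Deterministic invariant: flag disallowed targets and unmet dependencies."""
    problems, seen = [], set()
    for step in plan.steps:
        if step.target not in allowed_targets:
            problems.append(f"{step.name}: target {step.target!r} is out of policy")
        for dep in step.depends_on:
            if dep not in seen:
                problems.append(f"{step.name}: depends on {dep!r}, which has not run yet")
        seen.add(step.name)
    return problems


@dataclass
class BlockingGate:
    allowed_targets: frozenset[str]
    reviewer: Callable[[Plan], bool]           # stands in for the human reviewer
    evidence_log: list[dict] = field(default_factory=list)

    def submit(self, plan: Plan, execute: Callable[[Step], None]) -> str:
        problems = policy_violations(plan, self.allowed_targets)
        if problems:
            outcome = "escalated"              # nothing executes; an operator must decide
        elif not self.reviewer(plan):
            outcome = "rejected"
        else:
            for step in plan.steps:
                execute(step)
            outcome = "executed"
        # Every submission is logged, whether or not it ran.
        self.evidence_log.append(
            {"steps": [s.name for s in plan.steps], "problems": problems, "outcome": outcome}
        )
        return outcome


executed: list[Step] = []
gate = BlockingGate(frozenset({"magnet:current"}), reviewer=lambda plan: True)

ok = Plan((Step("read", "magnet:current", 0.0),
           Step("set", "magnet:current", 1.5, depends_on=("read",))))
bad = Plan((Step("vent", "vacuum:valve", 1.0),))

print(gate.submit(ok, executed.append))       # executed
print(gate.submit(bad, executed.append))      # escalated
print(len(executed), len(gate.evidence_log))  # 2 2
```

The invariant check runs before the reviewer is consulted, so an out-of-policy plan never reaches execution on reviewer approval alone; it is logged and escalated instead.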
Safe-SDL: evidence-gated transitions, in research terms
Zhang, Que, Chang, Zhang, Wei & Zhu. "Safe-SDL: Establishing Safety Boundaries and Control Mechanisms for AI-Driven Self-Driving Laboratories." arXiv:2602.15061 (February 2026).
This is the closest direct research analogue for PAA's promotion/demotion mechanism. Safe-SDL frames autonomy transitions around demonstrated competence rather than arbitrary timelines: a human-on-the-loop system advances to bounded autonomy only after accumulating enough incident-free operations to statistically validate its safety, and regresses automatically on an incident or constraint violation, with advancement resuming only after root-cause analysis and corrective action. The paper puts the operating window at "typically hundreds of successful operations."
That is the same transition shape PAA generalizes: evidence-gated advancement, automatic regression, and governed re-promotion. Safe-SDL states that shape for self-driving laboratories; PAA turns it into a domain-general evaluator contract. Notably, Safe-SDL also ports its own levels from SAE J3016 and analyzes Osprey as an instantiating system, so the two keystones and the ancestor are already linked in the literature.
Where PAA fits. Statistical validation over a window maps to the promotion rule. Automatic regression on incident maps to the demotion trigger. Root-cause-before-resumption maps to a governed re-promotion path. This is the construct-level translation, never "Safe-SDL supports PAA":
| Safe-SDL concept | PAA construct |
|---|---|
| demonstrated competence | promotion evidence |
| hundreds of incident-free operations | evaluation window |
| statistical validation of safety claims | promotion threshold |
| automatic regression after incident | demotion trigger |
| root-cause analysis before resumption | governed re-promotion path |
| constrained autonomy domain | scoped autonomy state |
What PAA adds. Safe-SDL is domain-bound to self-driving laboratories and treats evaluation as safety-incident counting. PAA generalizes the same transition logic to any task and any automation substrate, and replaces "count incidents" with the full evaluator model (target, technique, oracle, position) plus eval-the-eval validation. PAA is the domain-general, evaluator-centric form of what Safe-SDL specifies for one vertical.
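The transition shape described in this section (evidence-gated advancement, automatic regression, governed re-promotion) can be sketched as a small state machine. This is an illustration under stated assumptions, not Safe-SDL's or PAA's actual rule. The statistical test used here is the standard zero-failure binomial bound: after n consecutive incident-free operations, the upper confidence bound on the failure rate is 1 - (1 - confidence)^(1/n). The `TransitionController` name, the 1% threshold, the 95% confidence level, and resetting the window after a root-cause sign-off are all assumptions made for the example.

```python
from dataclasses import dataclass


def failure_rate_upper_bound(successes: int, confidence: float = 0.95) -> float:
    """Upper bound on the failure rate after `successes` consecutive clean operations."""
    if successes <= 0:
        return 1.0
    return 1.0 - (1.0 - confidence) ** (1.0 / successes)


@dataclass
class TransitionController:
    max_failure_rate: float = 0.01      # hypothetical promotion threshold
    level: str = "human-on-the-loop"
    streak: int = 0                     # incident-free operations in the current window
    hold: bool = False                  # set by an incident, cleared by root-cause sign-off

    def record(self, incident: bool) -> str:
        if incident:
            # Demotion trigger: regress immediately and block re-promotion.
            self.level, self.streak, self.hold = "human-on-the-loop", 0, True
        else:
            self.streak += 1
            proven = failure_rate_upper_bound(self.streak) <= self.max_failure_rate
            if proven and not self.hold:
                self.level = "bounded-autonomy"
        return self.level

    def clear_hold(self, root_cause_report: str) -> None:
        """Governed re-promotion: evidence gathering restarts only after sign-off."""
        if not root_cause_report.strip():
            raise ValueError("re-promotion requires a root-cause report")
        self.hold = False
        self.streak = 0                 # assumption: the evaluation window starts over


ctl = TransitionController()
for _ in range(298):
    ctl.record(incident=False)
print(ctl.level)                     # human-on-the-loop
print(ctl.record(incident=False))    # bounded-autonomy (299th clean operation)
print(ctl.record(incident=True))     # human-on-the-loop
```

With these example parameters the bound drops below 1% at 299 clean operations, which is consistent in scale with the "hundreds of successful operations" window quoted above. In PAA's framing, the incident flag would be replaced by an evaluator verdict; that substitution is not shown here.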
Malik: agentic architecture for hyperscale network operations
Malik. "Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations." arXiv:2606.09122 (June 2026).
Malik is an architecture report, not a mechanism formalization.
Where PAA fits. Progressive autonomy with safety boundaries maps to the spectrum plus the gate; closed-loop verification maps to outcome-layer evaluation feeding promotion and demotion. Malik corroborates the convergence pattern across a third independent domain, hyperscale network operations, without formalizing the evaluator contract PAA defines.
Supporting context (corroboration, not load-bearing)
The broader self-driving-laboratory and human-oversight literature establishes that critical domains require checkpointing, explicit approval points, and safety boundaries, and that autonomy must integrate human oversight by design. This is background that confirms the boundary-and-oversight principle is established. It corroborates; it is not PAA's source, and PAA does not lean on it.
The synthesis PAA is claiming
The mechanism is converging: Osprey demonstrates plan-first gated execution in production safety-critical infrastructure; Safe-SDL formalizes safety boundaries and regression logic for self-driving laboratories; Malik independently ports the same pattern to hyperscale network operations. What remains under-specified across all three is a vendor-neutral, domain-general, implementable specification in which the evaluator is the first-class object. PAA is proposed as that specification.
The defensible original contributions, the parts found in none of the sources above:
- The evaluator as the governed primitive. Each source treats the autonomy level as the unit and leaves evaluation vague ("demonstrated performance," "trust benchmarks," "incident count"). PAA makes the evaluator the thing you design, validate, and mature.
- The four-choice evaluator model of target, technique, oracle, and position. No source decomposes evaluation this way.
- Eval-the-eval. The evaluator is itself a task on the spectrum, validated against human gold and subject to its own demotion. None of the sources above closes this recursion.
- The evaluator maturity curve as economic engine. Bootstrap with expensive flexible evaluators, distill to cheap classifiers as labels accumulate, so evaluator cost and task autonomy co-evolve. This is the answer to "isn't eval-gating everything too expensive," and it appears in none of the sources above.
- Domain-generality and substrate-agnosticism. Safe-SDL is self-driving laboratories; Osprey is facility control; Malik is network operations. PAA is the form that applies to any task and any automation type (LLM, smaller model, or deterministic code) behind a stable typed boundary.
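To make the list above more concrete, here is a minimal sketch of the four-choice evaluator model, an eval-the-eval check against human gold, and a selection rule that reflects the maturity curve. It is illustrative only and is not PAA's reference implementation: `EvaluatorSpec`, `agreement_with_gold`, `cheapest_trusted`, the field values, the thresholds, and the toy judges are all assumptions, and the "flexible" judge is a simple function standing in for an expensive evaluator such as an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Judge = Callable[[str], bool]
Gold = Sequence[tuple[str, bool]]      # (item, human verdict) pairs


@dataclass(frozen=True)
class EvaluatorSpec:
    target: str          # what is judged (example values are hypothetical)
    technique: str       # how it is judged
    oracle: str          # where ground truth comes from
    position: str        # where the evaluator sits relative to execution
    cost_per_call: float


def agreement_with_gold(judge: Judge, gold: Gold) -> float:
    """Eval-the-eval: fraction of human-labelled items the evaluator gets right."""
    return sum(judge(item) == label for item, label in gold) / len(gold)


def cheapest_trusted(
    candidates: Sequence[tuple[EvaluatorSpec, Judge]],
    gold: Gold,
    min_agreement: float = 0.95,       # hypothetical trust threshold
    min_labels: int = 20,              # hypothetical minimum gold-set size
) -> Optional[EvaluatorSpec]:
    """Pick the cheapest evaluator that passes validation; None means fall back to humans."""
    if len(gold) < min_labels:
        return None
    trusted = [spec for spec, judge in candidates
               if agreement_with_gold(judge, gold) >= min_agreement]
    return min(trusted, key=lambda spec: spec.cost_per_call, default=None)


gold = [("refund 20", True), ("refund 9000", False)] * 10

flexible = (EvaluatorSpec("output", "llm-judge", "human gold", "blocking", 0.05),
            lambda text: int(text.split()[1]) < 1000)
naive = (EvaluatorSpec("output", "classifier", "human gold", "blocking", 0.001),
         lambda text: True)
distilled = (EvaluatorSpec("output", "classifier", "human gold", "blocking", 0.002),
             lambda text: len(text) < 10)

print(cheapest_trusted([flexible, naive], gold).technique)             # llm-judge
print(cheapest_trusted([flexible, naive, distilled], gold).technique)  # classifier
```

The second call shows the economic point in miniature: once a cheap evaluator passes the same validation as the expensive one, it takes over, and an evaluator that stops agreeing with gold would drop out of the trusted set on the next check.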
A note on honesty
This position carries a higher credibility bar than a vendor blog, and the bar is the moat: most people cannot clear it, so clearing it is the differentiation.
PAA does not claim to have invented progressive autonomy; that claim would be instantly falsifiable. The research cited here predates PAA and cannot endorse it; the framing is "PAA implements and generalizes the mechanisms these validate," never "these papers are about PAA." Each citation does argumentative work, as evidence that a PAA mechanism is sound, rather than sitting as a credential. The lineage is inherited, the transition logic is converging in the literature, and the evaluator-centric implementation layer is what PAA brings that is not yet anywhere else.
Source note
This page intentionally treats the cited work as prior art, not endorsement. Osprey and Safe-SDL predate PAA and are not about PAA. They are cited because specific mechanisms in those works map cleanly to PAA constructs: Osprey to blocking gated execution at a typed boundary, and Safe-SDL to evidence-gated advancement and regression in a safety-critical domain. Malik is cited as corroboration of the same convergence pattern across a third domain; it is an architecture report, not a mechanism formalization. SAE J3016 is cited as inherited levels lineage, not as agent-specific prior art.