A maturity model for AI-enabled engineering organizations
AI adoption is not a single switch you flip. It is a progression—from AI assisting individual keystrokes, to AI executing chunks of work, to agents completing whole tasks, to autonomous systems running continuously under human governance. The same progression plays out across every phase of the software development lifecycle, from planning through maintenance.
This maturity model maps that progression. Each row is a level of AI maturity. Each column is a phase of the SDLC, owned by a different persona. Read down a column to see how a given phase evolves; read across a row to see what a given level of maturity looks like end to end. Most organizations are at different levels in different phases—that is normal, and the model helps you see exactly where to invest next.
Each offering is designed to lift your organization from one level to the next.
Jumpstart: Levels 0/1 → 2
Acceleration: Level 2 → 3
Advisory & Directed Efforts: Level 3 → 4
| Level | Plan (Product / PMs) | Design (Architects / Sr Eng) | Build (Developers) | Test (QA / SDETs) | Review (Dev Leads / Sr Eng) | Deploy (DevOps / Platform) | Maintain (Dev / Support Eng) |
|---|---|---|---|---|---|---|---|
| Level 0: Manual. Human does everything. | Stories and requirements written by hand. Tribal knowledge, meetings, and emails. | Whiteboard sessions. Design docs and ADRs written manually, if they exist at all. | All code written by hand. No AI tools. Speed bound by individual developer. | Unit and automation tests written by hand. QA bottlenecks delivery with long regression cycles. | Fully human review. Quality varies by reviewer. Inconsistent standards. | Manual deploys. CI/CD limited or fragile. Releases infrequent, high-stress. | Reactive bug fixing. Tech debt unchecked. Legacy knowledge in individuals. |
| Level 1: AI-assisted. AI assists, human approves each step. | AI helps draft individual stories and acceptance criteria. PM still drives structure and prioritization. | AI drafts ADRs, suggests patterns, evaluates tradeoffs. Same design process, faster exploration. | Code completion and inline suggestions. Same workflow. AI is a smarter autocomplete. | AI generates individual automated tests and suggests edge cases. Same QA process, less friction writing tests. | AI review tools added. Noisy at first: useful signal mixed with low-value suggestions. Same process, tuning required. | AI helps write individual scripts, configs, and troubleshoot failures. Same deploy process, faster fixes. | AI chat used for individual debugging sessions and log analysis. Same process, faster diagnosis. |
| Level 2: AI-integrated. AI executes chunks, human reviews the work. | PM drives with AI co-authoring specs, PRDs, and prioritization analysis. AI drafts, human shapes and decides. | AI generates system design options: service boundaries, data flows, architecture tradeoffs. Architect drives decisions. | Developer drives feature work with AI handling multi-file/component chunks. Skills accelerate common patterns. Still coding, but faster. | AI writes the majority of all tests. Velocity increases, but manual testing and upfront test design work continues. | AI does comprehensive first-pass review on all PRs: bugs, patterns, security, standards. Noise reduced, signal improving. | AI adds intelligence to existing automation: failure triage, config optimization, health monitoring. Releases more frequent. | AI assists broader debugging and production issue triage. Faster root cause identification. Proactive maintenance emerging. |
| Level 3: AI-native. AI completes full tasks, human reviews outcomes. | Agents generate full specs from product conversations and user research. PM reviews, iterates, and approves for engineering handoff. | Agents generate architecture proposals (service design, schema changes, infra plans) from collected system data. Architect evaluates and approves. | Agents build complete features from specs. Developer triggers, reviews outcomes, and iterates. Not reviewing individual lines. | Agents design and execute full test strategies: integration plans, regression suites, etc. QA reviews coverage and results. | Agents catch systemic patterns across PRs: architectural drift, dependency risks, etc. Auto-patches minor issues, escalates significant concerns. | Deployment policies execute automatically (canary, rollback, feature flags) based on codified rules and production signals. Human sets policies. | Agents investigate production issues, identify root cause, and execute fixes when triggered. Engineer reviews outcomes. |
| Level 4: Autonomous. AI works continuously, human governs & reviews exceptions. | AI identifies requirements from production signals and user behavior continuously. Human governs priorities and strategic direction. | Agents generate and iterate on design proposals autonomously. Architecture evolves continuously. Human sets constraints and guardrails. | Agents pick up and complete work from the backlog continuously. Human defines process, governs guardrails, reviews exceptions. | Tests evolve continuously with the codebase. Coverage self-optimizes from production defect patterns. No trigger needed. | Agents control the merge gate. Simple PRs auto-merged, complex changes flagged. Continuous quality assurance. | Self-healing pipelines detect and recover from failures continuously. Release strategy adapts automatically. Human governs policy. | Self-healing systems detect, diagnose, and resolve issues continuously without triggers. Human defines process and strategic direction. |
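One way to make the "different levels in different phases" idea concrete is to record a level per phase and look up which offering applies to each. The sketch below is illustrative only: the phase names and the level-to-offering mapping come from this page, while the function names, the example profile, and the "lowest phase first" heuristic are assumptions, not part of the model itself.

```python
from typing import Dict, List, Optional

# Phases of the SDLC, in the column order used by the maturity table.
PHASES = ["Plan", "Design", "Build", "Test", "Review", "Deploy", "Maintain"]

# Offering that lifts a phase from its current level, as listed on this page.
# Level 4 has no entry because no offering is listed beyond it.
OFFERING_BY_LEVEL = {
    0: "Jumpstart",
    1: "Jumpstart",
    2: "Acceleration",
    3: "Advisory & Directed Efforts",
}


def next_offering(level: int) -> Optional[str]:
    """Return the offering for a phase at `level`, or None at Level 4."""
    if not 0 <= level <= 4:
        raise ValueError(f"level must be between 0 and 4, got {level}")
    return OFFERING_BY_LEVEL.get(level)


def lagging_phases(profile: Dict[str, int]) -> List[str]:
    """Return the phases sitting at the lowest level in the profile.

    Treating the lowest phases as the next investment is an assumed
    heuristic; an organization may have reasons to prioritize differently.
    """
    missing = [phase for phase in PHASES if phase not in profile]
    if missing:
        raise ValueError(f"profile is missing phases: {missing}")
    lowest = min(profile[phase] for phase in PHASES)
    return [phase for phase in PHASES if profile[phase] == lowest]


# Hypothetical organization: strong in Build and Review, manual in Deploy.
example_profile = {
    "Plan": 1, "Design": 1, "Build": 2, "Test": 1,
    "Review": 2, "Deploy": 0, "Maintain": 1,
}

for phase in lagging_phases(example_profile):
    print(phase, "->", next_offering(example_profile[phase]))
```

For the hypothetical profile above, the only lagging phase is Deploy at Level 0, which maps to Jumpstart.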
Maturity is earned, not declared. Use these thresholds and metrics to confirm a level is genuinely in place before claiming the next one.
Level 1 (AI-assisted): license coverage 100%; weekly active users (WAUs) 90%+
Level 2 (AI-integrated): daily active users (DAUs) 90%+; repos with custom context 90%+; repos with custom skills 90%+
Level 3 (AI-native): % of engineers invoking skills daily; # of repos with complex skills; # of shared plugins deployed; % of tokens spent via skills
Level 4 (Autonomous): # of autonomous workflows; # of tickets closed autonomously
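The numeric thresholds for Levels 1 and 2 can be expressed as a simple gate. The following is a minimal sketch under stated assumptions: the `AdoptionMetrics` type and its field names are hypothetical, every value is a fraction between 0 and 1, and the denominator for each percentage (for example, which engineers count toward WAUs) is not specified on this page. Levels 3 and 4 are left out because only metrics, not target values, are listed for them.

```python
from typing import NamedTuple


class AdoptionMetrics(NamedTuple):
    """Hypothetical container for adoption metrics, each a fraction in [0, 1]."""
    license_coverage: float    # share of engineers with a license
    weekly_active: float       # WAUs
    daily_active: float        # DAUs
    repos_with_context: float  # share of repos with custom context
    repos_with_skills: float   # share of repos with custom skills


def confirmed_level(m: AdoptionMetrics) -> int:
    """Return the highest level (0, 1, or 2) whose thresholds are all met.

    A level counts only if every lower level is also confirmed, reflecting
    the idea that a level must be in place before claiming the next one.
    """
    level_1 = m.license_coverage >= 1.0 and m.weekly_active >= 0.90
    if not level_1:
        return 0
    level_2 = (
        m.daily_active >= 0.90
        and m.repos_with_context >= 0.90
        and m.repos_with_skills >= 0.90
    )
    return 2 if level_2 else 1


# Hypothetical readings: full license coverage and strong weekly usage,
# but daily usage and repo customization still below the Level 2 bar.
print(confirmed_level(AdoptionMetrics(1.0, 0.95, 0.60, 0.40, 0.20)))
```

Note that strong Level 2 numbers do not confirm Level 2 in this sketch if license coverage is below 100%, because Level 1 is checked first.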
The organizations getting real leverage from AI are not the ones that bought the most licenses—they are the ones that moved deliberately up the model, phase by phase, with metrics confirming each step. The goal is not to reach Level 4 everywhere overnight. It is to know exactly where you stand today and what the next investment should be.
Want to find out where your organization sits on the AI-SDLC maturity model? Schedule a discovery call and we will map your current state and the fastest path up.