Acknowledgments & Attribution#
This methodology synthesizes foundational work from the AI development community with lessons learned from production deployments of autonomous AI systems.
Foundational Work#
Raja SP, Amazon Web Services — AI-Driven Development Lifecycle (AI-DLC) Method Definition (July 2025). The core concepts of Intent, Unit, Bolt, Mob Elaboration, and the philosophy of reimagining development methods rather than retrofitting AI into existing processes originate from this foundational work.
Key Influences for 2026#
Geoffrey Huntley — Creator of the Ralph Wiggum Software Development Technique. The philosophy of "deterministically bad in an undeterministic world" and autonomous loop patterns are central to AI-DLC 2026.
Boris Cherny & Anthropic — Ralph Wiggum plugin for Claude Code, demonstrating production viability of autonomous development loops.
Steve Wilson (OWASP) — Human-on-the-Loop governance frameworks and the articulation of HITL vs HOTL operating modes.
paddo.dev — Analysis of SDLC collapse, the "19-agent trap," and the insight that phase gates become friction rather than quality control in AI-driven workflows.
HumanLayer — 12 Factor Agents principles and context engineering research including the "dumb zone" phenomenon.
Preface: The State of AI-Driven Development#
Software development has undergone a fundamental transformation. What began as AI assistance for fine-grained tasks—code completion, bug detection, test generation—has evolved into AI autonomy for sustained, multi-hour reasoning and implementation. This shift demands new methodologies built from first principles rather than adaptations of existing processes.
The landscape of AI-driven development in 2026 differs dramatically from just two years prior:
| Dimension | 2024 | July 2025 | January 2026 |
|---|---|---|---|
| Model Capability | Code completion and simple tasks | Handle complete features with guidance | Sustain multi-hour autonomous reasoning |
| Human Role | Direct every step | Validate major decisions | Define success criteria; AI iterates to convergence |
| Iteration Cycles | Measured in days | Measured in hours | Measured in minutes |
| Context Windows | Limited (8K-32K tokens) | Expanding (100K-200K tokens) | Abundant but quality-sensitive (200K-1M tokens) |
| Extensibility | None | None | Skills, sub-agents, hooks, MCP servers |
| Multi-Agent Orchestration | Experimental research | Promising theoretical approach | Simple loops outperform complex swarms |
| AI Reliability | Requires constant supervision | Can complete bounded tasks | 40,000+ lines of production code written with AI in a month |
| Memory Systems | Basic conversation history | Conversation-based, ephemeral | File-based persistence, MCP-enabled organizational memory |
AI-DLC 2026 preserves foundational insights while incorporating lessons from production deployments: the Ralph Wiggum autonomous loop methodology, the emerging consensus that traditional SDLC phases are collapsing, the practical realization that simpler AI workflows outperform elaborate orchestrations, and the understanding that organizational knowledge—previously locked in tickets, documents, and runbooks—can now serve as memory for AI agents.
Context#
The Evolution of Software Engineering#
The evolution of software engineering has been a continuous quest to enable developers to focus on solving complex problems by abstracting away lower-level, undifferentiated tasks. From early machine code to high-level programming languages, from the adoption of APIs and libraries to cloud services, each step has significantly boosted developer productivity by moving humans further from implementation details and closer to problem expression.
The integration of Large Language Models marked a revolutionary shift, introducing conversational natural language interactions for tasks like code generation, bug detection, and test creation. This was the AI-Assisted era—AI enhancing fine-grained, specific tasks while humans retained full control of workflow and decisions.
We have now entered the AI-Autonomous era. The emergence of state-persistent orchestration and task-handoff techniques allows AI to maintain context and intent across multi-hour, multi-session workflows. Combined with tools for autonomous development loops, these techniques enable workflows where humans define destinations and guardrails, then step back while AI iterates toward success. Independent evaluations estimate that frontier models can now complete tasks that take humans four to five hours. Anthropic's Claude Code lead reported writing 40,000 lines of production code using Claude Code itself in a single month.
The Problem with Traditional Methods#
Existing software development methods—Waterfall, Agile, Scrum—were designed for human-driven processes with long iteration cycles. Their reliance on manual workflows and rigid role definitions limits the ability to fully leverage AI capabilities. Retrofitting AI into these methods not only constrains its potential but reinforces outdated inefficiencies.
Traditional phase boundaries—requirements → design → implementation → testing → deployment—existed because iteration was expensive. When changing requirements meant weeks of rework, sequential phases with approval gates made economic sense. Each phase required:
- Handoffs between specialized roles
- Documentation to transfer context
- Approval gates to validate progress
- Wait times for reviews and decisions
With AI, iteration costs approach zero. You try something, it fails, you adjust, you try again—all in seconds, not weeks. The phases aren't just being augmented; they're collapsing into continuous flow.
"Phase gates that once provided quality control now create friction."
The New Reality#
To fully leverage AI's transformative power, development methods need reimagination. This reimagination requires AI to be a central collaborator, with workflows, roles, and iterations aligned to enable faster decision-making, seamless task execution, and continuous adaptability.
This paper introduces AI-DLC 2026, a methodology that embraces both supervised and autonomous modes of AI collaboration, preserves human judgment where it matters most, and sets foundations for the next evolution in software engineering.
Why AI-DLC Over Ad-Hoc AI Assistance#
Most teams today use AI for coding in an ad-hoc manner:
- Copy-paste code from chat interfaces
- Use autocomplete suggestions without verification
- No structured iteration toward defined goals
- Human still manages all workflow manually
- Context lost between sessions
Problems with ad-hoc approaches:
| Issue | Impact |
|---|---|
| No backpressure | AI errors propagate unchecked into production |
| No completion criteria | "Good enough" without verification |
| No mode selection | Using autonomous approaches for HITL-appropriate work (or vice versa) |
| Context lost between sessions | Rework and drift from original intent |
| No metrics | Can't tell if AI is helping or hurting |
AI-DLC addresses these systematically:
| Ad-Hoc AI | AI-DLC |
|---|---|
| Chat-and-paste | Structured iteration loops |
| Hope it works | Backpressure verification |
| Vague "make it work" | Explicit completion criteria |
| One mode fits all | Mode selection (HITL/OHOTL/AHOTL) |
| Session-bound context | File-based persistent memory |
| No measurement | Metrics framework |
| AI as tool | AI as collaborator with governance |
The Compounding Effect#
AI-DLC's value compounds over time:
- Completion criteria get better as teams learn to write them
- Bolt success rate improves as patterns are established
- Mode selection becomes intuitive with practice
- Quality gates accumulate, creating robust backpressure
- Memory providers expand, giving AI richer context
Ad-hoc approaches don't compound—each session starts fresh, each prompt is one-off, no organizational learning occurs. The team with AI-DLC gets better at working with AI; the team with ad-hoc AI starts over every time.
Beyond Code: The Requirements Silo#
The same ad-hoc pattern exists upstream. Teams use AI to polish requirements documents—reformatting PRDs, summarizing meetings, making user stories "readable." This treats AI as a formatting tool rather than a thinking partner.
The opportunity is different: AI should steer the requirements conversation, asking probing questions to extract context humans didn't know they possessed. "What happens if the user cancels mid-checkout?" "How should this behave for EU users under GDPR?" These questions surface assumptions that would otherwise remain implicit until implementation—when they become expensive to address.
AI-DLC addresses this through Mob Elaboration: AI interrogates stakeholders until requirements are complete enough to express as verifiable completion criteria.
Core Principles#
The following principles form the foundation of AI-DLC 2026. They shape its phases, roles, artifacts, and rituals. These principles are critical for validating the methodology, as they provide the underpinning rationale behind its design.
Reimagine Rather Than Retrofit#
We choose to reimagine a development method rather than keeping existing methods like Waterfall or Agile and retrofitting AI into them. Traditional methods were built for longer iteration durations—weeks and months—which led to rituals like daily standups, sprint planning, and retrospectives. These rituals assume a cadence that AI has rendered obsolete.
Proper application of AI leads to rapid cycles measured in hours or even minutes. This demands continuous, real-time validation and feedback mechanisms, rendering many traditional rituals less relevant:
Would effort estimation (story points) be as critical if AI diminishes the boundaries between simple, medium, and hard tasks? When sub-hour architectural iterations become routine, human estimation becomes unreliable.
Would velocity metrics be relevant, or should we replace them with business value delivered? When the constraint shifts from implementation speed to requirement clarity, traditional productivity metrics miss the point.
Would sequential phases help when try-fail-adjust cycles take seconds? When iteration is nearly free, upfront design becomes a tax rather than an investment.
These new dynamics warrant reimagination based on first-principles thinking rather than retrofitting. We need automobiles, not faster horses.
Three Operating Modes: HITL, OHOTL, and AHOTL#
AI-DLC 2026 introduces the concept of AI initiating and directing conversations, using the Google Maps analogy: humans set the destination, AI provides step-by-step directions, humans maintain oversight. The methodology distinguishes three operating modes that form a spectrum of human involvement:
Human-in-the-Loop (HITL): Human judgment is directly involved in decision-making. AI proposes options, human validates, AI executes. The human approves each significant step before proceeding.
This mode is essential for:
- Novel domains or first-time implementations where patterns aren't established
- Architectural decisions with long-term consequences that are difficult to reverse
- High-risk operations involving production data, security-sensitive changes, or compliance requirements
- Foundational decisions that will shape subsequent autonomous work
Observed Human-on-the-Loop (OHOTL): The system operates while a human watches in real time, able to intervene at any moment but not required to approve each step. The human maintains synchronous awareness with asynchronous control—they see what's happening and can redirect, but progress isn't blocked waiting for approval.
This mode is appropriate for:
- Creative and subjective work including UX decisions, design, copy, and content
- Training and onboarding scenarios where observation has educational value
- Medium-risk changes that benefit from human awareness without requiring blocking approval
- Judgment-heavy work requiring real-time human input without formal gates
- Iterative refinement where human taste guides direction
Autonomous Human-on-the-Loop (AHOTL): The system operates autonomously while humans receive periodic updates and intervene only when needed. AI executes within defined boundaries until success criteria are met, alerting humans only when intervention is required.
This mode is appropriate for:
- Well-defined tasks with clear, measurable acceptance criteria
- Programmatically verifiable work where tests, type checks, and linting can validate correctness
- Batch operations including migrations, refactors, and test coverage expansion
- Background work that can run overnight or during off-hours
- Mechanical transformations following established patterns
Comparing the Three Modes:
| Aspect | HITL | OHOTL | AHOTL |
|---|---|---|---|
| Human Attention | Continuous, blocking | Continuous, non-blocking | Periodic, on-demand |
| Approval Model | Before each step | Any time (interrupt) | At completion |
| AI Autonomy | Minimal | Moderate | Full within boundaries |
| Feedback Loop | Synchronous | Real-time, optional | Asynchronous |
| Best For | Novel, high-risk | Creative, subjective | Mechanical, verifiable |
The Google Maps analogy extends across all three modes:
- HITL: You tell the GPS each turn to make, it confirms, you approve, it executes
- OHOTL: GPS drives while you watch the route, you can say "no, not that way" at any moment
- AHOTL: You set the destination, define acceptable routes (no toll roads, avoid highways), and check when you arrive
The key insight: The human doesn't disappear. The human's function changes—from micromanaging execution to defining outcomes, observing progress, and building quality gates. The choice of mode depends on the nature of the work, not just its risk level.
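The mode decision above can be sketched as a small heuristic. This is an illustrative sketch, not part of the methodology; the task attributes and their precedence are assumptions a team would tune for its own context:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Attributes a team might assess before choosing a mode (illustrative)."""
    high_risk: bool   # production data, security, or compliance exposure
    novel: bool       # no established pattern to follow
    verifiable: bool  # tests, type checks, and lint can confirm success
    subjective: bool  # quality depends on human taste (UX, copy, design)

def select_mode(task: Task) -> str:
    """Map task attributes to an operating mode, mirroring the table above."""
    if task.high_risk or task.novel:
        return "HITL"   # human approves each significant step
    if task.subjective:
        return "OHOTL"  # human watches and can interrupt at any moment
    if task.verifiable:
        return "AHOTL"  # AI iterates autonomously toward completion criteria
    return "OHOTL"      # when unsure, keep a human watching
```

Under this sketch, a mechanical refactor with full test coverage routes to AHOTL, while a first-time payment integration routes to HITL regardless of how verifiable it is.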
Backpressure Over Prescription#
Traditional methodologies prescribe how work should be done. Detailed process steps, code review checklists, and implementation patterns create rigid workflows that constrain AI's ability to leverage its full capabilities.
AI-DLC 2026 introduces a different approach: backpressure—quality gates that reject non-conforming work without dictating approach.
"Don't prescribe how; create gates that reject bad work." — Geoffrey Huntley
Instead of specifying "first write the interface, then implement the class, then write unit tests, then integration tests," define the constraints that must be satisfied:
- All tests must pass
- Type checks must succeed
- Linting must be clean
- Security scans must clear
- Coverage must exceed threshold
- Performance benchmarks must be met
Let AI determine how to satisfy these constraints. This approach offers multiple benefits:
- Leverages AI capabilities fully — AI can apply its training and reasoning without artificial constraints
- Reduces prompt complexity — Success criteria are simpler to specify than step-by-step instructions
- Makes success measurable — Programmatic verification enables autonomous operation
- Enables iteration — Each failure provides signal; each attempt refines the approach
Prescriptive workflows create an intelligence ceiling—like consulting a board of advisors but only permitting yes-or-no answers. Frontier models encode patterns from millions of engineering decisions. Backpressure unlocks this: define the outcome, and AI can propose architectural alternatives, surface constraint conflicts, and apply its training to find paths you hadn't considered.
The philosophy can be summarized as: "Better to fail predictably than succeed unpredictably." Each failure is data. Each iteration refines the approach. The skill shifts from directing AI step-by-step to writing criteria and tests that converge toward correct solutions.
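In practice, backpressure often amounts to a gate script that the loop runs after each iteration, rejecting the work if any check fails. A minimal sketch; the gate commands shown are placeholders to replace with your project's actual toolchain:

```python
import subprocess

# Illustrative gate commands -- substitute your project's real toolchain.
GATES = [
    ("tests", ["npm", "test"]),
    ("types", ["npx", "tsc", "--noEmit"]),
    ("lint",  ["npx", "eslint", "."]),
]

def run_gates(gates=GATES) -> list[str]:
    """Run each gate command; return the names of the gates that failed.

    An empty list means the work passes backpressure and the iteration
    may be accepted; a non-empty list is the signal fed back to the AI.
    """
    failures = []
    for name, cmd in gates:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return failures
```

Note that the script never says how to fix a failure; it only reports which constraints are unmet, leaving the approach to the AI.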
Embrace the Collapsing SDLC#
Traditional SDLC phases existed because iteration was expensive. Each handoff between analyst → architect → developer → tester → operations lost context, added latency, and created opportunities for misalignment. Sequential phases with approval gates were an economic optimization for a world of expensive iteration.
With AI, that economic calculus inverts. Iteration is nearly free. Context loss from handoffs becomes the dominant cost. AI-DLC 2026 models development as continuous flow with strategic checkpoints rather than discrete phases.
Checkpoints differ from handoffs:
| Handoff (Traditional) | Checkpoint (AI-DLC 2026) |
|---|---|
| Work stops completely | Work pauses briefly |
| Context transfers to another party | Same agent continues with feedback |
| New party must rebuild understanding | Context is preserved |
| Documents carry knowledge | Files and git carry knowledge |
| Approval required to proceed | Review identifies needed adjustments |
This doesn't mean structure disappears. It means structure changes form—from sequential gates to parallel loops with human oversight at strategic moments.
Context Is Abundant—Use It Wisely#
Modern language models offer context windows ranging from 200K tokens (Claude Opus 4.5) to over 1 million tokens (Claude Sonnet 4.5, Gemini). This abundance fundamentally changes how we think about AI workflows—but not in the ways that might be obvious.
The 19-Agent Trap: Early enthusiasm for AI led to complex multi-agent scaffolding that mapped AI agents to traditional org charts—an Analyst agent, PM agent, Architect agent, Developer agent, QA agent, and so on. This approach consistently performs worse than simpler alternatives because:
- Each handoff between agents loses context despite large windows
- More scaffolding creates more opportunities for misalignment
- Orchestration overhead consumes attention budget
- Debugging multi-agent failures is exponentially harder
"As agents accumulate tools, they get dumber."
Research shows model performance degrades when context windows exceed 40-60% utilization—the "dumb zone." Information "lost in the middle" of large contexts receives less attention than information at the beginning or end.
The Insight: Abundant context windows don't mean we should fill them with everything. They mean we can be selective about high-quality context rather than compressed summaries. AI-DLC 2026 favors:
- Small, focused agents with relevant context over comprehensive agents with scattered information
- Quality over quantity in context selection
- Strategic context engineering rather than context stuffing
- Simple loops with clear objectives over complex orchestrations
The best workflows aren't complex orchestrations—they're simple loops with clear objectives and rich, relevant context.
Memory Providers Expand Knowledge#
AI context windows reset between sessions. Modified files and git history provide persistence without complex memory infrastructure. AI-DLC 2026 extends this insight by recognizing that existing organizational artifacts are memory providers that AI agents can access.
Traditional SDLC processes created extensive documentation that often went unused:
- Product Requirements Documents (PRDs)
- Architecture Decision Records (ADRs)
- Technical design specifications
- Runbooks and operational procedures
- Tickets and issue histories
- Retrospectives and post-mortems
These artifacts represent institutional memory—decisions made, rationales documented, patterns established. Through modern integration protocols (MCP servers, API connectors, knowledge bases), AI agents can now access this memory directly.
Memory Layers in AI-DLC 2026:
| Layer | Location | Speed | Purpose |
|---|---|---|---|
| Rules | Project rules files | Instant | Conventions, patterns, constraints |
| Session | Working files, scratchpads | Fast | Current task context, progress |
| Project | Git history, codebase | Indexed | What was tried, what worked |
| Organizational | Connected systems | Query | Institutional knowledge, decisions |
| Runtime | Monitoring systems | Query | Production behavior, incidents |
Practical implications:
- An agent implementing a feature can query the ticket system for acceptance criteria
- An agent making architectural decisions can retrieve relevant ADRs for precedent
- An agent debugging can access incident history for similar symptoms
- An agent writing documentation can query existing standards for style
The filesystem remains the simplest, most robust memory provider. Git history shows what was attempted. Modified files persist across iterations. But these aren't the only options—organizational memory, properly connected, dramatically expands what agents can know.
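As a minimal sketch of the first three layers, an agent session might assemble its starting context like this. The file names AGENTS.md and SCRATCHPAD.md are illustrative conventions, not prescribed by AI-DLC:

```python
import subprocess
from pathlib import Path

def gather_context(project: Path, max_commits: int = 5) -> dict[str, str]:
    """Assemble the fastest memory layers before a session starts."""
    context = {}
    rules = project / "AGENTS.md"        # Rules layer: conventions, constraints
    if rules.exists():
        context["rules"] = rules.read_text()
    scratch = project / "SCRATCHPAD.md"  # Session layer: progress notes
    if scratch.exists():
        context["session"] = scratch.read_text()
    # Project layer: what was recently attempted, from git history
    try:
        log = subprocess.run(
            ["git", "-C", str(project), "log", f"-{max_commits}", "--oneline"],
            capture_output=True, text=True,
        )
        if log.returncode == 0:
            context["project"] = log.stdout
    except FileNotFoundError:  # git not installed; skip this layer
        pass
    return context
```

The organizational and runtime layers would be queried on demand (via MCP servers or APIs) rather than loaded up front, keeping the context window out of the dumb zone.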
Completion Criteria Enable Autonomy#
The key enabler of autonomous operation is programmatic verification. If success can be measured by machines, AI can iterate toward it without human intervention for each step.
Every work element in AI-DLC 2026 should have explicit Completion Criteria—programmatically verifiable conditions that define success:
Good completion criteria are:
- Specific and measurable — Not "code is good" but "TypeScript compiles with strict mode"
- Programmatically verifiable — Tests pass, linters succeed, benchmarks clear
- Implementation-independent — Define what, not how
- Complete — Cover functional requirements, quality attributes, and constraints
Examples:
| Bad Criteria | Good Criteria |
|---|---|
| "Code is well-tested" | "All tests pass with >80% coverage" |
| "API is performant" | "API responds in <200ms p95 under 1000 req/s load" |
| "Code is type-safe" | "TypeScript compiles with strict mode, no errors" |
| "Feature is complete" | "All acceptance tests in tests/feature/ pass" |
| "Security is addressed" | "Security scan passes with no critical/high findings" |
This transforms the human role from "validator of each step" to "definer of done." The human specifies what success looks like; the AI figures out how to achieve it.
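Mechanically, completion criteria can be represented as named, machine-checkable predicates that an autonomous loop evaluates after each iteration. A minimal sketch; the criteria shown are stand-ins for real test runs, coverage reports, and scan results:

```python
from typing import Callable

# A criterion pairs a human-readable name with a machine-checkable predicate.
Criterion = tuple[str, Callable[[], bool]]

def unmet(criteria: list[Criterion]) -> list[str]:
    """Return the names of criteria not yet satisfied; empty means done."""
    return [name for name, check in criteria if not check()]

# Illustrative stand-ins: real checks would shell out to test runners,
# coverage tools, and security scanners.
coverage = 0.85
criteria = [
    ("all tests pass", lambda: True),
    ("coverage > 80%", lambda: coverage > 0.80),
]
print(unmet(criteria))  # → [] : every criterion met, the loop can stop
```

An autonomous Bolt then reduces to a loop: do work, evaluate `unmet(criteria)`, feed any failures back as signal, and stop when the list is empty.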
Design Techniques Are Tools, Not Requirements#
AI-DLC 2026 takes a pragmatic approach: design techniques like Domain-Driven Design (DDD), Test-Driven Development (TDD), and Behavior-Driven Development (BDD) are valuable tools to apply when appropriate, not mandatory steps in every workflow.
Use structured design (DDD, TDD, BDD) when:
- Domain complexity benefits from explicit modeling
- Multiple teams need shared understanding of boundaries
- Long-term maintainability is paramount
- Compliance requirements demand documentation
Skip heavyweight upfront design when:
- Task is well-defined and bounded
- Quick iteration will reveal the right structure faster than analysis
- Automated verification can validate correctness
- Design patterns are already established in the codebase
AI can apply design patterns during execution without requiring explicit design phases. When an AI generates code, it draws on its training to apply appropriate patterns—factory methods, dependency injection, repository patterns—without being told to do so.
The test suite, not the architecture document, becomes the source of truth. If the tests pass and the code meets non-functional requirements, the implementation is valid regardless of whether it matches a pre-specified design.
Everyone Becomes a Builder#
In traditional development, "design" and "build" are distinct phases requiring different specialists:
- Designer creates mockups (Figma, wireframes)
- Developer implements the specs
- Handoff is a document or image—and often a source of friction
AI collapses this gap. When AI generates a React component, it simultaneously designs and implements. There's no mockup-to-code handoff. The artifact IS the design.
This doesn't eliminate designers—it empowers everyone to build.
| Traditional | AI-DLC 2026 |
|---|---|
| Designer creates mockup, waits for developer | Designer directs AI, sees result immediately |
| Developer interprets spec, may misunderstand | Developer directs AI with technical precision |
| Handoffs lose context | No handoffs—same person refines until done |
Your discipline is your superpower. Everyone uses the same workflow (builder → reviewer), but discipline provides context that makes you better at directing AI:
- Designer as builder: Visual intuition, UX sensibility, and aesthetic judgment guide the AI toward beautiful, usable interfaces. Designers can now manifest their vision directly—no waiting for developer availability.
- Backend engineer as builder: System design knowledge, performance intuition, and architectural patterns guide the AI toward robust, scalable services.
- Product manager as builder: User empathy, business context, and prioritization skills guide the AI toward features that matter.
- Technical writer as builder: Communication clarity, audience awareness, and information architecture guide the AI toward documentation people actually read.
The skill shifts from execution to refinement. Instead of manually implementing every pixel or line of code, you review what AI produces and guide it toward your vision. Your years of experience in your discipline make you better at recognizing what's good, what's off, and how to improve it.
This is democratization, not elimination. A designer who could previously only create mockups can now ship working code. A backend engineer who avoided frontend can now build full features. Everyone gains capabilities; no one loses their value. Your expertise makes you a better director of AI—not a victim of it.
The Operator Multiplier. AI is a force multiplier, not an equalizer. A junior with AI produces more work; an experienced operator with AI produces superior systems. The difference lies in steering: recognizing architectural debt before it accumulates, catching edge cases the tests don't cover, knowing when the AI's suggestion is a local optimum. High-quality output still requires experienced judgment—either directly or through Mob Construction patterns where seasoned builders guide the overall direction.
Streamline Responsibilities#
AI's ability to perform task decomposition, code generation, testing, documentation, and deployment reduces the need for specialized roles. A single developer supervising AI can accomplish what previously required separate specialists for frontend, backend, infrastructure, testing, and documentation.
This doesn't eliminate human value—it concentrates it on activities where human judgment is essential:
- Strategic direction — Deciding what to build and why
- Trade-off decisions — Balancing competing concerns that require business context
- Validation — Ensuring work aligns with actual needs
- Risk assessment — Identifying and mitigating potential harms
- Quality oversight — Maintaining standards that automated checks can't capture
The role shifts from "doing the work" to "defining what work matters and verifying it's done well."
However, humans remain integral. Product owners ensure alignment with business objectives. Developers maintain design quality and handle judgment calls. These roles ensure that automation and human accountability remain balanced.
Tool and Platform Agnostic#
AI-DLC 2026 is intentionally agnostic to both tooling and infrastructure. The methodology applies regardless of which IDE, AI agent, or cloud provider you choose.
Technique over tools. The core of AI-DLC isn't a specific product—it's the logic of how state persists across sessions, how backpressure guides iteration, and how human steering integrates with autonomous execution. These patterns work whether you're using Claude Code, Cursor, Kiro, OpenCode, or a custom agent framework. The principles transfer; the implementation details vary.
Infrastructure independence. The methodology applies regardless of deployment choices:
- Container orchestration: Kubernetes, ECS, Docker Swarm, Nomad
- Serverless: Lambda, Cloud Functions, Azure Functions, Cloudflare Workers
- Traditional: Virtual machines, bare metal servers
- Edge: IoT devices, edge computing platforms
- Hybrid: Any combination of the above
Choose tools and infrastructure based on requirements, cost constraints, and team expertise—not methodology constraints.
Core Framework#
This section outlines the core framework of AI-DLC 2026, detailing its artifacts, phases, rituals, and workflows.
Artifacts#
Intent#
An Intent is a high-level statement of purpose that encapsulates what needs to be achieved, whether a business goal, a feature, or a technical outcome (e.g., performance scaling, security improvement). It serves as the starting point for AI-driven decomposition into actionable tasks, aligning human objectives with AI-generated plans.
Every Intent includes explicit Completion Criteria—programmatically verifiable conditions that define success and enable autonomous execution.
Example Intent:
```markdown
## Intent: User Authentication System

### Description
Add secure user authentication to the application supporting
email/password registration and OAuth providers for social login.

### Business Context
- Current system has no user accounts
- Need to support personalization features planned for Q2
- Must integrate with existing customer database

### Completion Criteria
- [ ] Users can register with email/password
- [ ] Users can log in and receive JWT tokens
- [ ] JWT tokens expire appropriately and can be refreshed
- [ ] OAuth flow works for Google and GitHub
- [ ] Password reset flow via email works
- [ ] All auth endpoints have >80% test coverage
- [ ] Security scan passes with no critical findings
- [ ] Load test: 100 concurrent logins complete in <2s p99
```
Unit#
A Unit represents a cohesive, self-contained work element derived from an Intent, specifically designed to deliver measurable value. Units are analogous to Bounded Contexts in Domain-Driven Design or Epics in Scrum.
Characteristics of well-defined Units:
- Cohesive: User stories within the Unit are highly related
- Loosely coupled: Minimal dependencies on other Units
- Independently deployable: Can go to production without other Units
- Clear boundaries: Ownership and scope are unambiguous
- Explicit dependencies: Upstream Units are declared in frontmatter
- Single discipline: Each Unit focuses on one discipline (frontend, backend, api, documentation)
Single-Discipline Units and Vertical Slices:
When implementing a vertical feature that spans multiple layers, decompose into discipline-specific units:
"User Profile" intent:
├── unit-01-user-profile-api (discipline: api)
├── unit-02-user-profile-backend (discipline: backend)
│ depends_on: [unit-01]
└── unit-03-user-profile-ui (discipline: frontend)
depends_on: [unit-02]
This decomposition provides:
- Focused context: Each unit gets discipline-appropriate context (design system for frontend, API specs for backend)
- Clear ownership: No ambiguity about what each unit delivers
- Parallel potential: Independent units can work simultaneously
- Appropriate tooling: Discipline-specific agents and validation hooks
Each Unit encompasses:
- User stories articulating functional scope
- Non-functional requirements specific to that Unit
- Completion criteria for verification
- Risk descriptions
- Dependencies on other Units (if any)
The process of decomposing Intents into Units is driven by AI, with developers and Product Owners validating and refining the resulting Units to ensure alignment with business and technical objectives.
Unit Dependencies form a DAG:
Units can declare dependencies on other Units using the depends_on frontmatter field. This creates a Directed Acyclic Graph (DAG) that enables:
- Fan-out: Multiple Units with no shared dependencies execute in parallel
- Fan-in: Units that depend on multiple upstream Units wait for all to complete
- Maximum parallelism: Ready Units (dependencies satisfied) can start immediately
```
unit-01 ─────────────────┐
                         ├──→ unit-04 ──→ unit-05
unit-02 ──→ unit-03 ─────┘
```
In this example:
- unit-01 and unit-02 start in parallel (no dependencies)
- unit-03 waits for unit-02
- unit-04 waits for both unit-01 AND unit-03 (fan-in)
- unit-05 waits for unit-04
Unit file naming convention:
Units use a numerical index prefix followed by a meaningful slug:
```
.ai-dlc/
└── add-oauth-login/
    ├── INTENT.md
    ├── unit-01-setup-provider.md
    ├── unit-02-callback-handler.md
    ├── unit-03-session-management.md
    └── unit-04-auth-integration.md
```
Unit frontmatter example:
---
status: pending
depends_on: [unit-01-setup-provider, unit-03-session-management]
branch: ai-dlc/add-oauth-login/04-auth-integration
---
The numerical prefix ensures ordering visibility; the slug provides semantic meaning. Dependencies reference the full unit identifier (number + slug).
A Unit becomes ready when all its dependencies have status completed. Units with no dependencies are ready immediately.
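The readiness rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Unit` class and its fields simply mirror the frontmatter shown earlier.

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    """Minimal stand-in for a unit file's frontmatter."""
    name: str
    status: str = "pending"            # pending | in_progress | completed
    depends_on: list[str] = field(default_factory=list)

def ready_units(units: list[Unit]) -> list[str]:
    """A unit is ready when every dependency has status 'completed'."""
    done = {u.name for u in units if u.status == "completed"}
    return [
        u.name for u in units
        if u.status == "pending" and all(d in done for d in u.depends_on)
    ]

units = [
    Unit("unit-01-setup-provider", status="completed"),
    Unit("unit-02-callback-handler", depends_on=["unit-01-setup-provider"]),
    Unit("unit-03-session-management", status="completed"),
    Unit("unit-04-auth-integration",
         depends_on=["unit-01-setup-provider", "unit-03-session-management"]),
]
print(ready_units(units))
# → ['unit-02-callback-handler', 'unit-04-auth-integration']
```

Because readiness is recomputed from statuses each time, a scheduler can call `ready_units` after every unit completion to discover newly unblocked work, which is exactly what enables fan-out and fan-in.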
Bolt#
A Bolt is the smallest iteration cycle in AI-DLC 2026, designed for rapid implementation of a Unit or a set of tasks within a Unit. The term "Bolt" (analogous to Sprints in Scrum) emphasizes intense focus and high-velocity delivery, with build-validation cycles measured in hours or days rather than weeks.
Bolts operate in three modes:
Supervised Bolt (HITL)#
Human validates each major step before proceeding. AI proposes, human reviews, AI implements, human validates. Used for high-risk, novel, or foundational work.
Supervised Bolt characteristics:
- Human approval required before each significant step
- AI explains reasoning and presents options
- Human makes trade-off decisions
- Used for foundational work that shapes subsequent autonomous execution
Observed Bolt (OHOTL)#
AI works while human watches in real-time. Human can intervene at any moment but doesn't block progress. Used for creative, subjective, or training scenarios.
Observed Bolt characteristics:
- Real-time visibility into AI's work (streaming output)
- Non-blocking human presence—AI proceeds unless interrupted
- Human can redirect at any moment based on observation
- Combines AI speed with human judgment availability
- Educational value—humans learn patterns while observing
When to use Observed Bolts:
| Scenario | Why Observed Works |
|---|---|
| UI/UX implementation | Subjective quality—human can say "that looks wrong" immediately |
| Content writing | Tone and style judgment available in real-time |
| Junior engineer training | Learning opportunity from watching AI work |
| Exploratory refactoring | Human can redirect if approach seems off |
| Design system work | Aesthetic decisions benefit from immediate feedback |
Autonomous Bolt (AHOTL)#
AI iterates until completion criteria are met, using test results and quality gates as feedback. Human reviews final output. Used for well-defined tasks with programmatic verification.
Autonomous Bolt characteristics:
- Completion promise signal (e.g., COMPLETE or BLOCKED)
- Maximum iteration limit as safety net
- Backpressure through tests, types, and linting
- File-based progress persistence
- Blocker documentation when stuck
- Human attention not required during execution
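The characteristics above combine into a simple outer loop. The following is a hedged sketch, assuming hypothetical `run_iteration` and `quality_gates` callables supplied by whatever harness drives the agent; it is not the implementation of any particular tool.

```python
def autonomous_bolt(run_iteration, quality_gates, max_iterations=50):
    """Autonomous-loop sketch: iterate until the completion promise
    is emitted, backpressure passes, or the safety limit is hit.

    run_iteration() -> str   agent signal ("COMPLETE", "BLOCKED", or progress)
    quality_gates() -> bool  backpressure: tests/types/lint all green
    """
    for i in range(1, max_iterations + 1):
        signal = run_iteration()
        if signal == "BLOCKED":
            return ("blocked", i)       # agent documented its blocker
        if signal == "COMPLETE" and quality_gates():
            return ("complete", i)      # promise honored AND gates pass
    return ("limit_reached", max_iterations)  # safety net; needs a human

# Simulated agent: makes progress twice, then claims completion.
outputs = iter(["progress", "progress", "COMPLETE"])
print(autonomous_bolt(lambda: next(outputs), lambda: True))
# → ('complete', 3)
```

Note that a `COMPLETE` claim alone is not trusted: the quality gates must also pass, which is what makes the backpressure, rather than the agent's self-report, the real arbiter of doneness.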
A Unit may be executed through one or more Bolts, which may run in parallel or sequentially. AI plans the Bolts; developers and Product Owners validate the plan.
Completion Criteria#
Completion Criteria are explicit, programmatically verifiable conditions that define when work is done. They enable autonomous execution by giving AI clear targets to iterate toward.
Completion Criteria should be:
| Attribute | Description | Example |
|---|---|---|
| Specific | Unambiguous and precise | "Coverage >80% for src/auth/" |
| Measurable | Quantifiable or binary | "Response time <200ms p95" |
| Verifiable | Can be checked programmatically | "TypeScript compiles with no errors" |
| Independent | Don't prescribe implementation | "Passwords are hashed" not "Use bcrypt" |
Completion Criteria can include:
- Test pass requirements
- Code quality thresholds
- Performance benchmarks
- Security scan results
- Documentation requirements
- Integration validation
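In practice, criteria of this kind reduce to named predicates that can all be evaluated mechanically. A minimal sketch, where the `measured` values are hypothetical numbers an agent might gather from its quality gates:

```python
def verify_criteria(criteria: dict) -> dict:
    """Evaluate each criterion; all must pass for the work to be done."""
    return {name: bool(check()) for name, check in criteria.items()}

# Hypothetical measurements gathered from the quality gates.
measured = {"coverage": 0.87, "p95_ms": 140, "type_errors": 0}

criteria = {
    "coverage >80% for src/auth/": lambda: measured["coverage"] > 0.80,
    "response time <200ms p95":    lambda: measured["p95_ms"] < 200,
    "TypeScript compiles clean":   lambda: measured["type_errors"] == 0,
}

results = verify_criteria(criteria)
print(all(results.values()))  # → True when every criterion passes
```

Keeping criteria as independent named checks also gives the agent precise feedback: a failing name tells it exactly which target to iterate toward next.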
Completion Announcements#
When an Intent completes, the work isn't done—it needs to be communicated. AI-DLC 2026 includes Completion Announcements as a first-class artifact.
During Mob Elaboration, teams specify announcement formats:
# intent.md frontmatter
announcements:
- changelog # Conventional changelog entry
- release-notes # User-facing summary
- social-posts # Twitter/LinkedIn ready
- blog-draft # Long-form announcement
On completion, AI generates each configured format:
| Format | Purpose | Audience |
|---|---|---|
| CHANGELOG | Technical record of changes | Developers |
| Release Notes | User-facing feature summary | End users |
| Social Posts | Platform-optimized snippets | Community |
| Blog Draft | Detailed announcement | Marketing |
Why this matters: Shipping features without communication is incomplete work. The announcement artifacts ensure every completed Intent has the materials needed for proper release communication, reducing the gap between "code complete" and "users know about it."
Deployment Unit#
Deployment Units are operational artifacts encompassing everything needed to run in production:
- Code artifacts: Container images, serverless function packages, compiled binaries
- Configuration: Helm charts, environment configs, feature flags
- Infrastructure: Terraform modules, CloudFormation stacks, Pulumi programs
- Validation: Test suites for functional, security, and performance verification
AI generates all associated tests, including:
- Functional acceptance tests validating business requirements
- Security scans (static analysis, dependency vulnerabilities, SAST/DAST)
- Performance and load tests against NFR thresholds
- Infrastructure validation and compliance checks
After human validation of test scenarios and cases, AI executes the test suites, analyzes results, and correlates failures with code changes, configurations, or dependencies. Deployment Units are tested for functional acceptance, security compliance, NFR adherence, and operational risk mitigation before they are considered ready to deploy.
Deployment Units should be independently deployable and include automated rollback procedures.
Phases & Rituals#
AI-DLC 2026 organizes work into three phases, each with distinct rituals and human-AI interaction patterns.
Inception Phase#
The Inception Phase focuses on capturing Intents and translating them into Units with clear Completion Criteria for development.
Mob Elaboration Ritual
The central ritual of Inception is Mob Elaboration—a collaborative requirements elaboration and decomposition session. This happens with stakeholders and AI working together, either in a shared room with a screen or via collaborative tools.
During Mob Elaboration:
- AI asks clarifying questions to minimize ambiguity in the original Intent
- AI elaborates the clarified intention into user stories, NFRs, and risk descriptions
- AI composes Units based on cohesion analysis, grouping highly related stories
- The team validates these artifacts, providing corrections and adjustments
- AI generates Completion Criteria for each Unit enabling autonomous execution
- The team validates criteria ensuring they're verifiable and complete
- AI recommends Bolt structure and mode (supervised vs autonomous) for each
Mob Elaboration condenses weeks of sequential work into hours while achieving deep alignment between stakeholders and AI.
Outputs of Inception:
- Well-defined Units with clear boundaries
- User stories with acceptance criteria
- Non-functional requirements (NFRs)
- Risk descriptions (aligned with organizational risk frameworks)
- Completion Criteria (tests, thresholds, checks)
- Measurement criteria traced to business intent
- Suggested Bolts with mode recommendations
Construction Phase#
The Construction Phase transforms Units into tested, deployment-ready artifacts through Bolts. This phase progresses through domain modeling, logical design, code generation, and testing—though these steps may be implicit rather than explicit depending on complexity.
Mode Selection
The first decision in Construction is mode selection for each Bolt:
| Choose Supervised (HITL) When... | Choose Autonomous (AHOTL) When... |
|---|---|
| Novel domain or architecture | Well-understood patterns |
| Complex trade-offs required | Clear acceptance criteria |
| High-risk changes | Programmatic verification available |
| Creative or UX-focused work | Mechanical transformations |
| Foundational decisions | Batch operations |
Supervised Construction (HITL)
For novel domains, architectural decisions, or high-judgment work:
- Developer establishes session with AI
- AI proposes domain model; developer validates business logic
- AI proposes logical architecture; developer makes trade-off decisions
- AI generates code; developer reviews for quality and alignment
- AI generates tests; developer validates scenarios and edge cases
- Iterate until acceptance criteria met with human checkpoint at each step
Autonomous Construction (AHOTL)
For well-defined tasks with programmatic verification:
Implement the user authentication API endpoints.
Context:
- Specs in specs/auth-api.md
- Existing middleware in src/middleware/
- Database models in src/models/user.ts
Completion Criteria:
- All tests in tests/auth/ pass
- TypeScript compiles with strict mode: no errors
- ESLint passes with 0 warnings
- Coverage >85% for src/auth/
- Security scan: no critical/high findings
- Load test: 100 concurrent requests in <2s
Process:
1. Read all relevant specs and existing code
2. Write tests first based on acceptance criteria
3. Implement to pass tests
4. Run full verification suite
5. Fix any failures
6. Iterate until all criteria pass
Constraints:
- Maximum 50 iterations
- Only modify files in src/auth/, tests/auth/
- Commit after each working increment
If blocked after 10 attempts on same issue:
- Document in .agent/blockers.md
- Output BLOCKED
When all criteria pass:
- Output COMPLETE
Human review of autonomous work:
- Reviews generated code for architectural alignment
- Validates design decisions (e.g., caching strategy)
- Checks for edge cases not covered by tests
- Approves for integration or requests changes
Mob Construction Ritual
For complex systems requiring multiple Units built in parallel:
- Teams collocate (physically or virtually) with AI agents
- Each team takes ownership of a Unit
- Teams exchange integration specifications (API contracts, event schemas)
- Autonomous Bolts execute in parallel within each Unit
- Human checkpoints coordinate cross-Unit concerns
- Integration testing validates boundaries between Units
Operations Phase#
The Operations Phase centers on deployment, observability, and maintenance of systems, leveraging AI for operational efficiency.
Deployment:
- AI packages modules into Deployment Units (containers, functions, IaC)
- Developers review deployment configuration
- AI executes deployment to staging environment
- Validation tests run against staging
- Human approves promotion to production
- AI deploys to production with rollback capability
Observability and Monitoring:
AI actively analyzes telemetry data:
- Metrics: CPU, memory, latency, error rates, business KPIs
- Logs: Error patterns, warning trends, anomaly detection
- Traces: Request flow analysis, bottleneck identification
Anomaly Response:
For well-defined operational scenarios with runbook coverage, AI operates autonomously (AHOTL for operations). Humans monitor dashboards, receive alerts for novel situations, and intervene for edge cases outside runbook coverage.
Autonomous Operations Boundaries Example:
autonomous_actions:
allowed:
- name: scale_horizontally
trigger: "cpu > 80% for 5m"
max_replicas: 10
cooldown: 10m
- name: restart_unhealthy
trigger: "health_check_failures > 3"
max_restarts: 3
cooldown: 10m
- name: rollback_deployment
trigger: "error_rate > 5% for 10m"
automatic: true
requires_human_approval:
- scale_vertically
- database_migration
- security_configuration_change
- delete_any_resource
- modify_network_rules
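Enforcing such a boundary file reduces to a default-deny lookup. The sketch below mirrors the action names in the YAML above (triggers, cooldowns, and replica limits are omitted for brevity); it is illustrative, not a real operations tool.

```python
# Policy mirrors the autonomous_actions YAML above.
policy = {
    "allowed": {"scale_horizontally", "restart_unhealthy", "rollback_deployment"},
    "requires_human_approval": {
        "scale_vertically", "database_migration",
        "security_configuration_change", "delete_any_resource",
        "modify_network_rules",
    },
}

def authorize(action: str) -> str:
    """Default-deny: anything not explicitly allowed needs a human."""
    if action in policy["allowed"]:
        return "execute"
    if action in policy["requires_human_approval"]:
        return "escalate"
    return "deny"

print(authorize("restart_unhealthy"))   # → execute
print(authorize("database_migration"))  # → escalate
print(authorize("drop_all_tables"))     # → deny
```

The important property is the final `deny`: an action absent from the policy is neither executed nor silently escalated as routine, which keeps novel situations in front of humans.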
The Workflow#
The complete AI-DLC 2026 workflow integrates all phases into a continuous flow:
Key principles of the workflow:
- Each step enriches context for the next: Artifacts build progressively, with each stage producing semantically richer context
- Files are the memory: Progress persists in modified files and commits, not in conversation
- Checkpoints enable adjustment: Human review points can redirect work without restarting from scratch
- Flow replaces phases: Rather than discrete handoffs, work flows continuously with human oversight at strategic moments
- Traceability is maintained: All artifacts link together, enabling forward and backward traceability
AI-DLC 2026 in Action: Greenfield Development#
Scenario: Product Owner initiates development of a recommendation engine for cross-selling products.
Inception Phase (Mob Elaboration)#
Human provides Intent:
## Intent: Product Recommendation Engine
Build a recommendation engine that suggests complementary products
based on purchase history and browsing behavior.
### Initial Context
- E-commerce platform with 50,000 products
- 1 million monthly active users
- Need real-time recommendations (<100ms)
- Must integrate with existing product catalog API
- GDPR compliance required for EU users
AI asks clarifying questions:
"I'll help elaborate this Intent. Let me ask some clarifying questions:
- What recommendation approaches should we consider? Collaborative filtering, content-based, or hybrid?
- What's the expected personalization depth? User-specific recommendations or segment-based?
- Where should recommendations appear? Product pages, cart, checkout, email?
- What's the cold-start strategy for new users with no history?
- Are there any products that should never be recommended together?"
Team provides answers, AI elaborates into Units with dependencies:
unit-01-data-collection ──→ unit-02-model ──┐
                                            ├──→ unit-04-frontend
unit-03-api ────────────────────────────────┘
unit-01-data-collection.md (depends_on: [])
Capture user behavior (views, clicks, purchases), build feature pipelines, create user and product embeddings.
Completion Criteria:
- Event tracking captures views, clicks, add-to-cart, purchases
- Events flow through pipeline with <30s latency
- User embeddings update nightly
- Data retention complies with GDPR (consent, deletion)
unit-02-model.md (depends_on: [unit-01-data-collection])
Train and serve ML model for product recommendations. Requires data pipeline from unit-01.
Completion Criteria:
- Model achieves >0.75 AUC on holdout set
- Inference latency <50ms p99
- A/B test framework integrated
- Model retraining pipeline automated
unit-03-api.md (depends_on: [])
Real-time serving API integrated with product catalog. Can start in parallel with unit-01.
Completion Criteria:
- GET /recommendations/{userId} returns top 10 products
- Response time <100ms p99 under 1000 req/s load
- Integration tests pass against catalog API
- Rate limiting: 1000 req/min/user
- Authentication via existing JWT tokens
unit-04-frontend.md (depends_on: [unit-02-model, unit-03-api])
Display recommendations in product pages and cart. Fan-in: requires both model and API.
Completion Criteria:
- Recommendation widget renders on product pages
- Widget handles loading, error, empty states
- Click tracking integrated with analytics
- A/B test variant support
- Accessibility: WCAG 2.1 AA compliant
Parallel execution: unit-01-data-collection and unit-03-api start immediately (no dependencies). unit-02-model waits for unit-01-data-collection. unit-04-frontend waits for both unit-02-model and unit-03-api (fan-in point).
AI recommends Bolt modes:
| Unit | Recommended Mode | Rationale |
|---|---|---|
| unit-01-data-collection | Autonomous (AHOTL) | Clear criteria, established patterns |
| unit-02-model | Supervised (HITL) | Novel ML decisions, trade-offs |
| unit-03-api | Autonomous (AHOTL) | Standard REST patterns, clear criteria |
| unit-04-frontend | Supervised (HITL) | UX decisions, accessibility judgment |
Construction Phase#
Unit 3 (API Integration) assigned for construction as Autonomous Bolt.
Rationale for autonomous mode:
- Clear completion criteria (response time, test coverage, security)
- Well-understood domain (REST API development)
- Strong programmatic verification (tests, load tests, security scan)
AI execution trace:
- Reads specs, existing code, understands context
- Writes API endpoint tests for /recommendations/{userId}
- Implements endpoint with basic recommendation logic
- Tests fail—catalog integration missing
- Implements catalog client integration
- Tests pass
- Load test fails at 800 req/s (target: 1000)
- Adds Redis caching layer
- Load test passes: 1200 req/s, 85ms p99
- Security scan: SQL injection risk in query param
- Fixes with parameterized queries
- All criteria met
- Outputs COMPLETE
Human reviews autonomous output:
- Code review finds good architectural decisions
- Caching strategy appropriate for use case
- Test coverage comprehensive
- Approves for integration
Operations Phase#
Deployment:
- AI packages API service as container image
- Generates Kubernetes manifests with HPA configuration
- Creates Terraform for managed Redis cache
- Human reviews and approves deployment config
- Deploys to staging, runs integration tests
- Promotes to production with canary rollout
Autonomous monitoring configured:
autonomous_actions:
- alert: HighLatency
condition: p99_latency > 150ms for 5m
action: scale_replicas(current + 2)
max_replicas: 10
- alert: HighErrorRate
condition: error_rate > 2% for 5m
action: rollback_to_previous_version
- alert: RecommendationQualityDrop
condition: click_through_rate < baseline - 15%
action: notify_team # Requires human analysis
AI-DLC 2026 in Action: Brownfield Development#
Scenario: Adding a new wishlist feature to an existing e-commerce application.
Key Difference: Context Building#
Before Inception, AI must understand the existing codebase. This analysis can itself be an Autonomous Bolt:
Autonomous context building:
Analyze existing codebase and generate context models.
Tasks:
1. Identify major components in src/
2. Map dependencies between components
3. Document data flows for checkout and user profile
4. Identify integration points for new features
5. Note patterns and conventions used
Output:
- docs/architecture/static-model.md (components, responsibilities, relationships)
- docs/architecture/dynamic-model.md (key use case flows)
- docs/architecture/integration-points.md (where new features connect)
- docs/architecture/conventions.md (patterns to follow)
Output ANALYSIS_COMPLETE when done.
Static Model captures: components, descriptions, responsibilities, and relationships.
Dynamic Model captures: how components interact for significant use cases.
After context building, Inception proceeds normally but with awareness of existing patterns and constraints. The AI's proposals align with established conventions, and completion criteria reference existing test patterns.
Pre-Inception Artifacts#
AI can also query organizational memory during brownfield work:
- Query ticket history: "What past issues relate to user preferences?"
- Query ADRs: "What decisions were made about data storage patterns?"
- Query runbooks: "What operational considerations exist for user-facing features?"
This context informs the Inception phase, ensuring new work aligns with existing decisions and avoids repeating past mistakes.
Decision Framework: Selecting Operating Mode#
Selecting the right operating mode is critical to AI-DLC success. The decision depends on criteria clarity, verification method, risk level, domain familiarity, and whether subjective quality matters.
The core principle: Default to more supervision when uncertain. It's easier to loosen control than to recover from autonomous mistakes.
| Scenario | Mode | Rationale |
|---|---|---|
| Implement new algorithm | Supervised (HITL) | Novel, requires judgment |
| Add CRUD endpoints | Autonomous (AHOTL) | Well-understood pattern |
| Database schema migration | Supervised (HITL) | High risk, data integrity |
| UI component implementation | Observed (OHOTL) | Subjective design quality |
| Documentation update | Autonomous (AHOTL) | Clear criteria, low risk |
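The decision logic in the table can be sketched as a small function. The attribute keys below are illustrative, not a formal schema; the point is the ordering of the checks, which encodes the core principle of defaulting to more supervision.

```python
def select_mode(task: dict) -> str:
    """Sketch of the decision framework. Keys are illustrative."""
    if task.get("subjective_quality"):
        return "observed"                  # OHOTL: human watches live
    if (task.get("clear_criteria")
            and task.get("programmatic_verification")
            and task.get("risk") == "low"):
        return "autonomous"                # AHOTL: review final output only
    return "supervised"                    # HITL: the safe default

print(select_mode({"clear_criteria": True,
                   "programmatic_verification": True,
                   "risk": "low"}))        # → autonomous
print(select_mode({"subjective_quality": True}))  # → observed
print(select_mode({"risk": "high"}))       # → supervised
```

Anything that fails to qualify for observed or autonomous execution falls through to supervised mode, so missing information never loosens control.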
Work can transition between modes as understanding develops. The flexibility to switch modes mid-work is a key feature of AI-DLC 2026.
Implementing Autonomous Bolts#
The Ralph Wiggum pattern, named after the Simpsons character, embraces "deterministically bad in an undeterministic world." Rather than trying to be perfect, the agent tries, fails, learns from failures, and iterates until success.
Core components:
- Completion Promise: A signal (COMPLETE/BLOCKED) that autonomous execution has finished
- Backpressure: Quality gates that provide feedback on each iteration
- Iteration Limit: Safety net preventing infinite loops
- Blocker Documentation: Graceful degradation when stuck
The pattern has evolved from simple single-agent loops to hat-based orchestration for complex workflows—but the core philosophy remains: sophisticated behavior emerges from simple loops with good backpressure, not from complex multi-agent architectures.
Teams can customize hat behavior through configuration files, defining custom hats with specific triggers, events, and instructions tailored to their workflows.
Adoption Path#
AI-DLC adoption follows a progressive pattern: start with Mob Elaboration to build shared understanding, introduce Completion Criteria discipline, then gradually expand autonomous scope as confidence builds.
The key phases:
- Foundation: Introduce Completion Criteria on existing work, build quality gates
- First Autonomous Work: Low-risk, well-defined tasks with conservative limits
- Scaling: Expand to more task types, develop team-specific patterns
Skills evolve from writing code to defining success criteria, from reviewing PRs to designing quality gates, from manual testing to test specification.
Measuring AI-DLC Effectiveness#
AI-DLC shifts from activity-based metrics (lines of code, story points) to outcome-based metrics (value delivered, criteria coverage, time-to-value). The best teams measure what matters to the business, not what's easy to count.
Core metrics to track:
| Metric | What It Measures |
|---|---|
| Cycle Time | Time from Intent to Production |
| Bolt Success Rate | Bolts completing without human rescue |
| Churn Rate | Iterations per bolt (high = poor criteria) |
| Criteria Escape Rate | Defects found post-deployment |
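Two of these metrics fall directly out of bolt execution records. A minimal sketch with hypothetical records (the field names and numbers below are invented for illustration):

```python
from statistics import mean

# Hypothetical bolt records; fields mirror the metrics table.
bolts = [
    {"iterations": 12, "rescued": False},
    {"iterations": 48, "rescued": True},   # needed human rescue
    {"iterations": 7,  "rescued": False},
    {"iterations": 30, "rescued": False},
]

success_rate = mean(0 if b["rescued"] else 1 for b in bolts)
churn = mean(b["iterations"] for b in bolts)  # high churn signals poor criteria

print(f"Bolt success rate: {success_rate:.0%}")
print(f"Mean iterations per bolt: {churn:.1f}")
```

Because every bolt already persists its iteration count and completion signal, these metrics require no extra instrumentation: the workflow's own artifacts are the measurement source.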
Compliance and Audit Integration#
AI-DLC integrates with regulated environments through structured audit patterns that maintain compliance without sacrificing speed. Two primary patterns:
- Auditable Checkpoints: Human audit points at strategic boundaries (requirements sign-off, design approval, code review, release approval) with autonomous work between them
- Automated Compliance Gates: Quality gates as compliance verification with audit logging
The key insight: audit trails emerge naturally from the AI-DLC workflow rather than being bolted on afterward. The workflow is the audit trail.
Glossary#
| Term | Definition |
|---|---|
| Backpressure | Quality gates that automatically reject work not meeting criteria, providing feedback for iteration |
| Bolt | Smallest iteration unit in AI-DLC 2026; operates in supervised (HITL), observed (OHOTL), or autonomous (AHOTL) mode |
| Completion Criteria | Programmatically verifiable conditions that define when work is successfully done |
| Completion Promise | Signal (e.g., COMPLETE, BLOCKED) that autonomous execution has finished |
| Context Budget | Available attention capacity in AI context window; quality degrades when overloaded |
| HITL | Human-in-the-Loop: human validates each significant step before AI proceeds; used for novel, high-risk, or foundational work |
| AHOTL | Autonomous Human-on-the-Loop: human defines criteria and reviews output; AI operates autonomously within boundaries; used for well-defined, programmatically verifiable work |
| Integrator | Final validation hat that runs conditionally based on VCS strategy; validates auto-merged state (trunk) or creates single PR (intent); skipped for unit/bolt strategies |
| Intent | High-level statement of purpose with completion criteria that serves as starting point for decomposition |
| Memory Provider | Source of persistent context (files, git, tickets, ADRs, runbooks) accessible to AI agents |
| Mob Elaboration | Collaborative ritual where humans and AI decompose Intent into Units with Completion Criteria |
| Mob Construction | Collaborative ritual where multiple teams build Units in parallel with AI assistance |
| OHOTL | Observed Human-on-the-Loop: human watches AI work in real-time with ability to intervene; synchronous awareness with asynchronous control; used for creative, subjective, or training scenarios |
| Quality Gate | Automated check (tests, types, lint, security) that provides pass/fail feedback |
| Ralph Wiggum Pattern | Autonomous loop methodology: try, fail, learn, iterate until success criteria met |
| Unit | Cohesive, independently deployable work element derived from an Intent; named with numerical prefix + slug (e.g., unit-01-setup-auth); can declare dependencies via depends_on forming a DAG |
| Unit DAG | Directed Acyclic Graph of unit dependencies enabling parallel execution (fan-out) and convergence (fan-in) |
References#
- Raja SP. AI-Driven Development Lifecycle (AI-DLC) Method Definition. Amazon Web Services, July 2025. https://aws.amazon.com/blogs/devops/ai-driven-development-life-cycle/
- Geoffrey Huntley. Ralph Wiggum Software Development Technique. 2025. https://ghuntley.com/ralph/
- Anthropic. Ralph Wiggum Plugin for Claude Code. 2025.
- Steve Wilson. Human-on-the-Loop: The New AI Control Model That Actually Works. The New Stack, August 2025.
- paddo.dev. The SDLC Is Collapsing. 2025.
- paddo.dev. The 19-Agent Trap. January 2026.
- HumanLayer. 12 Factor Agents. 2025.
- Anthropic. Claude Code: Best Practices for Agentic Coding. 2025.
- Model Context Protocol (MCP). Specification and Server Implementations. 2025.
- Karpathy, Andrej. LLMs in Software Development. 2025.
AI-DLC 2026 is an open methodology. Contributions and adaptations are welcome.
See AI-DLC in Action#
Explore the interactive tools to understand and apply the AI-DLC methodology:
- Big Picture Diagram - Visual overview of all concepts and their relationships
- Workflow Visualizer - Step through hat-based workflows interactively
- Mode Selector Tool - Find the right operating mode for your task
- Glossary - Quick reference for all AI-DLC terminology