Test-Driven Development in the Age of AI (cm-tdd)
A detailed guide to Test-Driven Development in the Age of AI (cm-tdd) for Vibe Coding.
Test-Driven Development in the Age of AI: The cm-tdd Manifesto
The promise of “Vibe Coding” is intoxicating. We describe a vision, an agent spins up a workspace, and seconds later, a functional application appears. It feels like magic—until the first bug hits. In the high-velocity world of AI-assisted development, we have traded the “Slow Coding” bottleneck for the “Debugging Doom Loop.” You know the cycle: the AI writes a feature, it looks correct, you run it, it fails, you ask the AI to fix it, it introduces two more bugs, and suddenly you are spending four hours triaging 500 lines of generated spaghetti that you didn’t even write.
The solution isn’t to slow down. It is to change the “Rails.”
In the Cody Master ecosystem, we utilize a specialized skill called cm-tdd. This isn’t your grandfather’s Test-Driven Development. This is Agentic TDD—a methodology designed to provide the guardrails necessary for AI to operate at 10x speed without sacrificing 1x reliability.
The Core Problem: The Inversion of Intent
Traditional coding follows a simple path: Think -> Write -> Debug. AI coding (Vibe Coding) follows: Describe -> Generate -> Hope.
The “Hope” phase is the vulnerability. When you ask an AI to “build a secure authentication system,” you are delegating both the logic and the verification to a probabilistic model. If the model is 95% accurate, that 5% error margin compounds with every new feature until the system is a house of cards.
cm-tdd solves this by inverting the relationship. You don’t ask the AI to build the feature; you ask the AI to build the test that defines the feature’s success, and then you force the AI to satisfy that test. By doing this, you move from being a Writer of code to being an Architect of Requirements.
How it Works: The Red-Green-Refactor-AI Loop
The standard TDD cycle is Red (Fail), Green (Pass), Refactor (Clean). In the age of AI, this cycle becomes an automated handshake between the Human Architect and the AI Agent.
1. The Specification as a Contract
Instead of a vague prompt, you provide a “Technical Intent.” You define the input/output boundaries and the edge cases. The cm-tdd skill starts by generating a suite of failing tests that represent the “Definition of Done.” If the tests don’t fail initially (Red), the test suite is invalid.
2. The Agentic Implementation
Once the tests are locked, the AI implementation phase begins. The agent is strictly forbidden from “cleaning up” outside code or adding “just-in-case” features. Its only objective is to make the tests turn Green. This constraint prevents the “Feature Creep” that often leads AI to hallucinate unnecessary dependencies.
3. The Verification Gate
In cm-tdd, a feature is not “done” when the code is written. It is done when the verification-before-completion protocol passes. This requires running the full test suite and generating an evidence log. If the tests pass but the logic feels “off,” we move to step 4.
4. The AI-Led Refactor
With a passing test suite as a safety net, we can now ask the AI to optimize for performance, readability, or memory safety. Because we have the tests, we can refactor with confidence that every behavior the suite covers still holds.
Practical Example: Building an Idempotent Payment Processor
Let’s look at a real-world scenario where Vibe Coding usually fails: Distributed Idempotency.
The Problem: You need a service that processes payments. If the network drops and the client retries, the user must not be charged twice.
Step 1: Defining the Red Phase (The Contract)
We invoke cm-tdd and provide the following intent:
“Create a PaymentProcessor that handles Stripe charges. It must be idempotent using a unique requestId. If a duplicate requestId is received, return the original transaction result without hitting Stripe again.”
The AI generates the test first:
```typescript
// test/payment.test.ts
import { PaymentProcessor } from '../src/payment';

// Minimal Stripe mock: only the one call the processor makes.
const mockStripe = {
  charges: {
    create: jest.fn().mockResolvedValue({
      status: 'succeeded',
      transactionId: 'tx_abc',
    }),
  },
};

describe('PaymentProcessor Idempotency', () => {
  it('should not charge the user twice for the same requestId', async () => {
    const processor = new PaymentProcessor(mockStripe);
    const payload = { amount: 1000, currency: 'usd', requestId: 'req_123' };

    // First call
    const firstResult = await processor.process(payload);
    expect(firstResult.status).toBe('succeeded');
    expect(mockStripe.charges.create).toHaveBeenCalledTimes(1);

    // Immediate retry
    const secondResult = await processor.process(payload);
    expect(secondResult.status).toBe('succeeded');
    expect(secondResult.transactionId).toBe(firstResult.transactionId);

    // CRITICAL: Stripe should still only have been called ONCE
    expect(mockStripe.charges.create).toHaveBeenCalledTimes(1);
  });
});
```
Step 2: The Failure (The “Red”)
We run the test. It fails because PaymentProcessor doesn’t exist. This confirms our “baseline of truth.”
Step 3: The AI Implementation (The “Green”)
We direct the agent: “Implement the code to pass this test. Use a Redis-backed cache for idempotency keys.”
The agent generates the implementation. Because it is constrained by the test, it doesn’t just write a wrapper for Stripe; it is forced to implement the Redis logic correctly to avoid the toHaveBeenCalledTimes(2) failure.
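The shape of that implementation might look something like the following. This is a hypothetical sketch, not the skill's actual output: an in-memory Map stands in for the Redis-backed store so the idempotency logic is visible without infrastructure, and the `StripeLike` interface is an assumption that mirrors the mock used in the test.

```typescript
// Hypothetical sketch of the agent's implementation. The Map stands in
// for the Redis idempotency store; a production version would use a
// Redis client with SET NX and a TTL to survive restarts and races.

interface ChargePayload { amount: number; currency: string; requestId: string; }
interface ChargeResult { status: string; transactionId: string; }

interface StripeLike {
  charges: {
    create(opts: { amount: number; currency: string }): Promise<ChargeResult>;
  };
}

class PaymentProcessor {
  // Stand-in for Redis: requestId -> cached charge result.
  private cache = new Map<string, ChargeResult>();

  constructor(private stripe: StripeLike) {}

  async process(payload: ChargePayload): Promise<ChargeResult> {
    // Idempotency gate: a known requestId returns the original result
    // without calling Stripe again.
    const cached = this.cache.get(payload.requestId);
    if (cached) return cached;

    const result = await this.stripe.charges.create({
      amount: payload.amount,
      currency: payload.currency,
    });
    this.cache.set(payload.requestId, result);
    return result;
  }
}
```

Note that a real Redis version would also need an atomic reserve-then-write (e.g. SET NX) so two concurrent requests with the same requestId cannot both slip past the cache check.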
Step 4: Verification
The cm-tdd skill executes the test. If it passes, the agent provides the evidence:
PASS test/payment.test.ts (1.2s) - Evidence: mockStripe.charges.create called 1 time for 2 requests.
Best Practices for Agentic TDD
To master cm-tdd, you must shift your mindset from “How do I code this?” to “How do I prove this works?”
1. Test the “Un-Happy Path” First
AI is great at the “Happy Path” (the standard use case). It is terrible at edge cases. When using cm-tdd, explicitly ask the agent to generate tests for:
- Network timeouts during database writes.
- Malformed JSON inputs.
- Rate-limit breaches (429 errors).
- Race conditions in async loops.
By forcing the AI to write tests for these before coding, you ensure the implementation handles them by design rather than as an afterthought.
2. Utilize “Property-Based Testing”
For advanced users, move beyond unit tests to Property-Based Testing (using libraries like fast-check). Ask the AI: “Generate 1000 random inputs for this price calculator and ensure the result never exceeds the maximum allowable discount.” This finds the “Black Swan” bugs that a human-written test would miss.
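The idea can be sketched without the library. In this dependency-free sketch, `applyDiscount` and its 30% cap are hypothetical; a hand-rolled loop plays the role that fast-check's generators and shrinking would fill in a real suite:

```typescript
// Hypothetical price calculator: the discount rate is clamped to a cap.
const MAX_DISCOUNT = 0.3;

function applyDiscount(price: number, requestedRate: number): number {
  const rate = Math.min(Math.max(requestedRate, 0), MAX_DISCOUNT);
  return price * (1 - rate);
}

// Hand-rolled property check: for 1000 random (and deliberately
// out-of-range) inputs, the result never drops below the capped floor.
// fast-check automates this generation and shrinks any failing case
// to a minimal counterexample.
for (let i = 0; i < 1000; i++) {
  const price = Math.random() * 10_000;
  const requested = Math.random() * 2 - 0.5; // includes negatives and > cap
  const result = applyDiscount(price, requested);
  if (result < price * (1 - MAX_DISCOUNT) - 1e-9) {
    throw new Error(`Property violated: price=${price}, rate=${requested}`);
  }
}
```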
3. Mock the “World,” Not the “Logic”
When the AI generates tests, ensure it mocks external APIs (Stripe, Twilio, OpenAI) but not the internal business logic. If you mock the logic you’re trying to test, you’re just testing the mock. cm-tdd enforces strict boundaries here.
4. Small Cycles are King
Do not try to TDD an entire application in one go. Break your features into 15-minute cycles.
- Cycle A: Database schema + migrations.
- Cycle B: Repository layer + CRUD tests.
- Cycle C: Service layer + Business logic.
- Cycle D: API Controller + Integration tests.
This prevents the context window from becoming “poisoned” with too much speculative code.
Why This Solves the Vibe Coding Problem
The fundamental risk of AI coding is Erosion of Quality. As you add more AI-generated code, the “Technical Debt” grows exponentially because no single human understands every line.
cm-tdd acts as a “Quality Anchor.”
- Documentation as Code: The test suite becomes the most accurate documentation of how the system works.
- Safety for Refactoring: Six months from now, when you want to upgrade your tech stack, you can ask an AI to “Migrate this from Express to Fastify.” Without cm-tdd, this is suicide. With it, you just run the tests. If they are Green, the migration is successful.
- Human-Level Oversight: You stop reading every line of generated code and start reviewing the logic of the tests. It is much easier to verify that a test is testing the right thing than to verify that 1,000 lines of logic are bug-free.
Conclusion: The Rise of the Verification Engineer
In the age of AI, the role of the “Software Engineer” is splitting in two. There are those who will be replaced by AI—those who simply “write code”—and there are Verification Engineers—those who use tools like cm-tdd to orchestrate, validate, and govern AI systems.
By adopting Test-Driven Development as your primary workflow in the Vibe Coding era, you aren’t just writing better code; you are building a resilient, scalable, and understandable system that can survive the chaos of automated generation.
Don’t just vibe. Verify.
This article is part of the Cody Master Best Practices series. To activate the TDD workflow in your local environment, use the command activate_skill cm-tdd.