Mastering `cm-tdd`: The Complete Guide

The high of “Vibe Coding” is addictive. You’re in the flow, the AI is streaming code at 400 tokens per second, and a complex feature that would normally take a week is materializing before your eyes in minutes. It feels like magic—until it doesn’t.

We’ve all been there: you ask for a refactor, the AI enthusiastically complies, and suddenly, the “vibe” is dead. A dozen silent regressions have crept into your codebase. You spend the next three hours in a “debugging loop of doom,” playing whack-a-mole with errors that the AI keeps trying to fix but only makes worse. This is the “fragility trap” of AI-assisted development.

cm-tdd is the antidote. As one of the core skills in the Cody Master toolkit, cm-tdd (Test-Driven Development) transforms the way you interact with AI. It shifts the paradigm from “prompt-and-pray” to “define-and-verify.” In this guide, we will explore why cm-tdd is the ultimate quality gate for Vibe Coding and how to master its workflow to build production-grade software that actually stays working.

Core Concepts: Why TDD is Different for AI

In traditional software engineering, TDD is often seen as a discipline for humans to prevent logic errors and force better architecture. When we move to an AI-agentic world, the role of TDD evolves. For an AI agent, a test is not just a verification step; it is a hard constraint on reality.

1. Tests as a High-Signal Specification

Standard prompts are often ambiguous. When you say “make it handle errors gracefully,” the AI has to guess what “gracefully” means to you. When you use cm-tdd, you are forced to write a test that asserts exactly what happens when an error occurs. You are providing the AI with a “Definition of Done” that it can check itself against without needing to ask you for clarification.
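To make this concrete, here is a minimal sketch of how “handle errors gracefully” might be pinned down for a JSON parser. The names `safeParseJson`, `spec`, and `violations` are hypothetical and not part of cm-tdd; the assumed contract is “never throw, always return a tagged result.”

```typescript
// Hypothetical example: turning "handle errors gracefully" into a checkable contract.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Assumed contract: never throw, always return a tagged result.
export function safeParseJson(input: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(input) };
  } catch {
    return { ok: false, error: 'invalid-json' };
  }
}

// Each row is one unambiguous statement of the desired behavior.
export const spec: Array<{ input: string; expectOk: boolean }> = [
  { input: '{"a":1}', expectOk: true },
  { input: '{broken', expectOk: false },
  { input: '', expectOk: false },
];

// Returns the inputs that violate the spec; an empty array means "done".
export function violations(): string[] {
  return spec
    .filter(({ input, expectOk }) => safeParseJson(input).ok !== expectOk)
    .map(({ input }) => input);
}
```

In a real suite each row would become its own test case, but the idea is the same: the agent can check its work against the rows instead of guessing what “gracefully” means.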

2. Hallucination Protection

AI agents are prone to “hallucinating” that a fix works because the code looks correct. cm-tdd breaks this illusion. By requiring a failing test first (the Red phase), we prove that the bug or missing behavior is real and that the test can detect it. By requiring the same test to pass (the Green phase), we get empirical evidence that the change does what the test specifies, rather than relying on the agent’s say-so.

3. The “Refactor” Safety Net

Vibe Coding often leads to “spaghetti AI code”: functional but unmaintainable. cm-tdd lets you ask the agent to “clean this up” or “make it more idiomatic” with far more confidence. If the agent breaks tested behavior during the refactor, the tests catch it immediately, before the session spirals into a broken state. The safety net is only as wide as the test coverage, so behavior without a test is still unprotected.

How it Works: The `cm-tdd` Lifecycle

The cm-tdd skill follows the classic Red-Green-Refactor cycle, adapted for the Gemini CLI and autonomous agent workflows.

Phase 1: Red (Verifiable Failure)

You never start by writing implementation code. Instead, you describe the behavior you want. The agent, guided by the cm-tdd skill, generates a test file that specifically targets this behavior.

The Goal: Run the test and see it fail.
Why it Matters: This confirms that the test is actually testing what you think it is, and that the feature isn’t already “working” by accident or via some unrelated side effect.

Phase 2: Green (The Minimalist Path)

The agent is then tasked with writing the minimal amount of code necessary to make that test pass. In Vibe Coding, we often want the agent to be clever, but in the Green phase, we want it to be accurate.

The Goal: Get the checkmark.
Why it Matters: It establishes a baseline of correctness. Once the test is green, you have a “save point” in your development journey.

Phase 3: Refactor (Architectural Integrity)

Now that the logic is proven, the agent can optimize. This is where you apply design patterns, improve variable naming, and ensure the code aligns with your project’s STYLE.md.

The Goal: Maintain the green state while improving code quality.
Why it Matters: This prevents technical debt from accumulating during rapid AI generation.
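The three phases can be read as a small state machine driven by test results. The sketch below is only an illustration of that reading, not how cm-tdd is implemented; `Phase`, `TestRun`, and `nextPhase` are hypothetical names.

```typescript
// Illustrative model of the Red-Green-Refactor cycle (not cm-tdd's actual code).
type Phase = 'red' | 'green' | 'refactor';

interface TestRun {
  passed: number;
  failed: number;
}

export function nextPhase(phase: Phase, run: TestRun): Phase {
  const allGreen = run.failed === 0 && run.passed > 0;
  switch (phase) {
    case 'red':
      // The new test must be seen failing before any implementation is written.
      return run.failed > 0 ? 'green' : 'red';
    case 'green':
      // Stay here until the minimal implementation makes everything pass.
      return allGreen ? 'refactor' : 'green';
    case 'refactor':
      // A failure during cleanup sends us back to restoring the green state.
      return allGreen ? 'refactor' : 'green';
  }
}
```

The rule worth noticing is the first one: a run with no failures in the Red phase does not advance, because a test that has never been seen failing has not yet proven that it can detect anything.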

Practical Example: Building a “Smart Slugifier”

Let’s walk through a realistic scenario. Suppose we need a utility function that converts article titles into URL-friendly slugs. It needs to strip special characters and emojis, fold accented characters to plain ASCII, collapse repeated hyphens, and avoid leading or trailing hyphens.

Step 1: Initiating the Skill

You start by telling the agent your intent. Instead of saying “write a slugify function,” you say:

“I need to implement a slugify utility in src/utils/string.ts. Use cm-tdd to ensure it handles emojis and accented characters and prevents double hyphens.”

Step 2: The Red Phase

The agent activates the skill and creates a test file (e.g., src/utils/string.test.ts). It writes a test suite like this:

// src/utils/string.test.ts
import { describe, it, expect } from 'vitest';
import { slugify } from './string';

describe('slugify', () => {
  it('should convert simple strings to lowercase slugs', () => {
    expect(slugify('Hello World')).toBe('hello-world');
  });

  it('should handle special characters and emojis', () => {
    expect(slugify('Vibe Coding 🚀 is Awesome!')).toBe('vibe-coding-is-awesome');
  });

  it('should handle accented characters', () => {
    expect(slugify('Café and Resumé')).toBe('cafe-and-resume');
  });

  it('should prevent multiple hyphens', () => {
    expect(slugify('Hello --- World')).toBe('hello-world');
  });
});

The agent runs the tests using your project’s test runner. Since slugify doesn’t exist yet, the suite fails (at this point because slugify cannot be imported, not because an assertion fails). This completes the Red phase.
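One optional refinement, which the skill is not stated to perform: because slugify is missing, the failure comes from the import rather than from the individual assertions. A temporary placeholder, sketched below as an assumption, lets each test fail on its own expectation, which is stronger evidence that every assertion can actually detect wrong output.

```typescript
// src/utils/string.ts (temporary placeholder, replaced in the Green phase)
export function slugify(_text: string): string {
  // Deliberately wrong: every assertion in the suite should fail against this.
  return '';
}
```

With the placeholder in place, the runner reports four separate failures instead of a single import error.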

Step 3: The Green Phase

The agent now implements the function in src/utils/string.ts. It might start with a simple regex-based approach:

// src/utils/string.ts
export function slugify(text: string): string {
  return text
    .toLowerCase()
    .normalize('NFD') // Handle accents
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/[^a-z0-9]+/g, '-') // Replace non-alphanumeric with -
    .replace(/^-+|-+$/g, ''); // Trim hyphens from start/end
}

The agent runs the tests again. They pass. We are now in the Green phase.

Step 4: The Refactor Phase

Now, you might notice the regex chain is a bit hard to read, or you might want to cap the slug length. The first is a refactor; the second is new behavior, so it gets its own failing test before any implementation. You instruct the agent:

“The tests are passing. Now refactor the slugify function to be more readable. Then add a failing test for a 50-character limit and implement it.”

The agent updates the code. Because the tests run after each change (in the background, or triggered by the cm-tdd validation gate), you know right away if the refactor broke the emoji handling or the accent normalization.
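As a rough sketch of where this step could end up, the version below names each transformation and adds the length cap. The helper names, the `maxLength` parameter, and the decision to trim hyphens again after truncation are assumptions for illustration, not output from the skill.

```typescript
// src/utils/string.ts (one possible refactor; helper names are hypothetical)
const MAX_SLUG_LENGTH = 50; // the limit requested in the prompt

// Decompose accented characters, then drop the combining marks.
const stripDiacritics = (s: string): string =>
  s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

// Replace every run of non-alphanumeric characters with a single hyphen.
const collapseToHyphens = (s: string): string => s.replace(/[^a-z0-9]+/g, '-');

// Remove hyphens from both ends.
const trimHyphens = (s: string): string => s.replace(/^-+|-+$/g, '');

export function slugify(text: string, maxLength: number = MAX_SLUG_LENGTH): string {
  const slug = trimHyphens(collapseToHyphens(stripDiacritics(text.toLowerCase())));
  // Truncate, then trim again in case the cut landed right after a hyphen.
  return trimHyphens(slug.slice(0, maxLength));
}
```

The four original tests still apply unchanged, which is the point: they keep guarding the old behavior while the new length test drives the cap.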

Best Practices & Pro Tips

To get the most out of cm-tdd, treat it as a default habit, not just a tool you reach for occasionally.

1. Combine with `cm-planning`

Never start TDD without a plan. Use activate_skill("cm-planning") first to map out the edge cases. Ask the agent to “list all the ways this feature could fail” and then turn those into test cases. This prevents you from missing the “hidden” bugs that AI often overlooks.

2. Keep Tests Granular

AI agents perform better when tasks are broken down. Instead of one giant test covering everything, write small, focused tests for specific logic paths. When one fails, its name and assertion point the agent at a narrow area of the code, rather than leaving it to guess across a 500-line file.

3. Use “Evidence Before Assertions”

When the agent claims a bug is fixed, do not take its word for it. The cm-tdd protocol requires the agent to show you the terminal output of the passing tests. If an agent tries to skip this step, remind it: “I need to see the test output before we proceed.”

4. Mocking and External Services

In Vibe Coding, we often interact with APIs (Stripe, OpenAI, Supabase). Mastering cm-tdd means teaching the agent how to write effective mocks. If you are building a payment flow, the test shouldn’t hit the real Stripe API. Instruct the agent to “Mock the Stripe SDK responses using Vitest vi.fn()” to ensure your tests are fast and deterministic.
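The underlying idea is to put the external service behind an interface your code owns, so a test can substitute a stand-in. The sketch below uses a hand-written fake to stay dependency-free; `PaymentGateway`, `checkout`, and `makeFakeGateway` are hypothetical names and do not reflect the Stripe SDK’s real API.

```typescript
// Hypothetical interface: a thin wrapper your code owns around the payment SDK.
interface PaymentGateway {
  charge(
    amountCents: number,
    currency: string,
  ): Promise<{ id: string; status: 'succeeded' | 'failed' }>;
}

// Code under test: depends on the interface, never on the real SDK directly.
export async function checkout(gateway: PaymentGateway, amountCents: number): Promise<string> {
  if (!Number.isInteger(amountCents) || amountCents <= 0) {
    throw new Error('amount must be a positive integer number of cents');
  }
  const result = await gateway.charge(amountCents, 'usd');
  if (result.status !== 'succeeded') {
    throw new Error('payment failed');
  }
  return result.id;
}

// Hand-written fake that records calls and returns a canned response.
export function makeFakeGateway(status: 'succeeded' | 'failed') {
  const calls: Array<[number, string]> = [];
  const gateway: PaymentGateway = {
    async charge(amountCents, currency) {
      calls.push([amountCents, currency]);
      return { id: 'fake_charge_1', status };
    },
  };
  return { gateway, calls };
}
```

In a Vitest suite, the `charge` method could be a `vi.fn()` with a resolved value instead of this fake; either way, no network call is made and the result is deterministic.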

5. The “Golden Rule” of Refactoring

If you are refactoring code that doesn’t have tests yet, the first step is always to write the tests for the existing behavior. This is often called “Characterization Testing.” Tell the agent: “Before we refactor this legacy file, use cm-tdd to capture its current behavior in a test suite.”
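A characterization test records what the code does today, quirks included, and then holds any refactor to that record. Here is a minimal sketch; `legacyFormatPrice` stands in for whatever legacy function you are about to touch, and `characterize` and `behaviorMatches` are hypothetical helpers.

```typescript
// Hypothetical legacy function whose current behavior we want to pin down.
// Quirk worth preserving knowingly: negative amounts render as "$-2.50".
export function legacyFormatPrice(cents: number): string {
  return '$' + (cents / 100).toFixed(2);
}

// Record the current output for each sample input.
export function characterize<I, O>(fn: (input: I) => O, samples: I[]): Array<[I, O]> {
  return samples.map((sample): [I, O] => [sample, fn(sample)]);
}

// Check a candidate against the recording (strict equality, so primitive outputs only).
export function behaviorMatches<I, O>(fn: (input: I) => O, recorded: Array<[I, O]>): boolean {
  return recorded.every(([input, output]) => fn(input) === output);
}

// Capture the baseline before any refactoring starts.
export const baseline = characterize(legacyFormatPrice, [0, 5, 199, 100000, -250]);
```

After the refactor, `behaviorMatches(refactoredFormatPrice, baseline)` should still return true. If a recorded quirk turns out to be a bug, fix it deliberately in a separate Red-Green step rather than silently during cleanup.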

Conclusion: From Vibe to Velocity

Vibe Coding is about speed, but true speed is impossible without stability. If you are constantly looking over your shoulder to see what the AI broke, you aren’t actually moving fast—you’re just creating a bigger mess to clean up later.

By mastering cm-tdd, you turn your AI agent into a rigorous engineering partner. You provide it with the guardrails it needs to succeed, and you give yourself the peace of mind to keep building. You move from “hoping it works” to “knowing it works.”

Next time you start a new feature, resist the urge to just “vibe” the code into existence. Call cm-tdd, write the failing test, and watch how much more powerful your AI assistant becomes when it has a clear, verifiable target to hit. That is the secret to scaling your productivity without sacrificing your sanity.

Happy Vibe Coding—and keep those tests green!
