Understanding Context Windows: The Golden Rule of Prompting


The Forgetful Genius: Mastering the Context Window

Imagine you’ve just hired the most brilliant architect in the world. This architect has read every book on structural engineering, knows every building code in existence, and can design a skyscraper in their sleep. There is, however, one catch: their desk is only the size of a postage stamp.

Every time you give them a new blueprint, they have to shove a piece of the old one off the edge of the desk to make room. If you keep talking and handing them papers, they eventually forget what the foundation of the building looked like because that part of the plan has fallen onto the floor and been swept away.

In the world of AI-assisted development—or what we call Vibe Coding—this desk is the “Context Window.” Understanding how it works isn’t just a technical detail; it is the single most important “Golden Rule” for anyone who wants to build software with AI without losing their mind.

The Working Memory of the Machine

At its simplest level, the context window is the “working memory” of a Large Language Model (LLM). When you interact with an AI like Gemini or Claude, the model doesn’t “remember” you from one day to the next in the way a human does. Instead, every time you send a message, the entire history of that specific conversation is bundled up and sent to the model all over again.

The AI looks at that entire bundle, processes it, and predicts what the next words should be. The “Context Window” is the limit on how big that bundle can be.
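This resend-everything loop can be sketched in a few lines. The `callModel` function below is a hypothetical stand-in for a real LLM API call; the point is that the entire history array goes over the wire on every single turn:

```typescript
// Minimal sketch of how chat context is resent each turn.
// `callModel` is a hypothetical stand-in for a real LLM API call.

type Message = { role: "user" | "assistant"; content: string };

const history: Message[] = [];

function sendTurn(userText: string, callModel: (ctx: Message[]) => string): string {
  history.push({ role: "user", content: userText });
  // The ENTIRE history is sent every time -- the model has no memory of its own.
  const reply = callModel(history);
  history.push({ role: "assistant", content: reply });
  return reply;
}
```

Because `history` only ever grows, every message you send makes the next request larger. That growing bundle is exactly what the context window caps.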

Tokens: The Currency of Context

To understand the window, you have to understand tokens. AI doesn’t read words; it reads chunks of characters. For example, the word “apple” might be one token, but a complex word like “tokenization” might be split into three: “token”, “iz”, and “ation”.

A good rule of thumb for English text is that 1,000 tokens correspond to roughly 750 words.
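That rule of thumb is easy to turn into a quick budgeting helper. This is only a heuristic for English prose; real tokenizers vary by model, so treat the result as an estimate:

```typescript
// Rough token estimator based on the ~750 words per 1,000 tokens rule of thumb.
// Real tokenizers differ per model; this is only a ballpark for English prose.

function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75); // ~1 token per 0.75 words
}
```

Paste a file's contents in and you get a rough sense of how much of your window it will consume before you commit it to the conversation.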

When you hear that a model has a “128k context window,” it means it can hold about 128,000 tokens in its active memory at once. That sounds like a lot—it’s roughly the length of a 400-page novel. But in coding, context disappears faster than you think. Every file you read, every error log you paste, and every long-winded explanation you provide eats into that limit.

Why This Matters for Vibe Coding

Vibe Coding is the art of describing intent and letting the AI handle the implementation. But for the AI to implement your intent correctly, it needs to see the current state of your code. If you are building a React application and you ask the AI to “add a login button,” it needs to know:

  1. What does your current Header.tsx look like?
  2. Which styling library are you using?
  3. How is your authentication state managed?

If that information isn’t in the current context window, the AI will guess. It might suggest using Tailwind CSS when you’re actually using Vanilla CSS. It might suggest a library you haven’t installed. This is where “hallucinations” often come from—not from the AI being “stupid,” but from the AI being “blind” to the parts of your project that fell off the desk.

The “Lost in the Middle” Phenomenon

One of the most critical beginner mistakes is assuming that as long as the window is large, the AI is perfectly accurate. Research has shown a phenomenon called “Lost in the Middle.”

LLMs are generally very good at remembering the very beginning of a prompt (your initial instructions) and the very end of a prompt (your most recent request). However, their accuracy drops significantly when the most important information is buried in the middle of a massive context.

If you paste 50 files into a single chat, the AI might correctly follow your latest command, but it might “forget” a crucial architectural constraint you mentioned in the 10th file you uploaded.

A Practical Example: Bloat vs. Precision

Let’s look at a real-world scenario. You are working on a dashboard for a fitness app. You want to change the color of the “Calories Burned” chart.

The “Wrong” Way (Context Bloat)

A beginner might say: “Here is my entire src folder. Please find the chart and change the calories color to red.”

The AI then receives:

  • App.tsx (500 tokens)
  • UserSidebar.tsx (800 tokens)
  • GlobalStyles.css (2,000 tokens)
  • DatabaseSchema.sql (3,000 tokens)
  • ChartComponent.tsx (1,200 tokens)
  • …and 10 other unrelated files.

Total context used: 15,000 tokens. The AI is now overwhelmed with information about database tables and sidebar navigation that has nothing to do with a chart color. It is more likely to make a mistake, and you are wasting “space” for future messages in the session.

The Vibe Coding Way (Surgical Precision)

An expert Vibe Coder uses tools to provide only what is necessary. Using a CLI agent, the workflow looks like this:

  1. Search: grep_search("Calories") to find which file contains the logic.
  2. Targeted Read: read_file("src/components/StatsChart.tsx", start_line=45, end_line=90) to see only the relevant code block.
  3. The Prompt: “In StatsChart.tsx, I see the dataSeries object on line 62. Change the stroke color for the ‘calories’ key to #FF0000.”

Total context used: 300 tokens. The AI is now 100% focused. There is zero noise. The accuracy will be near-perfect because the “desk” is clean.
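The search-then-targeted-read workflow above can be sketched with two pure string helpers. These are illustrative stand-ins for the agent's real tools (`grep_search` and ranged `read_file`), not their actual implementations:

```typescript
// Hedged sketch of the "search, then targeted read" workflow.
// Real CLI agents do this with grep and ranged file reads; these pure
// helpers show the same idea on an in-memory string.

function grepLines(source: string, pattern: RegExp): number[] {
  // Return the 1-based line numbers whose text matches the pattern.
  return source
    .split("\n")
    .map((line, i) => (pattern.test(line) ? i + 1 : -1))
    .filter((n) => n !== -1);
}

function readRange(source: string, startLine: number, endLine: number): string {
  // Return only lines startLine..endLine (1-based, inclusive).
  return source.split("\n").slice(startLine - 1, endLine).join("\n");
}
```

Search first to locate the relevant lines, then read only that range: the model sees a few dozen lines instead of the whole file.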

The Golden Rule of Prompting

The Golden Rule is simple: Provide the minimum amount of context required to make the next decision unambiguous.

Think of yourself as a director and the AI as your lead actor. If you give the actor a 500-page script for a 30-second scene, they might get the tone wrong. If you give them just the two pages they need for that scene, they can focus all their energy on the performance.

Best Practices for Managing Your Window

To become a master of Vibe Coding, you need to develop “Context Hygiene.” Here are the strategies used by professional AI engineers:

1. The “Surgical” Read

Never read an entire file if you only need one function. Most modern AI CLI tools allow you to specify line ranges when reading a file. If you know the bug is in the handleSubmit function, don’t provide the imports, the types, and the footer of the file. Just provide the function.

2. The “Fresh Start” Principle

Context windows accumulate “rot.” If you’ve been debugging a single issue for 20 turns, the history is filled with failed attempts, error messages, and confusing back-and-forth. The AI might start getting confused by its own previous mistakes.

When this happens, start a new session. Summarize the current state: “We are working on the Login feature. The API is working, and the state is updating, but the redirect isn’t triggering. Here is the current AuthContext.tsx.” This clears the “desk” of all the failed attempts and gives the AI a clean slate.

3. Use a “Source of Truth” File

In complex projects, it helps to keep a small file (like TECH_STACK.md or ARCHITECTURE.md) that describes the high-level rules of your project.

  • “We use TypeScript.”
  • “We use functional components, not classes.”
  • “We use Prettier for formatting.”

By including this tiny file in your context, you prevent the AI from suggesting “out-of-vibe” code.
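One lightweight way to apply this is to prepend the rules file to every prompt you build. The file name and wrapper text here are illustrative, assuming a TECH_STACK.md like the one described above:

```typescript
// Sketch: prepend a small "source of truth" rules file to every prompt so
// the model never loses the project's conventions. Names are illustrative.

const TECH_STACK_MD = [
  "We use TypeScript.",
  "We use functional components, not classes.",
  "We use Prettier for formatting.",
].join("\n");

function buildPrompt(task: string): string {
  return `Project rules:\n${TECH_STACK_MD}\n\nTask:\n${task}`;
}
```

A three-line rules file costs almost nothing in tokens, but it anchors every request to your stack, so the model stops suggesting class components or a CSS framework you don't use.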

4. Task Decomposition

Don’t ask the AI to “Build a whole e-commerce site.” That requires too much context at once. Instead:

  • “Scaffold the project structure.” (Task 1)
  • “Implement the Product Card component.” (Task 2)
  • “Create the Shopping Cart logic.” (Task 3)

By breaking the project into atomic tasks, you ensure that the context for each step is small and highly relevant.

5. Managing Large Data

If you have a massive log file or a 10,000-line JSON file, don’t paste it. Use tools like grep or search to extract the specific lines where the error occurs. Pasting a 5,000-line log file is the fastest way to “poison” your context window and make the AI hallucinate.
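The extraction step can be as simple as filtering for error lines and keeping a line or two of surrounding context. This is a minimal sketch of that idea, not a replacement for a proper log tool:

```typescript
// Sketch: keep only the error lines (plus a little surrounding context)
// from a large log, instead of pasting the whole file.

function extractErrors(log: string, context = 1): string[] {
  const lines = log.split("\n");
  const keep = new Set<number>();
  lines.forEach((line, i) => {
    if (/error/i.test(line)) {
      const lo = Math.max(0, i - context);
      const hi = Math.min(lines.length - 1, i + context);
      for (let j = lo; j <= hi; j++) keep.add(j);
    }
  });
  return [...keep].sort((a, b) => a - b).map((i) => lines[i]);
}
```

A 5,000-line log usually collapses to a handful of lines this way, and those few lines are the only ones the model actually needs to diagnose the failure.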

The Cost of Efficiency

In many AI systems, you are charged by the token. While the “intellectual” cost of context bloat is accuracy, the “financial” cost is real money. Even if you are using a flat-rate subscription, larger contexts take longer to process. A lean prompt might get a response in 2 seconds; a bloated prompt might take 15 seconds.
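The financial side is simple arithmetic. The per-token price below is a made-up placeholder (check your provider's current pricing), but the ratio is what matters: the 15,000-token bloated prompt from earlier costs 50 times more than the 300-token surgical one:

```typescript
// Back-of-envelope cost comparison. The rate is a hypothetical placeholder;
// real per-token prices vary by provider and model.

const PRICE_PER_1K_TOKENS_USD = 0.01; // hypothetical rate

function promptCostUSD(tokens: number): number {
  return (tokens / 1000) * PRICE_PER_1K_TOKENS_USD;
}
```

Whatever the actual rate, the multiplier between a bloated prompt and a lean one stays the same, and it compounds over every turn of a long session.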

In Vibe Coding, momentum is everything. Slow responses break your “flow state.” Keeping your context window clean isn’t just about accuracy—it’s about staying in the zone.

Conclusion: Respect the Desk

The Context Window is not just a technical limitation; it is the boundary of the AI’s “thought.” When the AI fails, it is rarely because it doesn’t know how to code—it’s because it doesn’t know what it is coding.

By respecting the “Golden Rule”—providing the minimum context needed for an unambiguous decision—you turn the AI from a forgetful genius into a precision instrument.

Next time you’re about to paste a massive block of code or a giant error log, ask yourself: “Does the AI really need to see line 1 through 500 to fix line 25?” If the answer is no, trim it down. Your code, your wallet, and your sanity will thank you.

Happy Vibe Coding!