
Coffee Codex: Spec → Plan → Implement — From Idea to Execution

Helmi Khaled

05 Apr 2026 • 7 min read

Building Coffee Codex through a structured AI-driven delivery loop

In the last post, I said something that sounds simple, almost trivial:

Code is cheap. Thinking is the bottleneck.
And the loop is straightforward — spec → plan → implement.

This post is what happens when you stop agreeing with that idea intellectually and actually run it, end to end, across real features, real repositories and real constraints.

Because this is where most of the illusion breaks.

Where AI Actually Fails

There’s a common narrative right now that AI isn’t “there yet” for serious engineering work. That it still needs better models, better tools, better context.

That’s not entirely wrong.

But it’s also not the real problem.

The real problem is this: most teams were already operating with vague specs, implicit assumptions and fragmented ownership long before AI showed up. The difference now is that AI doesn’t tolerate that ambiguity the way humans do. It doesn’t pause, reinterpret, or compensate socially. It executes.

And when you give it something unclear, it doesn’t slow down — it accelerates in the wrong direction.

So when people say AI fails at implementation, what I see instead is failure at intent definition. The model is not confused. The system around it is.

The AI Delivery Loop (As It Actually Runs)

Let’s make this concrete.

In Coffee Codex, every feature starts as a spec. Not a vague description, not a Jira ticket, but a structured prd.md inside a numbered spec folder.

Something like this:

Spec	Feature
001	App Shell
002	Recipe Listing
003	Recipe Detail
004+	(Upcoming: filters, search, auth, etc.)

The first three are fully executed using the loop.

Everything else is waiting behind the same system.

That structure is intentional. It forces the work to move in layers, not chaos.
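To make the layering checkable, here is a minimal Python sketch that reports how far each numbered spec folder has progressed. It assumes the folders live under docs/specs/ and that a spec's stage can be inferred from which of prd.md, plan.md and tasks.md exist. The names spec_stage and spec_overview are hypothetical and not part of Coffee Codex.

```python
from pathlib import Path

# Stage files in loop order: prd.md is the spec, plan.md and tasks.md
# are produced by the later steps of the loop.
STAGE_FILES = ("prd.md", "plan.md", "tasks.md")


def spec_stage(spec_dir: Path) -> str:
    """Return the furthest stage file present in order, or 'empty'."""
    stage = "empty"
    for name in STAGE_FILES:
        if not (spec_dir / name).is_file():
            break  # a missing earlier file means later ones do not count
        stage = name
    return stage


def spec_overview(specs_root: Path) -> dict[str, str]:
    """Map each numbered spec folder (e.g. 003-recipe-detail) to its stage."""
    return {
        d.name: spec_stage(d)
        for d in sorted(specs_root.iterdir())
        if d.is_dir() and d.name[:3].isdigit()  # assumed NNN- prefix
    }
```

Calling spec_overview(Path("docs/specs")) would return one entry per numbered folder, which is enough to see at a glance which specs are still waiting behind the loop.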

The Guardrails (Before Any AI Runs)

Before any prompt is written, three documents define the system:

vision.md → what we are trying to achieve
architecture.md → how the system is structured
design.md → how it should behave and feel

These are not documentation for humans.

They are constraints for AI.

They define:

boundaries
expectations
non-negotiables

Without them, AI is guessing.

With them, AI is operating inside a system.
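One way to enforce this is to refuse to start a run when a guardrail is missing. The sketch below is a hedged illustration rather than the project's actual tooling: it assumes the three documents sit under docs/ (the paths used in the prompts later in this post) and treats an empty file as missing. The function name missing_guardrails is hypothetical.

```python
from pathlib import Path

# Guardrail documents, using the paths that appear in the prompts.
GUARDRAILS = ("docs/vision.md", "docs/architecture.md", "docs/design.md")


def missing_guardrails(repo_root: Path) -> list[str]:
    """Return the guardrail docs that are absent or blank under repo_root."""
    missing = []
    for rel in GUARDRAILS:
        path = repo_root / rel
        # An empty file constrains nothing, so treat it like a missing one.
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(rel)
    return missing
```

A wrapper script could call missing_guardrails(Path(".")) and stop before any prompt is sent if the list is non-empty.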

Step 1 - Spec → Plan

This is the first transformation: intent into structure. At this stage, the goal is not progress. It is alignment.

The prompt is simple, but very deliberate:

Read docs/vision.md, docs/architecture.md and docs/specs/003-recipe-detail/prd.md.

Before making any changes, read the following documents:

- docs/vision.md
- docs/architecture.md
- docs/design.md

Generate a detailed implementation plan for this feature.

Write the plan to:

docs/specs/003-recipe-detail/plan.md

Do not implement code yet.

The prompt is intentionally restrictive. It forces the system to read vision, architecture and design before doing anything else, and then produce a plan without touching implementation. That constraint alone changes the behavior significantly.

Because what you are doing here is not asking AI to “figure it out.” You are forcing it to operate within a pre-defined mental model of the system.

And more importantly, you are giving yourself a moment to review the thinking before it hardens into code. This is your architectural review point.

Most teams skip this because it feels slower. In reality, this is the point where you either maintain control of the system or quietly lose it.
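Because the same prompt is reused for every spec, it can be kept as a template so the wording never drifts between features. A minimal sketch, assuming the only thing that varies is the spec folder; the prompt text is the one shown above, while PLAN_PROMPT and plan_prompt are hypothetical names.

```python
# Template copied from the planning prompt; only the spec folder varies.
PLAN_PROMPT = """\
Read docs/vision.md, docs/architecture.md and {spec_dir}/prd.md.

Before making any changes, read the following documents:

- docs/vision.md
- docs/architecture.md
- docs/design.md

Generate a detailed implementation plan for this feature.

Write the plan to:

{spec_dir}/plan.md

Do not implement code yet.
"""


def plan_prompt(spec_dir: str) -> str:
    """Fill the template for one spec folder, tolerating a trailing slash."""
    return PLAN_PROMPT.format(spec_dir=spec_dir.rstrip("/"))


if __name__ == "__main__":
    print(plan_prompt("docs/specs/003-recipe-detail"))
```

Keeping the restriction ("Do not implement code yet.") inside the template means it cannot be dropped by accident when the prompt is retyped.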

Step 2 - Plan → Tasks

This is where the quality of thinking becomes visible. Once the plan is accepted, we break it down:

Read docs/specs/003-recipe-detail/prd.md and docs/specs/003-recipe-detail/plan.md.

Before making any changes, read the following documents:

- docs/vision.md
- docs/architecture.md
- docs/design.md

Break the implementation plan into atomic engineering tasks.

Write them to:

docs/specs/003-recipe-detail/tasks.md

Tasks must be sequential and executable.

Breaking a plan into “atomic, sequential, executable” tasks sounds procedural, almost mechanical. It is anything but. If a task still requires interpretation, you have already introduced ambiguity, and ambiguity is exactly where AI starts to improvise.

What you want instead is something closer to determinism — a sequence where each step is so clear that execution becomes almost inevitable.

At this point, the role of the engineer shifts. You are no longer primarily writing code. You are designing clarity.
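The post does not show what tasks.md looks like, so the following sketch assumes one plausible format: a Markdown checklist with one "- [ ]" or "- [x]" line per task. Under that assumption, "sequential" can be checked mechanically: the next task is the first open one, and a completed task after an open one means something was skipped. The names parse_tasks and next_task are hypothetical.

```python
import re
from typing import Optional

# Assumed format: "- [ ] title" for open tasks, "- [x] title" for done ones.
TASK_RE = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def parse_tasks(text: str) -> list[tuple[bool, str]]:
    """Return (done, title) pairs in document order; other lines are ignored."""
    tasks = []
    for line in text.splitlines():
        match = TASK_RE.match(line)
        if match:
            tasks.append((match.group(1).lower() == "x", match.group(2).strip()))
    return tasks


def next_task(text: str) -> Optional[str]:
    """Return the first open task, or None when every task is done.

    Raises ValueError if a done task follows an open one, since that
    means the list was not executed in order.
    """
    first_open = None
    for done, title in parse_tasks(text):
        if not done and first_open is None:
            first_open = title
        elif done and first_open is not None:
            raise ValueError(f"task completed out of order: {title!r}")
    return first_open
```

A check like this does not judge whether a task is atomic. That part still needs a person reading the list.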

Step 3 - Implement

By the time you reach implementation, the interesting work should already be done.

The execution prompt is strict for a reason: follow architecture, follow design, follow vision, do not skip steps, do not introduce new abstractions casually, stop when something is unclear.

Execute tasks in docs/specs/003-recipe-detail/tasks.md one by one.

Before making any changes, read the following documents:

- docs/vision.md
- docs/architecture.md
- docs/design.md

Execution rules:

1. Execute tasks sequentially.
2. After completing each task, update tasks.md.
3. Do not skip tasks.
4. Do not change architecture.
5. Follow design and vision strictly.
6. Modify only necessary files.
7. Stop if unclear.

Constraints:

- Maintain clean architecture boundaries
- Follow domain models and API contracts
- Avoid unnecessary abstractions

This is not about limiting AI. It is about preventing drift before it compounds.

Because left unchecked, AI will optimize locally. It will produce something that works, looks reasonable and passes superficial checks — while slowly diverging from the system you are trying to build.

Constraints are not overhead. They are the only thing keeping the system coherent.
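Rule 6, "Modify only necessary files", is one of the few rules that can be checked after the fact. A minimal sketch, assuming each spec declares the directories it is allowed to touch (a hypothetical convention, not something the post describes) and that the list of changed files comes from a command such as git diff --name-only. The function name out_of_scope is hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return changed files that are not inside any allowed directory.

    Paths are compared component by component, so "src/recipes_old/x.py"
    is not treated as being inside "src/recipes".
    """
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    flagged = []
    for name in changed:
        path = PurePosixPath(name)
        if not any(path == a or a in path.parents for a in allowed):
            flagged.append(name)
    return flagged
```

Anything this returns is not necessarily wrong, but it is a prompt to look closer during the implementation review.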

Two Reviews That You Cannot Skip

There are exactly two moments where human judgment is required, and both of them matter more now than before.

The first is the plan review. This is where you validate whether the problem is understood correctly and whether the approach respects the system’s boundaries.

The second is the implementation review. This is where you check whether execution stayed aligned with that intent.

Everything else can be accelerated. These two cannot be delegated.

If you skip them, you are not moving faster. You are just compressing the time it takes to make a mistake.
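The two checkpoints can be modeled as gates in a small state machine: every stage advances automatically except the two whose output needs human sign-off. This is an illustrative sketch under that reading of the loop, not a description of how Coffee Codex is actually wired; Stage, REVIEW_GATES and advance are hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    SPEC = "spec"
    PLAN = "plan"
    TASKS = "tasks"
    IMPLEMENT = "implement"
    DONE = "done"


ORDER = list(Stage)

# Stages whose output must be approved by a person before moving on:
# the plan review and the implementation review.
REVIEW_GATES = {Stage.PLAN, Stage.IMPLEMENT}


def advance(current: Stage, approved: bool = False) -> Stage:
    """Move to the next stage, refusing to pass a gate without approval."""
    if current is Stage.DONE:
        return current
    if current in REVIEW_GATES and not approved:
        raise PermissionError(f"{current.value} output needs human review first")
    return ORDER[ORDER.index(current) + 1]
```

The design choice is that approval is an explicit argument, so skipping a review has to be written down rather than happening by default.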

What This Actually Produced

I ran this flow across the first three specs, spanning two repositories. Not as a demo, but as actual implementation work.

What came out of it was not perfection. There were still gaps, still adjustments, still moments where the system had to be corrected.

But the difference was in the shape of the work:

Progress became structured, decisions became visible and rework — when it happened — stayed local.

There was very little of the usual “we need to rethink this entirely” that tends to appear late in traditional workflows.

And that, more than anything, is the signal that the loop is working.

Figure: AI-Driven Development: Spec → Plan → Implement → Deploy, with human checkpoints

The Shift That Changes Team Dynamics

Here is the part that people are starting to notice, but not yet saying clearly.

The level of output produced by this approach used to require a team — not just any team, but one with strong senior engineers, tight alignment and a lot of implicit coordination.

That constraint is no longer the same.

Today, a small number of highly capable engineers, working with clear intent and strong architectural discipline, can produce what previously required an entire team.

Not by working harder, and not because AI replaces engineers, but because it removes the need to compensate for weak structure, unclear thinking and coordination overhead.

Which leads to a more uncomfortable question.

The Owner Mindset

If the cost of execution drops, then the bottleneck shifts even more aggressively to thinking, judgment and ownership.

At that point, questions about team size or velocity start to lose meaning. Because when execution is cheap, what you are really evaluating is judgment.

The more relevant question becomes personal:

If you owned the company, would you hire yourself for the work you are doing today?

Not in comparison to your colleagues. Not relative to the market.

Just you, as you are. If your own money, your own time and your own risk depended on the outcome.

Would you trust your own specifications?
Would you rely on your own plans to guide a system?
Would you be comfortable with the decisions you make under ambiguity?
Would you pay for your level of clarity and accountability?

If the answer is yes, then you are already operating at the level this shift demands.

If the answer is no, that is not something to defend against. It is something to pay attention to.

Because that gap, between how you work today and what you would demand as an owner, is not a weakness. It is the clearest signal you have of where to grow.

What This Means Going Forward

This is not about reducing teams for the sake of it. It is not about declaring that fewer people are always better.

It is about recognizing that the leverage has changed.

Execution is no longer the scarce resource it used to be. Clear thinking is. Structured planning is. Ownership is.

In this environment, average performance becomes harder to hide, because the system amplifies both strengths and weaknesses. Strong engineers become disproportionately more effective. Weak structure becomes immediately visible.

And the organizations that adapt are not the ones that simply “adopt AI,” but the ones that raise their standards for how work is defined, reviewed and owned.

Closing

The loop has not changed:

spec → plan → implement

But the consequences of doing it poorly — or well — have become much sharper.

AI does not fix unclear thinking. It exposes it.
It does not replace ownership. It demands it.

And perhaps most importantly, it removes a layer of excuse that many of us have relied on for a long time.

Because now, the question is no longer whether we have enough time, enough people, or enough tools.

The question is much simpler and much harder to avoid:

Are we thinking clearly enough to deserve the speed we now have — or are we just accelerating our mistakes?