Two gates decide most of your build quality

I merged the code myself. Build passed, tests green, the PR looked clean. I marked the ticket done and moved on to the next one.

A few days later I ran the production verification — a basic end-to-end test against live data — and discovered the pipeline had never produced a single real result.

Not “had a bug.” Not “was flaky under edge cases.” Had never worked.

The code was AI-generated. But the failure wasn’t the AI’s. I’d reviewed the PR. I could have caught it. I didn’t, because the output looked right and I was already thinking about the next ticket. That’s the human part of this story: it is very easy to stop reading carefully when the output is fluent and the pace is fast. We trust plausible signals — a green build, a clean diff, a ticket that moved — because slowing down feels like friction.

It isn’t friction. It’s the job.

The other gate I’d walked past

That story has a twin.

On a different project, I’d been managing a build board with dozens of tickets. Some had been marked Done in earlier passes. When I went back to strengthen the specs, I found tickets that said things like “create the base layout.” Three words. No file paths, no nav behavior, no font-handling expectations, no acceptance criteria. The builder had filled in every gap — sometimes well, sometimes expensively, always invisibly.

When those tickets got rewritten into real specs — with explicit file targets, constraints, scope boundaries, and binary completion criteria — the output changed. Not because the builder got smarter. Because the contract got honest.

Here’s what I’ve come to believe: quality isn’t an implementation problem. It’s a specification problem and a verification problem. Implementation is the middle — and it’s the part everyone obsesses over. Most quality failures I’ve seen trace back to one of two gates that someone walked past without stopping.

[Diagram: the spec gate and the verification gate flanking execution]

Gate one: the spec. Where ambiguity either surfaces or hides. A strong spec forces you to decide what you’re actually building before you start. A weak spec defers those decisions to the builder, who makes them silently, at speed, and often wrong.

Gate two: the verification. Where the output either touches reality or doesn’t. A smoke test — the simplest serious check that the core path produces a real result — is the cheapest professional safeguard available. Skip it, and “done” becomes a feeling.

Everything between those two gates is execution. Execution matters. But execution against a vague spec produces variance, and execution without verification produces false confidence.

The human problem AI makes worse

This isn’t really an AI article. It’s a human-nature article.

People have always been good at trusting plausible signals. A confident status update. A clean-looking deliverable. A teammate who says “yeah, it’s done.” We’ve always been willing to skip the boring verification step when the visible evidence feels good enough. AI didn’t create that instinct. It just feeds it faster.

When a builder can turn a thin ticket into plausible code in minutes, ambiguity gets amplified, not absorbed. If the spec didn’t lock the outcome, hidden decisions compound at speed. And when the output looks polished — good variable names, clean structure, sensible comments — it gets even harder to slow down and ask whether the thing actually works.

That’s what happened to me. The code looked right. I’m a competent reviewer, but engineering isn’t the strongest part of my background. I still missed it, because reading carefully feels like wasted time when the surface is convincing. Multiply that across a team and you get an organization that moves fast, feels productive, and is quietly building on assumptions nobody verified.

What strong gates look like

A strong spec answers the questions the builder would otherwise guess at. What problem are we solving? What shape should the answer take? What’s explicitly out of scope? How will anyone know when it’s done?

Practical test: two competent builders given the same spec should produce work in roughly the same shape. If the output would vary wildly depending on who interpreted it, the spec isn’t a contract — it’s a suggestion.
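One way to make that contract test concrete, if you like seeing structure as code. A minimal sketch with hypothetical field names, not a prescription for any particular tracker:

```python
from dataclasses import dataclass

@dataclass
class Spec:
    """A ticket is a contract only when every field is filled in."""
    problem: str              # what we're solving, in a sentence or two
    file_targets: list[str]   # exactly where the change lands
    out_of_scope: list[str]   # what the builder must not touch
    acceptance: list[str]     # binary criteria: each one passes or fails

    def is_contract(self) -> bool:
        # "Create the base layout" fails immediately: no targets,
        # no boundaries, no way to know when it's done.
        return bool(self.problem and self.file_targets and self.acceptance)
```

The tooling is beside the point. What matters is that every field is a decision you made explicitly instead of one the builder made silently.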

A strong smoke test answers four questions. Does the core path run against something real enough to fail the way a user would? Does the expected result actually appear? Would the check fail if the feature were broken in the way that matters? And is it cheap enough to run whenever the critical path changes?
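Concretely, the whole thing can be a dozen lines. A minimal sketch, assuming a hypothetical core path exposed as run_pipeline() and one known-real input; every name here is illustrative, not from my actual project:

```python
# smoke_test.py: the simplest serious check that the core path
# produces a real result. Run it whenever the critical path changes.

from my_project.pipeline import run_pipeline  # hypothetical import

# One known-real input, not a synthetic fixture. Real enough to
# fail the way a user would.
SAMPLE_RECORD = {"id": "record-from-production"}

def test_core_path_produces_a_real_result():
    results = run_pipeline(SAMPLE_RECORD)

    # The assertion that would have failed loudly in my story:
    # a pipeline that "passed" but never produced a single result.
    assert results, "core path produced zero results"

    # Check the shape a user would notice, not an internal detail.
    assert all("output" in r for r in results)

if __name__ == "__main__":
    test_core_path_produces_a_real_result()
    print("smoke test passed: core path produced a real result")
```

Cheap enough to run on every change to the critical path, and it answers all four questions at once.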

Neither gate is bureaucracy. A good spec cuts clarification churn and makes review faster because you’re comparing output to a visible contract. A good smoke test catches breakage before anyone builds on top of bad assumptions. Skip the first gate and you pay in rework. Skip the second and you pay in false confidence — which is worse, because you don’t even know you’re paying.

Check your own board

Pull up your last five completed tickets. Read the spec on each one. Then check whether anyone verified the core result after deployment.

If more than two specs are one-liners and more than two verifications never happened, your build system has two open gates and the work is getting optimized for the wrong things.
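If your board can export tickets, the check takes five minutes to script. A rough sketch, assuming a hypothetical JSON export with spec and verified fields; adapt it to whatever your tracker actually emits:

```python
import json

# Hypothetical export: [{"title": ..., "spec": ..., "verified": bool}, ...]
with open("last_five_tickets.json") as f:
    tickets = json.load(f)

one_liners = [t for t in tickets if len(t["spec"].split()) < 15]
unverified = [t for t in tickets if not t.get("verified", False)]

print(f"one-liner specs: {len(one_liners)} of {len(tickets)}")
print(f"never verified:  {len(unverified)} of {len(tickets)}")

if len(one_liners) > 2 and len(unverified) > 2:
    print("both gates are open")
```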

I know because I’ve been there. The fix isn’t better code or smarter AI. It’s admitting that the human in the loop needs guardrails too.
