A Revival for Test First, If We Don't Waste It

Using a coding assistant might turn out to be the best thing that has happened to Test-Driven Development (TDD) so far. This is the seventh post in my series about coding with an assistant.

For more than twenty years the objection to TDD was always some version of the same thing. No time. Too much typing. We will add the tests later. The assistant removes that objection. It types the test for you in seconds and it does not get bored doing it. On top of that, the path of least resistance with an assistant now runs straight through the test. I made that case in Fitness Functions for an AI Coding Assistant. To get useful work out of the assistant you have to tell it what done looks like, and a description of done that a machine can check is a test. TDD stops being the disciplined choice and becomes the efficient one.

A small case from the other day. We have our own frontend abstraction in TypeScript for calling the backend. It adds the authentication and context every call needs. A colleague had written a plain fetch to download a file. It added those details correctly, but it went around our convention of taking the call mechanism from a context, and that convention is what lets us drop in a mock and test the component. The plain fetch could not be tested that way. Before, I would have mumbled, filed a tech debt issue, and then taken the same shortcut myself to keep moving. This time I asked the assistant to file the issue, add the missing API, and use it in the component I was working on. The component went back to being testable, and doing it right stopped being the expensive option.
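As a minimal sketch of that convention, assuming a much simpler shape than the real abstraction: every name below (BackendClient, AppContext, downloadFile, FakeBackendClient) is hypothetical and only illustrates why taking the call mechanism from a context keeps the component testable.

```typescript
// All names here are hypothetical; this is not the real frontend API.

/** The call mechanism. A production implementation would add auth and context. */
interface BackendClient {
  getBytes(path: string): Promise<Uint8Array>;
}

/** What a component receives: the context carries the client. */
interface AppContext {
  backend: BackendClient;
}

/** Downloads a file through whatever client the context provides. */
function downloadFile(ctx: AppContext, fileId: string): Promise<Uint8Array> {
  return ctx.backend.getBytes(`/files/${encodeURIComponent(fileId)}`);
}

/** Test double: records the requested paths and returns canned bytes. */
class FakeBackendClient implements BackendClient {
  readonly requestedPaths: string[] = [];
  private readonly canned: Uint8Array;

  constructor(canned: Uint8Array) {
    this.canned = canned;
  }

  getBytes(path: string): Promise<Uint8Array> {
    this.requestedPaths.push(path);
    return Promise.resolve(this.canned);
  }
}

// In a test, the fake goes where the real client would go.
const fake = new FakeBackendClient(new Uint8Array([1, 2, 3]));
const pending = downloadFile({ backend: fake }, "report 7");
pending.then((bytes) => console.log(bytes.length)); // logs 3
```

A plain fetch inside downloadFile would leave the test nothing to swap out, which is the problem described above.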

I want to believe a revival of test first is coming. I am not sure it is. This post is me being pessimistic about developers, myself included, and hoping I turn out to be wrong.

A show of hands

Earlier this year I was on stage at GeeCON, in front of a big room, two or three hundred developers. At some point I asked how many of them practised TDD. Fewer than twenty hands went up.

My friend Code Cop was in the audience, and afterwards we agreed we had both been surprised. It was not always like this. GeeCON once ran a whole edition devoted to the practice, GeeCON TDD, in Poznań in 2015, where I spoke myself. Back then a room like that would have answered the question with a forest of hands. Since then the skill has not just failed to spread, it has receded.

That is the part that worries me. The revival I described needs developers who still know what a good test looks like. The show of hands says a lot of them stopped, or never started.

To be fair, plenty of those teams do have unit tests. Having tests is not the same as doing TDD. The tests are there because a quality assurance program can measure coverage with almost no effort, and a number that is easy to measure turns into a target. So the tests get added afterwards, as a chore, against code that already exists. It is a waste of time and energy, writing a test to confirm an implementation that is already sitting there for you to read.

Cheap tests cut both ways

The assistant will write a test before the code if you ask it to. It will just as happily write the test afterwards, looking at the implementation it produced and generating a test that confirms the implementation does what the implementation does. The build goes green. The coverage number climbs. None of it means anything, because the test was shaped to fit the code instead of the code being shaped to fit the test. It is a mirror, not a check.
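A small sketch of what the mirror looks like, under an invented rule: orders of 100.00 or more get 10% off. The function, the rule, and the tiny check helper are all assumptions made up for this illustration.

```typescript
// Hypothetical rule: orders of 100.00 or more get 10% off.
// Amounts are in cents so the arithmetic stays in whole numbers.
function discountedTotalCents(totalCents: number): number {
  // Bug: uses > instead of >=, so an order of exactly 100.00 gets no discount.
  return totalCents > 10000 ? totalCents - Math.floor(totalCents / 10) : totalCents;
}

type TestResult = "pass" | "fail";

function check(actual: number, expected: number): TestResult {
  return actual === expected ? "pass" : "fail";
}

// Mirror test: the expected value was read off what the code returns.
const mirrorTest = check(discountedTotalCents(10000), 10000);

// Specification test: the expected value comes from the rule itself.
const specTest = check(discountedTotalCents(10000), 9000);

console.log(mirrorTest); // "pass", and the bug ships
console.log(specTest); // "fail", and the bug is caught
```

Both tests execute the same line and give the same coverage. Only the second one can disagree with the implementation.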

This is the trap I expect most teams to fall into, because it is the lazy path and it produces all the right signals. Green dashboard, high coverage, fast delivery. You did the easy thing and it looks exactly like the disciplined thing. You find out the difference when the software meets a customer and does the wrong thing with great confidence.

None of this is new. Years ago I worked on a project where the coverage number was something the project manager reported to the customer. One tester wrote tests purely to push that number up. They exercised the code but had no assertions that could fail on a bad result, so they proved nothing. The coverage climbed, the project manager was happy, and the customer saw a healthy figure. When I deleted those tests, with a note that they did not help us or prove anything, the tester was upset, close to angry. The number meant something to him that the tests themselves did not.

That took a person sitting down and deliberately writing hollow tests. The assistant will produce the same thing in seconds, without meaning to and without anyone deciding to cheat. Gaming coverage used to take effort. Now it is the default outcome of the lazy path.

We outsourced the typing, not the thinking

In the fitness functions post I wrote that I handed the typing to the assistant and kept the thinking for myself. That split is the whole point. The thinking, deciding what the software should do, is the part that was always hard and always worth something. The typing was never the bottleneck, we just told ourselves it was. The change shows up in how many tests I write. Working with the assistant I add more of them, and in far less time. The thinking behind each one takes as long as it ever did. That part did not get faster.

My fear is that a lot of people will hand over the thinking as well, without noticing they did it. The assistant proposes what done looks like, you nod, it writes code and a test that agree with each other, and you ship. At no point did you decide anything. The tests are green and you are not in control of what they assert.

Test first is what saves you

There is one cheap defence against the mirror, and it is the oldest rule in the book. Write the test first. A test written before the code exists cannot be a copy of the code, because there is no code yet to copy. It has to come from your idea of what the software should do. That forces the thinking back into your hands at the one moment it counts.

There is a deeper version of the same idea. You solve the problem first, then write code that produces the answer faster and every time. The code does not find the solution, it automates one you already have. If you do not have a solution to the problem, there is nothing to automate and nothing for the assistant to write. The test is where you state that solution, in terms of the result you expect, and you cannot write it until you have worked the problem out.

None of this is new advice. Kent Beck put test-first in his first Extreme Programming book in 1999 and gave it the name in Test-Driven Development: By Example a few years later. People were writing programs this way back in the fifties, though; they just did not call it TDD. Working the problem out before you write the code has always been the sensible thing to do. What was not always there is the safety net. A set of unit tests you can run in seconds, over and over, was not always possible, and that is one of the details Kent helped us remember and understand.

It is the same red-green-refactor I have written about for years. What is new is that it is now the only thing standing between a useful test and a hollow one, and it costs almost nothing to do. The assistant writes the test from your description, you read it, you watch it fail, then you let the assistant make it pass. The discipline that is left is small. Decide what done looks like, own the test that says so, write it before the code. That is the whole job, and I am not confident we will do it.
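The loop can be sketched without any test framework. The example below is hypothetical (formatting a byte count for a download link), and the expected strings are assumptions chosen for illustration. What matters is the order: the expectations exist before the implementation does.

```typescript
// Hypothetical example: format a byte count for display.
type FormatSize = (bytes: number) => string;

// Step 1: the test, written from the expected results alone.
// It takes the implementation as a parameter, so it can exist first.
function formatSizeSpec(formatSize: FormatSize): string[] {
  const cases: Array<[number, string]> = [
    [0, "0 B"],
    [1023, "1023 B"],
    [1024, "1.0 KiB"],
    [1536, "1.5 KiB"],
  ];
  const failures: string[] = [];
  for (const [input, expected] of cases) {
    let actual: string;
    try {
      actual = formatSize(input);
    } catch (e) {
      actual = `threw: ${String(e)}`;
    }
    if (actual !== expected) {
      failures.push(`${input}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}

// Step 2 (red): there is no implementation yet, so every case fails.
const notImplemented: FormatSize = () => {
  throw new Error("not implemented");
};
const red = formatSizeSpec(notImplemented);

// Step 3 (green): the implementation is written to satisfy the test.
const formatSize: FormatSize = (bytes) =>
  bytes < 1024 ? `${bytes} B` : `${(bytes / 1024).toFixed(1)} KiB`;
const green = formatSizeSpec(formatSize);

console.log(red.length); // 4 failures before the code exists
console.log(green.length); // 0 failures once it does
```

Watching the red step matters: a test that has never failed has not yet shown that it can.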

BDD is the same thing, with one difference that matters

For me there is no real distinction between TDD and BDD, with one exception. A scenario written in Gherkin can be read by anyone who knows the domain, without reading a line of source code. That makes it a tool for communication, not only for testing, and that has not stopped being important. If anything it matters more now. The lazy path with an assistant skips the conversation about what to build. A scenario in plain domain language forces that conversation to happen, and it is the cheapest way to keep a human, ideally more than one, in the loop on what the software is actually for.

Aslak Hellesøy made this point back in 2014 in The world's most misunderstood collaboration tool. Gherkin was never mainly about automation. It was about getting people to agree on what to build before building it. The assistant does not change that. It raises the stakes, because it will fill any silence in the spec with a guess.

I hope I am wrong

Code Cop is not optimistic about any of this. He wrote a post called Accepting GenAI where he is honest about being frightened. Coding is his identity, he has been doing it for forty years, and he can feel it slipping. He said something to me once that stuck: it takes the fun out of programming. I understand him. There is real joy in writing the code yourself, and handing that to a machine costs something.

My worry is narrower than his, and maybe more boring. I am not afraid the tool is too good. I am afraid we are too lazy to use it well. We were handed a safety net that is finally cheap to make, at the moment when most of us had already taken ours down and walked off. The room at GeeCON is what that looks like.

I would like to be wrong about this. In a few years I would like to stand in that kind of room, ask the same question, and watch most of the hands go up. I am not counting on it.

Resources

Fitness Functions for an AI Coding Assistant - the previous post, on turning rules into constraints the assistant cannot argue past
Testing in 2026: more relevant than ever - why the tests still matter when the assistant types
TDD with an AI assistant - the first post in this series
The world's most misunderstood collaboration tool - Aslak Hellesøy, 2014, on Gherkin as communication
Accepting GenAI - Code Cop on being frightened by what AI does to the craft
Test-Driven Development: By Example - Kent Beck, the book that named the practice
Thomas Sundberg - the author