Dispelling Myths About Test-Driven Development

A while back I read a LinkedIn post in which a Test-Driven Development (TDD) evangelist suggested something I found truly surprising and disagreeable. Not long after that, a client of mine sent me a video of another TDD proponent and asked my opinion. Surprisingly, the proponent in the video seemed to have missed the point of TDD entirely.

These events got me thinking about all of the myths and misconceptions that persist regarding TDD. This is my attempt to dispel the most damaging of those myths.

If you’re a TDD skeptic, this post is for you. Belittling folx for their beliefs is never productive; experience and information are the antidote to misconceptions. I hope you’ll take into account my roughly 6 years and 6 projects using TDD full-time, the prior 13 years without TDD (for comparison), and the 20+ years teaching & coaching software developers not merely in the mechanics, but in the benefits and the joy that TDD brings back to the act of writing quality software.

As with all myths, there’s always an element of truth, or a reason why the myth persists. I’ll do my best to share that for each of the myths, as well.

You might also notice a bit of repetition across this collection. If you want, you can read only those that you find compelling (and I still get to share with you my interlaced perspective).

I’ll start with one of the more pernicious myths.

“It’s Twice as Much Code, Therefore Takes Twice as Long!”

Why it’s Almost True

In my experience, it’s actually about three times as much code! It will also slow you down until you get used to it, and until the safety net of tests is large enough to offer some protection against mistakes.


Why it’s False

There are two reasons why there is much more test code than implementation. First, test code tends to be rather “scripty”—that is, it’s written as a step-by-step scenario designed to be quickly read and understood by developers.

Second, the resulting object-oriented or functional implementation decomposes the solution (if not at first, then through diligent refactoring) and greatly reduces duplication.

Without having that safety net of thorough tests protecting existing code, the developers’ only other safe choice is to copy/paste/modify older functioning code in order to build a similar enhancement. That practice results in a great deal of duplication, and very brittle designs.

Does TDD take longer? Not really. Software development requires logical thinking and the practiced decomposition of business rules. TDD provides structure for that logical thinking, for both the immediate task and for future tasks that might affect today’s code.

Any unfamiliar practice will take more time and effort until the team starts to get used to it, and to experience its benefits. A team can take up to a month to become sufficiently comfortable with TDD, but they’ll still benefit from the tests and implementation they build during that month.

And in both the short term and the long term, each existing test is an investment in that behavior, allowing your team to make rapid changes in order to accommodate new features. People usually miss this longer-term benefit because they don’t stick to it for at least 4 to 6 months.

On every one of the XP teams I worked on, there came a time when we accomplished something amazing, and looked back and said “we are so glad we’ve been doing TDD all this time, otherwise that would have been nearly impossible.” Or extremely expensive.

“TDD is a Testing Practice”

Why it’s Partly True

If done well, TDD does result in an extremely useful, fast, and comprehensive safety net of regression tests. It does not, however, replace all other forms of testing. At the very least, Exploratory Testing—actually using the software and experimenting with it—can never be replaced by automated tests.

Why it’s Partly False

The reason it’s called Test-Driven Development is that it’s primarily a discipline for development; i.e., for writing code. Thinking in terms of expectations and outcomes is a very natural way to approach software development. I’ve often pointed out to developers, and sometimes even whole teams, that many of their conversations are expressions of concrete examples (i.e., scenarios…specifications…tests!). TDD merely formalizes that into the creation of repeatable tests that both (a) confirm that the initial implementation is correct, and (b) prevent that behavior from breaking in the future.

I’ll often summarize this point by saying “It isn’t a test until it passes. Before it passes, it’s a request you’re making of your code.” And it’s always an understandable specification for that behavior.
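To make that “request you’re making of your code” concrete, here’s a minimal Python sketch. The shipping-cost rule and all names are invented for illustration, not taken from any real project. The tests are written first, as specifications; they only become passing tests once the implementation beneath them exists:

```python
# Hypothetical example: a free-shipping rule, expressed first as a
# specification ("a request you're making of your code").

def test_orders_of_fifty_dollars_or_more_ship_free():
    assert shipping_cost(50) == 0

def test_smaller_orders_pay_flat_rate_shipping():
    assert shipping_cost(49.99) == 5

# The minimal implementation that satisfies the requests above:
def shipping_cost(order_total):
    """Free shipping at or above $50; otherwise a flat $5."""
    return 0 if order_total >= 50 else 5
```

Before `shipping_cost` existed, those two functions were requests; after it passes, each one doubles as a readable specification of the behavior.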

“It’s Okay to Delete the Tests”

The LinkedIn recommendation mentioned at the top of this article suggested that once you’re done building the software, you can delete the tests. This one really gets me fired up for a number of reasons.

Why it’s Truth-y

Sure, when you are done building the software you can delete the tests. You can also delete the code, and merely deploy the compiled binary! Then just sit back and watch the revenue roll in.

You’re done, after all, right? No one will ever need to go back and add, fix, or change anything. Right?

Why it’s False

Right?! Unless, of course, someone does request an enhancement, or a fix; or you discover that you haven’t met the needs of your customers.

I’ve never written—or even heard of—a piece of v1.0 software that was so successful it never needed to be changed.

The tests that cover existing behaviors are there to make certain those behaviors remain intact while you add related features.

Three pertinent reasons why this myth is…well, just nutty:

(1) A fact of fundamental software design: A good design is a changeable[/maintainable/extensible] design.

What I’ve learned during my time writing and changing code is that—on average—the code that is most important is also the code that changes the most. Churn occurs where innovations are implemented.

An example from my career: My 2003 team maintained mission-critical software that securitized CMBS loans. That was also effectively the entirety of the business, so if the software stopped, so did the revenue. And one day, leadership decided they wanted to expand into a different type of loan. What had to change? Nearly everything! The business rules for CMBS loan securitization had subtly permeated everything from the HTML to the database schema. (Ahem…this was the case prior to my arrival. Our new team had our work cut out for us!)

So each day we covered with characterization tests the area that needed to change, then used TDD to add the new behaviors.


(2) TDD isn’t a testing practice.

It just ain’t. Oh, sure, if you do it well, you end up with a comprehensive safety net of tests that check tens of thousands of discrete behaviors in less than 15 minutes. (Why 15 minutes? “Coffee break!”)

But when you’re doing TDD, it’s very much a way of thinking, and of building. You say (to yourself or—better yet—to the person sitting next to you at the keyboard) “We need to change the code so that, given these states and inputs, we get this result.” You write a test (really, a specification) that expresses that idea, then you write the implementation that you know will make it happen. It isn’t really a test until it passes.

After it passes, it’s also still a specification! It describes the behavior to the next developer who has to understand, maintain, or extend the software. That might be you in a year, or it might be a whole new team after you become the CEO of your own multi-million dollar software company.

Why the bleep would you delete one?!

(3) Deleting tests is a waste of time and brainpower.

When TDD is done well and includes proper use of test doubles, a full suite of tests will run over a coffee break. In modern IDEs, they’re running in the “background” whenever you make a change. What harm in letting a safety net of comprehensive tests linger?

And if you’re planning to be selective, deleting only those tests that cover the “less important” parts of your system, how will you decide? How will you know you made the right choice? (You will know if you made the wrong choice. See later in this article about Log4J.)

“We Don’t Need to Test That…”

“…it’s too simple,” or “…it’s too hard,” or my favorite, “…it’s not important code.”

Why it’s True-ish

Kent Beck once admonished us to “test everything that could possibly break.” So, you don’t have to test anything that your team cannot break. Ergo, you do not need to test libraries and frameworks that are not under your control. They might have defects, but you can’t fix those, because it’s not your code.

Also, testing simple getters and setters (aka accessors and mutators aka properties) is justifiably silly. They don’t do anything that isn’t checked by the compiler (if you are using a compiled and typed language).

Usually, though, where there’s an abundance of getters and setters, all the business logic related to that data is defined in static methods on a “Helper” class. Well, please be sure to test those static methods. And watch out for global data!

Aside: One of the fundamental ideas behind object-oriented design is that the data and the behavior that uses it live together in one place. It’s just easier to locate and maintain pertinent behaviors that way. It’s also much easier to test those behaviors. Interestingly, if you use TDD, you cannot write code that’s hard to test. You’re welcome.

Another case I’ve seen where it was difficult to justify TDD: You might be building a simple Create/Read/Update/Delete (CRUD) app using a framework that does all the UI and DB stuff for you. That is, it has a very thin or nonexistent layer of business logic requiring any testing. The one (and only) app I worked on using Ruby on Rails was like this. The requirements were so painfully simple: create, read, update, delete…repeat…YAWN! Thankfully we at least used Cucumber scenarios to make sure Rails was giving us the results we expected. So, yeah, I guess we were doing Behavior Driven Development (BDD) which—like TDD—is a “test-first” practice.


Why it’s False

TDD is about crafting, designing, and testing behaviors, not merely code.

I’m often asked, “Do we have to test everything?” and my answer is “You want to test everything!”

Okay, so that makes me sound like a dogmatic absolutist. So let me “unpack” my reply.

(1) There is no such thing as unimportant code.

Do you have to test “unimportant stuff, like logging?”

I recall coaching a team who assigned the “unimportant stuff (like logging)” to the “junior” developers. So I asked the team, “Is it okay if that code breaks in production?”

The behavior your team writes needs to be tested. Logging is a great example. Are you logging because you’ve not been writing tests and have a whole list of defects you’re trying to locate? Or are you logging because it provides crucial troubleshooting information when your software is live? If you want your software to log things, then that’s part of your software’s behavior, and you should indeed test that.
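A logging behavior can be tested like any other. Here’s a sketch in Python; the payment scenario, threshold, and names are invented for illustration. It captures log records with a handler and asserts that the expected message was emitted:

```python
import logging

def process_payment(amount, logger):
    # Invented behavior: flag unusually large payments in the log.
    if amount > 10_000:
        logger.warning("large payment: %s", amount)
    return amount

def test_large_payments_are_logged():
    records = []
    logger = logging.getLogger("payments-under-test")
    handler = logging.Handler()
    handler.emit = records.append  # capture records instead of writing them out
    logger.addHandler(handler)

    process_payment(20_000, logger)

    assert any("large payment" in r.getMessage() for r in records)
```

If the log message matters in production, a test like this keeps it from silently disappearing during a refactoring.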

Those many [high-tech Silicon Valley] teams who didn’t test their “unimportant” calls to the myriad static methods provided by Log4J recently got bitten in the arse…twice! The first nearly-fatal bite was because Log4J harbored a very nasty vulnerability (Log4Shell) that allowed nefarious hackers to inject their own executable Java code into your software. It had been lurking there for years.

The second bite was that, without a simple delegating wrapper around those static Log4J calls (which disciplined TDD and/or diligent refactoring would have encouraged), those teams had to replace the logger in every line of code where it was called. Had they built such a wrapper, they could have swapped out one simple class, run their test suite during a coffee break, and delivered a security patch in less than a day.
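The delegating-wrapper idea looks like this. Since the story concerns Log4J (a Java library), treat this Python version as a sketch of the pattern, with invented names, not of any real API:

```python
class AppLogger:
    """Thin delegating wrapper: the one and only place the logging
    library is named. Application code depends on this class, never
    on the library directly."""

    def __init__(self, backend):
        self._backend = backend  # e.g. a logger object from some library

    def info(self, message):
        self._backend.info(message)

    def error(self, message):
        self._backend.error(message)

# Swapping out a compromised logging library now means changing only
# how AppLogger is constructed; every call site stays untouched.
```

The wrapper also makes the logging behavior trivially testable: hand `AppLogger` a fake backend and assert on what it received.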

(2) Trying to decide up-front what needs testing and what doesn’t is a waste of time and brainpower.

Those who agonize over, discuss, or debate whether a behavior needs to be tested are wasting time. Once a developer becomes even mildly proficient at TDD, the amount of time it takes to write that test is less than the amount of time it takes to figure out whether that behavior is something that “could possibly break” or not.

Rather than wasting precious neural calories on predicting the future—and then gambling the success of the whole product on that prediction—you simply write the test. When a developer does TDD well, the whole Test-Driven cycle (red-green-clean) takes less than 5 minutes, and is usually much shorter than that.

“It’s Silly to Test a Method or Line of Code!”

Very true

So don’t. Developers occasionally ask me, “Should I test my getters and setters?” and I reply with “Only if they do something useful.”

But no one suggested otherwise

TDD is a unit-testing practice, but we don’t define a “unit” as a class, method, block, or line of code. What then?!

We test the smallest “units” of behavior. In the good old days when I taught Essential Test-Driven Development in person (and I hope to do so again soon!), I would pick up an empty box and throw a blue pen into it. For this article, I’ll use simple pseudocode…

box.add(bluePen)

And I would ask, “How can I test that? How do I know the blue pen is in the box?”

assert(box.contains(bluePen))

So, the whole test is:

box.add(bluePen)

assert(box.contains(bluePen))

Am I testing add() or contains()? Those are methods, not units of behavior. I’m testing that what is added is stored in the box. That’s a behavior.

So, a behavior can span methods (and sometimes even classes, but that’s part of a later myth-related discussion), and a method can contain multiple behaviors.
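The box-and-pen pseudocode above translates directly into a runnable test. A minimal Python sketch (the `Box` class is invented here just to make the example self-contained):

```python
class Box:
    """A minimal container, invented for the example."""

    def __init__(self):
        self._contents = []

    def add(self, item):
        self._contents.append(item)

    def contains(self, item):
        return item in self._contents

def test_what_is_added_is_stored_in_the_box():
    box = Box()
    box.add("blue pen")
    assert box.contains("blue pen")
```

Note that the test exercises `add()` and `contains()` together: the unit under test is the behavior “what is added is stored,” not either method in isolation.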

An example of the latter can be seen in the following error-checking implementation:

if (requestedVolume > MAX_VOLUME or requestedVolume < OFF)

throw “Sorry, you cannot set volume to $1”.substituteWith(requestedVolume)

volume = requestedVolume

There are three behaviors in that block of pseudocode, each deserving its own test. I usually suggest four tests, for clarity. Can you name them?
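Here is one possible reading, sketched in Python with invented names: the three behaviors are “reject a volume above the maximum,” “reject a volume below off,” and “store a valid volume”; a fourth test adds clarity by confirming that a rejected request leaves the current volume unchanged:

```python
MAX_VOLUME = 10
OFF = 0

class Speaker:
    """Invented class holding the volume-setting behaviors."""

    def __init__(self):
        self.volume = OFF

    def set_volume(self, requested):
        if requested > MAX_VOLUME or requested < OFF:
            raise ValueError(f"Sorry, you cannot set volume to {requested}")
        self.volume = requested

def _raises_value_error(action):
    # Small helper so the tests need no test framework.
    try:
        action()
    except ValueError:
        return True
    return False

def test_volume_above_maximum_is_rejected():
    speaker = Speaker()
    assert _raises_value_error(lambda: speaker.set_volume(MAX_VOLUME + 1))

def test_volume_below_off_is_rejected():
    speaker = Speaker()
    assert _raises_value_error(lambda: speaker.set_volume(OFF - 1))

def test_valid_volume_is_stored():
    speaker = Speaker()
    speaker.set_volume(7)
    assert speaker.volume == 7

def test_rejected_request_leaves_volume_unchanged():
    speaker = Speaker()
    speaker.set_volume(3)
    _raises_value_error(lambda: speaker.set_volume(99))
    assert speaker.volume == 3
```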

“You write all the tests first!”

The kernel of truth

Indeed, TDD has you write the test first.

But…

You write one at a time. With both TDD and Behavior Driven Development, you write one test/spec/scenario; then write the minimal implementation needed to make that scenario pass (without breaking any others); run all the tests and make sure they all pass (modern tools can continuously run just the impacted tests); spend some time refactoring away code smells you just created, or that will make the next test harder to write; then, and only then, do you write another test!

On an experienced TDD team (writing engineering specs), that whole cycle is roughly 2 to 5 minutes. With BDD (driven by product scenarios), it could vary more: maybe 2 to 30 minutes.

You might be thinking, “yeah, but in the process of writing one test, I’m going to think of three other scenarios, and I don’t want to lose that!” Great. That’s why I recommend keeping a simple, short “to do” list nearby. Write down a reminder or example, but avoid writing more than one test against an unsettled interface(/contract/API). Keep your options open.

“TDD is about increasing code coverage.”

The truth-nugget

On an existing product that lacks sufficient test coverage, TDD done with discipline will indeed gradually increase code coverage.

The rest of the story…

But that isn’t ever the goal of TDD.

I recall, rather early in my TDD-training career, being called in to address a quality problem. Apparently the team had been asked to increase coverage by 10% every sprint. They had done this (and added new features) for 8 sprints—for them that was 8 months. And they had succeeded in covering 80% of their code!

Except that the defect rate was still just as high as it had been at the start of those 8 months.

So I sat down with the developers to see what was going on. They were insistent that they had done what was asked. It was quickly evident what had happened.

Remember this pseudocode test?

box.add(bluePen)

assert(box.contains(bluePen))

Well, what this team would have had was something closer to this:

box.add(bluePen)

And that’s it! No assertions. Hundreds of tests, zero assertions! ZERO!

“What does this test?” I asked.

“That the blue pen was added!” they insisted.

“How do you know it worked?” I said.

“Because it doesn’t throw an exception!”

Ugh. Not the best measure of quality, that!

What had happened? These were professional developers, not impetuous children. They had been placed into a pressure-cooker development environment long before I got there, and been asked to do something (add tests) that they didn’t know how to do (because good testing practices are rarely taught in university). So they did what they could to meet the explicit goal: “Increase code coverage!”

You get what you measure.

I’m not saying that coverage is a bad metric. It’s just a bad motivational metric. Healthy teams measure coverage to keep an eye on trends. With discipline, coverage will stay flat or rise over time. If it dips, someone accidentally checked in some code without tests. Your continuous-integration pipeline can (and should) be configured to notice this and reject that commit.

Coverage is an informational metric indicating the health of the product and the team’s methodology. Like measuring a child’s temperature, you’re looking for illness, and you don’t scold the child for having a fever, or demand that she rid herself of the fever and get back to school.

This team did receive some good medicine: my TDD course, including some practice in adding characterization tests to existing untested code. I hope it helped. Not all teams send me updates.

“You’re not allowed to think about solutions! The algorithm is supposed to emerge.”

A whisper of truth

TDD isn’t a “turn your brain off” practice. Instead it allows your intelligence, creativity, and experience to flow with fewer impediments. You don’t have to keep as many details floating around in your head because, over time, you’ve recorded those in previous tests.

In that way, TDD does facilitate emergence. Emergence comes from simple rules & practices (like TDD), repeated consistently at all levels of a system, fueled by some form of raw energy (in this case, your thoughts and creativity).

But seriously…

TDD is not a magical practice that makes previously unknown algorithms emerge. You’re not going to discover a new encryption algorithm, reinvent Dijkstra’s Algorithm, or find a new way to solve a Sudoku puzzle. You have to already know what algorithm you want to use, or how you would solve a Sudoku puzzle, in order to incrementally write the appropriate tests and implementation.

TDD is a cooperative win-win game you play with your code: You have an idea where you want to go with the solution, and you teach the code incrementally with each test. While you’re adding more complex scenarios incrementally, the previous tests prevent existing behaviors from backsliding.

What does emerge through TDD and refactoring is a software design that clearly supports known domain behaviors, and doesn’t get in the way of future enhancements. E.g., you might uncover a useful business-domain-specific abstraction that you hadn’t thought of up-front. The tests allow the internal software design to emerge through refactoring.

“You have to do TDD top-down, so you have to mock all the things!”

Why smart people have said this:

There’s a style of TDD called “London Style” which—if I understand correctly—does suggest starting from the user-interface and working your way “down” or “inward.” This would help avoid creating any code that you don’t really need.

Indeed, if I build A and I defer something by delegating to B, and B doesn’t exist yet, then I need to mock, stub, or fake B (that is, create a “test double” for B) until it’s time to build B. The idea here, which exists in both London-style TDD and also BDD, is that you’ll do a better job of designing B’s interface if you first discover what A needs from B.


In my experience…

I’ve tried this approach, and talked to other TDD/BDD coaches who have tried it. They all told me, in terms that were too polite to have impact, that it was a lot of extra effort. Mostly, it’s a whole lot of extra test doubles to manage. I prefer to use test doubles mostly for external dependencies.

The style of TDD I grew up with, sometimes called “Detroit Style,” is less rigorous. It can be summarized by an old Ward Cunningham quote: “Code what you know.” And the corollary: Don’t code what you don’t know.

If you follow the Single Responsibility Principle diligently (which TDD tends to help enforce), you will have many domain objects that can be used as their own “test double” in a sense. For example, Java has a Date object, and C# has DateTime, and those are easy to create with specific data for various scenarios. There’s no reason to mock or fake a Date. The same can be true for most of your domain objects. Make them easy to create without user input or access to a database. In the systems that we built, there was never a reason to create a test double for an Account, a Loan, a User…
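In Python terms, the same point might look like this. `Loan` is a hypothetical domain object, invented for the sketch: real `date` values and a real `Loan` are cheap to construct and fully deterministic, so no test double is needed for either one:

```python
from datetime import date

class Loan:
    """Hypothetical domain object: easy to build with exactly
    the state a test needs, no database or user input required."""

    def __init__(self, principal, origination):
        self.principal = principal
        self.origination = origination

    def age_in_days(self, as_of):
        return (as_of - self.origination).days

def test_loan_age_uses_calendar_days():
    # A real date and a real Loan serve as their own "test doubles."
    loan = Loan(principal=100_000, origination=date(2024, 1, 1))
    assert loan.age_in_days(as_of=date(2024, 1, 31)) == 30
```

You wouldn’t mock `date` here; for the same reason, there’s rarely cause to mock a well-factored domain object.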

So, if I’m writing tests and implementation for A, and I discover a need for B, I can set A aside after a “soft landing”: The whole system compiles and all tests pass, and what remains to be done on A is on my short-term to-do list. Then I go build B test-driven, then come back to A. I then test A using an actual B.

That might sound awkward, but using a test double for B isn’t really any less awkward.

It might sound like bottom-up. If A calls B and B calls C and so on, don’t I have to start at the bottom, whether that’s five layers or twelve? All I can say is that’s never been an issue. Even on systems with numerous complex data values and business rules, I don’t recall a dependency graph that couldn’t be easily truncated after two or three layers. I think the reason is that software doesn’t really result in actual “layers.” Instead, there are specific interactions between objects with different areas of responsibility (e.g., business logic, persistence, interacting with the user, and so on). On our XP teams, “layers” also emerged. But if I can only test A by building A->B->C->Database then I’m probably not testing a unit of behavior.

Using a real B to test A might also sound like unit-testing blasphemy. If I use a real B in the tests for A, aren’t I now re-testing B? Sorta, but so what? As long as I can create B so that its behavior is deterministic, and already tested by its own unit-tests, then why not use it in the test for A? You wouldn’t question creating a system Date object, would you? Are you testing Date or DateTime when you use one for your test of your domain object?

Any behavior that was previously fully tested can be used with confidence in further tests and implementation.

Did I miss your favorite?

Let me know. I’ve enabled comments for this article, but I’d prefer to receive feedback on the associated LinkedIn post. Thanks!
