November 07, 2016

Mutation Testing – Integrating into a large (legacy) application

This blog post demonstrates the concept of mutation testing using real-world examples. It shows how mutation testing can be integrated into an existing large (legacy) code base and how it affects the build time. Based on this information, I want to give you some ideas about how mutation testing can be introduced in your applications.


If you are already familiar with the theory of mutation testing, you can skip this section and move on to the "Environment" section.

What is Mutation Testing?

Mutation testing creates so-called "mutants" of the Java class that is being unit-tested. A mutant is a slight change in your program code (e.g., an if-condition gets inverted), which should lead to failing tests. If a test fails after the code change, the mutant is killed. The goal is to kill all mutants.
This page gives a short but good overview of the basic concept.
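
To make the idea concrete, here is a minimal hand-written sketch of a single mutant. The class and method names are hypothetical and not taken from the application described in this post; PIT generates such changes in the byte code automatically, whereas here the mutant is simply written out as a second method.

```java
public class MutantDemo {

    // Original production code: a person is an adult at 18 or older.
    public static boolean isAdult(int age) {
        return age >= 18;
    }

    // Hand-written mutant: the boundary ">=" has been changed to ">".
    public static boolean isAdultMutant(int age) {
        return age > 18;
    }

    public static void main(String[] args) {
        // A boundary check at exactly 18 passes for the original
        // and fails for the mutant, so this check kills the mutant.
        System.out.println("check passes on original: " + isAdult(18));
        System.out.println("check passes on mutant:   " + isAdultMutant(18));
    }
}
```

A test that only checked the ages 10 and 30 would pass for both versions, and the mutant would survive.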

Why Mutation Testing?

"Test coverage is a useful tool for finding untested parts of a codebase," as Martin Fowler puts it [TESCOV]. This means poor coverage numbers indicate worrisome holes in the safety net of a test suite. However, full coverage alone says nothing about the quality of the underlying tests! The only reasonable conclusion to draw is that there are obviously no uncovered spots.

Mutation testing tries to point out missing or weak tests. Generally speaking, if you change something in the code (which is what mutation testing does), there should be a test that breaks. If not, it is a strong hint that tests are missing or that the existing tests are of low quality.
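
The difference between coverage and test quality can be sketched in a few lines. The example below is an illustration with hypothetical names: both checks execute every branch of the code, so both give full coverage, but only the one with precise expectations notices the mutant.

```java
public class WeakTestDemo {

    // Common shape of the original and the mutated method.
    public interface Discount {
        int apply(int price, boolean premium);
    }

    // Original production code: premium customers pay 10 less.
    public static int discount(int price, boolean premium) {
        return premium ? price - 10 : price;
    }

    // Hand-written mutant: "-" replaced by "+".
    public static int discountMutant(int price, boolean premium) {
        return premium ? price + 10 : price;
    }

    // Weak check: runs both branches but only looks at the sign of the result.
    public static boolean weakTest(Discount d) {
        return d.apply(100, true) > 0 && d.apply(100, false) > 0;
    }

    // Strong check: runs both branches and compares exact values.
    public static boolean strongTest(Discount d) {
        return d.apply(100, true) == 90 && d.apply(100, false) == 100;
    }

    public static void main(String[] args) {
        System.out.println("weak test on mutant passes:   " + weakTest(WeakTestDemo::discountMutant));
        System.out.println("strong test on mutant passes: " + strongTest(WeakTestDemo::discountMutant));
    }
}
```

The weak check still passes on the mutant (the mutant survives), while the strong check fails on it (the mutant is killed).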


Environment

The application has roughly 280,000 lines of code, around 5,000 test classes, and uses Maven as its build framework. The test coverage is around 40%. A local build (mvn clean install) takes about 10–12 minutes. The notebook has an Intel Core i7-3687U with 16GB RAM running Windows 10 Pro.
I decided to use PIT in version 1.1.10, as it seems to be one of the frameworks that includes Maven support and is actively developed, something I cannot say about Jester and Jumble. PIT also supports Gradle, which none of the other frameworks do.

There are PIT plugins for various Java IDEs.

Integration with Maven

The integration with Maven is very simple. Just add the following lines into your root pom.xml if you want to enable mutation testing for all your modules.
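
A minimal sketch of such a plugin declaration, assuming PIT 1.1.10 as used in this post. The `com.example.*` filters are placeholders, not the real packages of the application; replace them with your own base package.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <version>1.1.10</version>
      <configuration>
        <!-- Placeholder filter: classes that should be mutated -->
        <targetClasses>
          <param>com.example.*</param>
        </targetClasses>
        <!-- Placeholder filter: tests that are run against the mutants -->
        <targetTests>
          <param>com.example.*</param>
        </targetTests>
      </configuration>
    </plugin>
  </plugins>
</build>
```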


Hands on

The Maven opts are configured as follows:

-Xms1024m -Xmx4096m -XX:MetaspaceSize=512M -XX:MaxMetaspaceSize=512M

Executing the Maven goal


> mvn org.pitest:pitest-maven:mutationCoverage

on the command line runs the mutation build. After 1 hour and 5 minutes, the build is finished. So, enabling mutation testing for a big project is quite uncomfortable.

Outcome of the build

Nice HTML reports are generated for the modules:

Fig. 1: HTML report for all classes of a package

Here, you can see the test coverage of each file. The mutation coverage shows how many mutations were created for each class and how many of them have been killed.
When you follow the link to one class file, you can see exactly which mutant was applied to a specific line and which mutants have been killed or have survived at that line.
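
The mutation coverage figure is simply the share of generated mutants that were killed. The following sketch shows the arithmetic with hypothetical numbers; the rounding (integer division, rounding down) is an assumption made for this example and may differ from what PIT does internally.

```java
public class MutationScore {

    // Percentage of generated mutants that were killed, rounded down.
    public static int percentKilled(int killed, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        if (killed < 0 || killed > total) {
            throw new IllegalArgumentException("killed must be between 0 and total");
        }
        return (killed * 100) / total;
    }

    public static void main(String[] args) {
        // Hypothetical class: 60 mutants generated, 45 of them killed.
        System.out.println(percentKilled(45, 60) + "%"); // prints "75%"
    }
}
```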

There are many mutators in PIT. All settings are configurable – here we are using the defaults.

Fig. 2: Mutators applied to a specific line

In this example, we have the mutation outcome "NO_COVERAGE", meaning that there is no test that executes this line of code. Here is a set of possible outcomes of a mutation. The green lines are lines where every mutant has been killed.

There is a summary at the end of each file detailing the mutants killed and the ones that survived.

Improving the speed

As you can imagine, a build time of 1 hour and 5 minutes is not one that we can accept in our daily working life, so I want to give you some ideas about the options you have to reduce build times.

Skip specific modules

Sometimes it makes sense to skip specific modules, as they are not maintained anymore (e.g., a legacy module that is a black box for developers). Skipping these modules reduced the mutation build to 50 minutes.
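
One way to skip a module is sketched below. It assumes that the PIT version in use supports the `skip` parameter of the Maven plugin and that the plugin version is inherited from the root pom.xml; the fragment goes into the pom.xml of the module to be skipped.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.pitest</groupId>
      <artifactId>pitest-maven</artifactId>
      <configuration>
        <!-- Do not run mutation testing for this module -->
        <skip>true</skip>
      </configuration>
    </plugin>
  </plugins>
</build>
```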

Set maximum mutations per class

Reducing the maximum number of mutations per class to 5 halves the build time. I would not recommend this option. The build is still much too long to add to the normal build, and you have to accept that fewer quality issues can be found.

Configure mutators

As is the case with the reduction of the mutations per class, I would not recommend reducing the number of mutators. I would prefer to raise the number of mutators when there is a high number of mutants to be killed in a specific module.
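
As a sketch of what raising the number of mutators could look like in the plugin configuration of a module: CONSTRUCTOR_CALLS and NON_VOID_METHOD_CALLS are PIT mutators that are not part of the default set. Listing DEFAULTS alongside them is an assumption that the group name is accepted in this list; if it is not, the default mutators have to be listed individually.

```xml
<configuration>
  <mutators>
    <!-- Assumed group name for the default mutators -->
    <mutator>DEFAULTS</mutator>
    <!-- Additional mutators that are not active by default -->
    <mutator>CONSTRUCTOR_CALLS</mutator>
    <mutator>NON_VOID_METHOD_CALLS</mutator>
  </mutators>
</configuration>
```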

Run multithreaded

I tried to run the whole build with the threads option using two and four threads, but I did not notice any significant change in the performance of the build. The build time was nearly identical to that of a single thread. However, just play around to find your ideal configuration.

withHistory option

Depending on how much you change, this feature reduces the build time dramatically. After a slight program change, the build for the affected module took 1 minute 30 seconds instead of 4 minutes 30 seconds. The other modules finished after only a few seconds because PIT detected no changes.


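scmMutationCoverage goal

With the scmMutationCoverage goal (mvn org.pitest:pitest-maven:scmMutationCoverage), only files that have a given status within the VCS in use, for example added or modified files, are considered. If there are no files with this status, an info message is logged to the console as follows: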

[INFO] No modified files found - nothing to mutation test, analyseLastCommit=false

Integrating into an existing large (legacy) application

As you can see, it is quite hard and somewhat unpleasant to introduce mutation testing into the normal build process, as it is computationally expensive. However, it is a very useful tool for measuring the quality of your tests.

Earn developers’ commitment

First, every developer who writes new code or changes existing code should be responsible for writing tests that prove the changes or the new code. It is possible to measure the quality of the written tests by executing mutation tests using the scmMutationCoverage goal to gather quick results. Before checking in new code or changes, a specific number of mutants should be killed with tests, so that only high-quality tests are added to the application.

Introduce Sonar build with mutation test coverage

You should introduce a Sonar build that executes a whole mutation test build. With this approach, you will be able to specify mutation thresholds to configure what percentage of mutants must be killed in order to get a successful build. This threshold can be specified for each module. I would recommend a high threshold for newly created modules (such as 90%) and a smaller one for existing modules. Use at least the current percentage of mutation coverage, so as not to lower the test quality. If everyone follows the instructions for mutation testing of new/changed code, you should be able to raise this value from time to time.
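
A per-module threshold could look like the following sketch, placed in the pom.xml of a newly created module. It uses the mutationThreshold parameter of the PIT Maven plugin; the value 90 is only the example figure from above, and the plugin version is assumed to be inherited from the root pom.xml.

```xml
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <configuration>
    <!-- Fail the build if fewer than 90% of the mutants are killed -->
    <mutationThreshold>90</mutationThreshold>
  </configuration>
</plugin>
```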


Try to set team objectives

It is really a lot of fun to kill mutants with a development team. It could be, for example, a sprint goal to achieve perhaps a mutation threshold of 80% for a specific module. With this sprint goal achieved, the mutation threshold of this module has to be raised to 80% to ensure its quality. This is a continuous way of improving the quality of your tests, and it will lead to fewer bugs in production.



Conclusion

Mutation testing is an extremely useful concept for measuring the quality of your unit tests and a much better indicator of “good” tests than line coverage. The computational effort makes it nearly impossible to add to a normal build within a large codebase. In this blog post, I wanted to give you some insights and ideas about how mutation testing can be integrated into an existing large codebase in practice. Although there are many options for reducing the mutation test time, be careful when using them, as some of them reduce the number of poor tests that can be found.

The ramp-up time to begin with mutation testing using PIT was very low, and it has a lot of plugins for the main Java development IDEs and tools, so I can recommend it. If you have any questions or opinions on this blog post, please leave a comment!

Useful lessons learned

There are some things I learned while trying out PIT for our application that I want to share with you.

Modules without Java classes to mutate

If a module does not have classes that match the filter defined in "targetClasses", a build error is thrown.

[ERROR] Failed to execute goal org.pitest:pitest-maven:1.1.10:mutationCoverage (default-cli) on project db-migration: Execution default-cli of goal org.pitest:pitest-maven:1.1.10:mutationCoverage failed: No mutations found. This probably means there is an issue with either the supplied classpath or filters.

In our application, we use Liquibase as a database versioning framework, in a module that does not have any Java files at all. There is an option, failWhenNoMutations, that can be set to false to suppress this error. Some of you may think that not failing should be the default, but it makes sense to fail the build, as this error usually indicates a problem with your filter. Having no classes mutated means you test without mutants, which makes mutation testing pointless, and therefore it breaks the build. I would not recommend using this option. If you have modules without Java classes, just exclude the whole module.

Compile before mutation

When using PIT, you have to make sure that all Java files are compiled before running it, as PIT manipulates the byte code directly in order to create mutants. So combining it with Maven's compile phase can be a good idea:

mvn compile -DwithHistory=true org.pitest:pitest-maven:mutationCoverage

Test not in same module as the mutated class

Currently, PIT requires the test to be in the same module as the mutated class. Test classes that could kill mutants but are placed in another module are not considered by PIT. Supporting this would be a useful feature. It is briefly discussed here, but I could not find a reported issue, so I reported one on GitHub.

Integration tests

Do not use mutation testing for integration tests, as you can, for example, mess up your database!
