Mutation Testing
or who is going to test your tests ?
1st Developers@CERN Forum
Created by Sebastian Witowski
Do you write code ?
Let me just ask you a few questions before we start.
Please raise your hands if you write code.
Cool, everyone. Well, I didn't expect anything less since the forum is called "Developers@CERN"
Do you test your code ?
Now, keep your hands up if you write tests for your code.
Do you test your tests ?
Great, but let's take it one step further.
Keep your hands up if you test your tests ?
Do you test the tests for your tests ?
And the last one, keep your hand up if you test the tests that you wrote for your tests ?
No, I'm just kidding.
Let's just stick to testing your tests here.
Testing tests ?
"Testing your tests" might sound crazy.
After all, we just saw that not everybody has time to test their code in the first place.
Richard Lipton, Fault Diagnosis of Computer Programs , 1971
But what if there was an automatic tool for testing your tests ?
Well, that's a different story.
Richard Lipton had the same thought when he first came up with the idea of mutation testing in 1971.
How does mutation testing work ?
Ok, so how does mutation testing work ?
Step 1. Change your code in a small way:
a + b ---> a * b
a + 1 ---> a + 2
a + b ---> a + c
a + 1 ---> a + 1; a + 1
(if a == 1 and b > 1) ---> (if a == 1 or b > 1)
Changes similar to small programming errors.
You start by changing your code in a small way.
How small ?
Well, very small, for example you can:
change the operator or a value in a mathematical equation,
replace one variable with another,
duplicate a statement,
change the boolean operators,
etc., etc.
So the changes should be similar to the small programming errors that each of us makes.
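To make Step 1 more concrete, here is a minimal sketch of how a tool could produce a mutant automatically, using Python's standard `ast` module. The `AddToMult` class and the `mutate` helper are hypothetical names made up for this illustration; they are not taken from any real mutation testing library, and a real tool would typically apply one change at a time and support many more mutation operators.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Hypothetical mutation operator: a + b ---> a * b."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # visit nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


def mutate(source):
    """Return the source code with every addition replaced by multiplication."""
    tree = AddToMult().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9 or newer


original = "def add(a, b):\n    return a + b"
print(mutate(original))
```

The mutated source can then be compiled and handed to the test suite instead of the original code.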
Step 2. Run your tests
After that, you run your tests and see what happens. There are two possibilities here.
You start getting errors, which means that tests have reached the modified code and:
- either they died with some error message, because they didn't expect this modification in the code. For example, your function was expecting 2 parameters but suddenly got only one, and that kills your test. This situation is called "weak mutation testing" and it means that your tests are good: if something changes, you will notice it.
- or your tests detect the change properly and inform you about it with an assertion failure, which means that your tests are even better. This situation is called "strong mutation testing" and it's more powerful: it means that your tests are actually catching the possible problems, but it's also more difficult to achieve.
Step 2. Run your tests
But what happens if after the mutation, your tests are still working ?
Well, it usually means bad things:
- either your tests didn't detect the problems that were introduced during the mutation (so your tests are bad),
- or the code that was changed was never executed (so you have dead code, which is even worse than bad tests)
Step 3. Get the mutation score
mutation score = number of mutants killed / number of mutants created
After all the possible mutations have been tested, you will get the mutation score, which is equal to the number of mutants killed divided by the total number of mutants created.
So here, instead of having 100% of test coverage (as for normal tests), we aim to have 100% of mutants killed.
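As a quick worked example of the formula, here is a small sketch. The `mutation_score` function is a hypothetical helper and the numbers (8 mutants killed out of 10 created) are invented for illustration.

```python
def mutation_score(killed, created):
    """Mutation score as a percentage: mutants killed / mutants created."""
    if created == 0:
        raise ValueError("no mutants were created")
    return 100.0 * killed / created


# Invented numbers: 8 mutants killed out of 10 created.
print(f"{mutation_score(8, 10):.1f}%")  # 80.0%
```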
Step 4. Profit
And that's it, it's that simple !
Now you know what tests you need to fix.
Example time
Let's see an example of how it works.
def multiply(a, b):
    return a * b
Let's write a function that could be used in a simple calculator program.
from unittest import TestCase

def multiply(a, b):
    return a * b

class CalculatorTest(TestCase):
    def test_multiply(self):
        self.assertEqual(multiply(2, 2), 4)
And then write the simplest possible test for that.
Now, who can tell me, what is wrong with this test ?
Right, you can replace the multiplication operator with the addition operator (or even with the power operator) and this test will still pass. If you run a mutation testing tool, it will try to replace multiplication operators with various other operators or statements.
If the tool is good, it should detect the mutant that survived and inform you about that.
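You can check which operator swaps this test would miss with a short sketch. The `candidates` dictionary is an illustrative choice of replacement operators, not the exact list used by any specific tool, and `surviving_mutants` is a hypothetical helper.

```python
import operator

# Replacement operators a tool might try instead of `*` (illustrative list only).
candidates = {"+": operator.add, "**": operator.pow}


def surviving_mutants(a, b):
    """Return the operators that give the same result as a * b for these inputs."""
    expected = a * b
    return [name for name, op in candidates.items() if op(a, b) == expected]


print(surviving_mutants(2, 2))  # ['+', '**']
print(surviving_mutants(3, 3))  # []
```

With the inputs `(2, 2)` both mutants give 4, so the test cannot kill them; with `(3, 3)` they give 6 and 27 instead of 9.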
self.assertEqual(multiply(2, 2), 4)
↓
self.assertEqual(multiply(3, 3), 9)
So now we know that we need to improve our test.
And this is how mutations can help you write better tests.
So where is the catch ?
Well, I'm glad you have asked !
Equivalent Mutant Problem
One of the main problems with mutation testing is called the "Equivalent Mutant Problem" - it means that some operations can create a mutant that you won't be able to detect with any tests.
index = 0
while True:
    do_stuff()
    index = index + 1
    if index == 10:
        break
if index == 10:
vs.
if index >= 10:
Let me show you an example. Take a look at this simple loop that does some stuff and then increments the index.
When we reach 10, we want to break.
Now, a mutation testing tool can replace the equality operator with the greater-than-or-equal operator.
We can easily tell that even though the syntax is different, both versions of this loop will have exactly the same behavior.
However, automatic tools are not as clever as we are. They will see a different syntax that has not been detected by any of the tests and give us a false positive.
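Here is a small sketch that runs both versions of the loop side by side. The `count_iterations` helper is hypothetical, and the `do_stuff()` call from the slide is left out because it does not affect when the loop ends.

```python
def count_iterations(should_break):
    """Run the loop from the slide and return how many times the body executed."""
    index = 0
    while True:
        index = index + 1  # do_stuff() is omitted in this sketch
        if should_break(index):
            break
    return index


original = count_iterations(lambda index: index == 10)
mutant = count_iterations(lambda index: index >= 10)
print(original, mutant)  # 10 10
```

Because the index starts at 0 and grows by exactly 1, it always hits 10 before it could exceed it, so no test can tell the two versions apart.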
Wrap up
The good parts
Detect problems with your tests
They discover dead code
They are automatic
How else would you test your tests ?
(Semi-)Automatic tool for testing ? I'm in !
What could be the advantages of automatic mutation testing tools ?
They can help you detect problems with your tests
But they can also help you detect problems with your code, like dead code
They are automatic. You won't write the mutation tests by hand because it would take too much time.
You will either use an existing library or at least you will write your own library.
And let's face it - how else would you measure the quality of your tests ?
We have test coverage statistics for the code, but we don't have anything like that for tests.
So, if there was a good tool, that I could plug into my continuous integration cycle that would report problems with my test, then I'm totally sold.
Studies show that software developers spend up to 50% of their time just on testing. I would love to have a tool that gives me automatic feedback on my tests.
The not so good parts
Mutation testing is slow: (TIME = ALL MUTANTS x ALL TESTS)
Handful of libraries
Equivalent Mutant Problem
Writing complex mutant tests is difficult
And that brings us to the "not so good parts".
First of all - mutation testing is slow. If you think that your whole test suite is slow because it runs for 2 minutes, then imagine that proper mutation testing requires running all your tests for all the mutants. There are some studies on how to speed up this process, like using selective mutation or mutant sampling, but they all boil down to running fewer tests, and that might overlook some bugs.
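To get a feeling for the cost, here is a rough sketch of the "all mutants x all tests" estimate together with a naive form of mutant sampling. The helper names, the 200 mutants, the 500 tests and the 10% sampling rate are all invented for illustration; real tools use more refined strategies.

```python
import random


def estimated_runs(num_mutants, num_tests):
    """Worst case: every test is run against every mutant."""
    return num_mutants * num_tests


def sample_mutants(mutants, fraction, seed=0):
    """Naive mutant sampling: test only a random subset of the mutants."""
    rng = random.Random(seed)
    count = max(1, round(len(mutants) * fraction))
    return rng.sample(mutants, count)


mutants = [f"mutant_{i}" for i in range(200)]  # invented numbers
subset = sample_mutants(mutants, 0.1)

print(estimated_runs(len(mutants), 500))  # 100000
print(estimated_runs(len(subset), 500))   # 10000
```

The sampled run is ten times cheaper, but any problem that only the skipped mutants would have revealed goes unnoticed.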
There are not so many good libraries for mutation testing.
This is often caused by the aforementioned Equivalent Mutant Problem, which is very difficult to solve.
Also, writing complex mutations, beyond simple operator changes or variable replacement, is basically impossible.
Mutation testing libraries
Mutant - Ruby (last updated September 2015)
VisualMutator - C# (last updated September 2015)
Pitest - Java (last updated August 2015)
Humbug - PHP (last updated May 2015)
MuCheck - Haskell (last updated January 2015)
MutPy - Python3 (last updated January 2014)
Mutator - commercial solution for Java, Ruby, JavaScript and PHP
As I said, there aren't that many libraries.
I'm showing this list in case you want to check if there is a tool for your favorite language.
I might be missing some libraries, so I encourage you to search on your own if you are interested in the subject.
Example
MutPy (requires Python3)
Before I finish my presentation, I would like to show you a real life mutation testing tool run on the calculator program that I showed you before.
I will use the MutPy library that unfortunately works only with Python3.
calculator.py
def multiply(a, b):
    return a * b
test_calculator.py
from unittest import TestCase
from calculator import multiply

class CalculatorTest(TestCase):
    def test_multiply(self):
        self.assertEqual(multiply(2, 2), 4)
This is what the code looks like. Just one function inside calculator.py and one test inside the test_calculator.py file.
Let's run MutPy on those two files.
As you can see, as pointed out by those big red arrows, MutPy has created 3 mutants, each of them with a different arithmetic operator.
The first two mutants were killed, but the last one - the power operator mutant - has survived.
Our mutation score is 66%.
self.assertEqual(multiply(2, 2), 4)
↓
self.assertEqual(multiply(3, 3), 9)
Let's fix our test and run MutPy again.
Great, now all the mutants are dead and our mutation score is 100%.
We have awesome tests and we can start adding new functionality to our calculator.
The future ?
So what's the state of mutation testing tools ?
Right now, there are no perfect tools for that.
One of the main problems is the high computational cost. But tools like Docker give you cheap and scalable containers where you can run your tests in parallel, so I think that in the future someone will come up with a good solution and mutation testing will be a standard feature of continuous integration tools like Travis or Jenkins.
Thank you !
Any questions ?
Happy coding testing !
This presentation is available on GitHub, so you can see the slides on GitHub Pages.
Thank you, I hope you guys enjoyed the talk and let me know if you have any questions.