Saturday, 4 February 2017

All our tests should always pass

Continuous Integration and automation demand that our tests be of a very high quality and always pass.  I don't always agree.  Tests that regularly pass could be concealing quality problems that could damage your product.

In any CI pipeline, new builds are tested with a set of automated tests.  Tools such as Chef or Jenkins are very good at orchestrating this process.  However, they rely on a set of tests that are almost 'guaranteed' to run.  If the same tests are always used and they always pass, you must ask yourself the following questions:


  1. Are the tests doing what you think they are?  Are they testing the code you think they are, or are they just exercising the code without testing it?  Beware tests that don't test what you think they do.
  2. Are the tests and the newly delivered code in different areas of the product?  If your tests are regression testing one part of the product that is logically separate from the code you are delivering, then you might not be testing as fully as possible.
  3. Beware the pesticide paradox.  If the same tests have been used for a long time, then it is likely that the code has become 'immune' to them.  Areas of the code that are actively exercised by the tests are likely to have become so well written over time that a developer is unlikely to introduce an error there.
  4. The tests have become stale.  Although they exercise the code being developed, they may not do so in the ways you would expect your customers to use the product.  Maybe they use a legacy way of invoking the product, or they use methods that have long fallen out of favour.
Tests that fall into any of the above categories might be lulling you into a false sense of security.  Sure, they pass, but do they do what you think they do?  To help keep your pipeline of high quality, I have the following recommendations:
  1. Rotate your CI tests.  At different stages of your testing you will use different tests.  Rotating the tests between stages of the pipeline drives the code differently at different stages of integration and could find some easy-to-spot defects.
  2. Calculate a test efficiency rating by dividing the number of regressions the test has found by the number of times it has been executed.  Tests with a low rating might be incorrectly testing the code, or testing an area so stable that it is unlikely to regress.  It might be worth running such tests only occasionally.
  3. Constantly add new tests to the pipeline.  Writing a new test for an existing part of the product can be a great learning exercise and can uncover new defects.
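A minimal sketch of the efficiency rating from recommendation 2, assuming each test's history is available as a simple record of executions and regressions found.  The `CiTestRecord` shape, the test names and the 0.01 threshold are all hypothetical illustrations, not values from any real pipeline.

```python
from dataclasses import dataclass


@dataclass
class CiTestRecord:
    """Hypothetical execution history for one CI test."""
    name: str
    executions: int
    regressions_found: int


def efficiency(record: CiTestRecord) -> float:
    """Regressions found divided by times executed (0.0 if never run)."""
    if record.executions == 0:
        return 0.0
    return record.regressions_found / record.executions


def occasional_candidates(records, threshold=0.01):
    """Names of tests whose efficiency is below an assumed threshold."""
    return [r.name for r in records if efficiency(r) < threshold]


histories = [
    CiTestRecord("login_smoke", executions=500, regressions_found=12),
    CiTestRecord("legacy_api_suite", executions=800, regressions_found=0),
    CiTestRecord("new_checkout_flow", executions=40, regressions_found=3),
]
for h in histories:
    print(f"{h.name}: {efficiency(h):.3f}")
print(occasional_candidates(histories))  # ['legacy_api_suite']
```

A low score does not prove a test is useless.  It flags the test for a closer look: is it testing the code, or just exercising a very stable area?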
In summary, although you need code to progress through your CI pipeline, it is good and even healthy to find regressions as a regular activity.  If your tests always pass, don't become smug; they could simply be failing to tell you that there is a problem.

Beware tests that don't test what you think they do

While migrating tests from one framework to another, we came across a test suite that taught us a valuable lesson: treat your automation with a level of distrust unless you have valid reasons or proof that it is doing what you expect.

This test suite was seemingly perfect: it was reliable, ran against all the supported releases and had a run time of 5 minutes.  Its name was simply the name of the technology that it tested.  The only problem was that if it was going to be used as part of our CI pipeline, it would need to be migrated to our new test framework.

Most of the components of the test were easily reusable; all that needed to be migrated was the code that provisioned the test system and invoked the test programs.  This took a few days of effort, and I was left to review the newly migrated test suite.  Everything looked fine.  The only issue I had was that the test method names were a little undescriptive.  I spoke to the engineers and asked them to run the test in a debug mode to understand exactly what it did.  I didn't want them to do any static analysis of the source, but to see what the test did at runtime.

A day later I met with the engineers to see how they were progressing, and they were hitting problems.  As far as they could see, the test never invoked any of the APIs that they would expect given the name of the test suite.  I asked them to show me, and I had to concur.  It did appear that the test didn't use the technology we were expecting.

I gave the engineers a list of diagnostics and further tests to double-check this finding.  After this was done, it was clear the test was simply not testing the function that we thought it did.

The problem with this test is clear.  There had been an assumption that the test 'did what it said on the cover', and since it always passed it was considered a great test asset.  In fact this was probably the worst test we had.  It built a false level of confidence in the team and could have let regressions into the field.

Naturally, we have deleted all evidence of this test suite from all source code repositories and test archives.  However, even this action was not without complaints.  A lot of the team felt that, even though it was clear the test didn't do what we expected, it did do 'something' and so we should keep it.  This is the wrong thing to do!

I agree that the test did execute some of the SUT (system under test) function.  However, the code it executed was only exercised, not tested.  If that code had regressed, would the test have reported the problem?
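To make the difference between exercising and testing concrete, here is a small sketch.  `apply_discount` is a hypothetical function with a deliberate bug.  The first check only calls it, so a coverage tool would count those lines as hit, and it passes anyway.  The second check asserts on the result and catches the bug.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test, with a deliberate bug:
    it adds the discount instead of subtracting it."""
    return price + price * percent / 100


def check_exercises_only():
    # Calls the code but never inspects the result,
    # so it "passes" even though the code is wrong.
    apply_discount(100.0, 10.0)


def check_actually_tests():
    # Compares the observable behaviour with the expected value.
    assert apply_discount(100.0, 10.0) == 90.0


def run(check) -> str:
    """Run a check and report PASS/FAIL the way a simple harness might."""
    try:
        check()
        return "PASS"
    except AssertionError:
        return "FAIL"


print(run(check_exercises_only))  # PASS
print(run(check_actually_tests))  # FAIL
```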

Because the test always passed, it wasn't until we decided to migrate it that this problem was uncovered.  Tests that fail regularly (either due to a regression or a test failure) at least get eyeballs on them.  That scrutiny implicitly validates that the test does something of value.

So what did we learn from this:

  • Code coverage is great at ensuring that a test is at least executing the code you think that it is.
  • If the test is executing the code you expect and it regularly passes it is worth checking that it is testing the code and not just exercising it.
  • We needed more tests in this area - we have now done this and ensured that they actually do what we expect.
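Here is a standard-library-only sketch of the runtime check we asked the engineers to do: record which functions a test actually calls.  In practice a coverage tool such as coverage.py does this properly.  `sys.settrace` is used here only to keep the example self-contained.  The "product" functions and the suite are hypothetical.

```python
import sys


def record_calls(func):
    """Run func and return the names of Python functions called while it ran."""
    called = set()

    def tracer(frame, event, arg):
        # The global trace function sees a 'call' event for every new frame.
        if event == "call":
            called.add(frame.f_code.co_name)
        return None  # no line-by-line tracing needed

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return called


# Hypothetical product APIs.
def encrypt_payload(data):
    return data[::-1]


def log_message(msg):
    return msg.upper()


def suite_named_encryption():
    # Named after the encryption technology, but at runtime
    # it only ever touches the logging code.
    log_message("hello")


calls = record_calls(suite_named_encryption)
print("encrypt_payload" in calls)  # False
```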

Simple Algebra - solution

First - get rid of the images and substitute some characters that are easier to manipulate:
  1. a + b - c = 4
  2. a - 2b + 3c = -6
  3. 2a + 3b + c = 7

Add equations 1 and 3 together:

a + b - c + 2a + 3b + c = 4 + 7

  4. 3a + 4b = 11

Multiply equation 1 by 3:

3a + 3b - 3c = 12

Add equation 2:

3a + 3b - 3c + a - 2b + 3c = 12 - 6

  5. 4a + b = 6

Solve 4 and 5 simultaneously:

3a + 4b = 11
4a + b = 6

From 5:
b = 6 - 4a

Substitute into 4:
3a + 4(6 - 4a) = 11
3a + 24 - 16a = 11
-13a = -13
a = 1

Substitute a = 1 into 4:
3a + 4b = 11
3 + 4b = 11
4b = 8
b = 2

Substitute a and b into equation 3 to get c:
2a + 3b + c = 7
2 + 6 + c = 7
8 + c = 7

c = -1
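As a check on the hand working, here is a short sketch that solves the same three equations by Gauss-Jordan elimination with exact fractions.  This is a swapped-in method, not the substitution steps used above.  It should arrive at a = 1, b = 2, c = -1.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination, using exact fractions."""
    n = len(matrix)
    # Augmented matrix [A | b] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap a row with a non-zero pivot into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Coefficients of a, b, c in equations 1 to 3.
A = [[1, 1, -1],
     [1, -2, 3],
     [2, 3, 1]]
rhs = [4, -6, 7]
print([int(v) for v in solve(A, rhs)])  # [1, 2, -1]
```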

Sunday, 28 February 2016

Simple Algebra

Over the last few weeks I have seen a lot of posts on Facebook like this:

From familyshare.com
These are just simple substitution algebra problems.  You are not a genius if you can solve them; you just have a basic grasp of mathematics.  I thought I would create one that is a little bit harder.  See below:


This is a little harder.  Can you solve it?  Solution soon.

Sunday, 14 June 2015

A failure to do BVA

Boundary Value Analysis ...
... is a simple test technique.  It is taught on all introductory software testing courses.  The theory is that you split input data into sets of valid and invalid input and then test at the boundaries between valid and invalid data.  An easy example: a function accepts an integer between 5 and 10.  Ignoring the invalid sets as they approach the upper and lower limits of the integer data type, you would use the following tests:
  • lower bound
    • 4 - invalid
    • 5 - valid
    • 6 - valid
  • upper bound
    • 9 - valid
    • 10 - valid
    • 11 - invalid
You would obviously also test values in the middle of each set, such as 3 or 7.  Assuming that the implementation of the function is close to the spec, this should be a reasonable set of tests to run.  Let's try a real-world example:
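A minimal sketch of the example above.  It generates the boundary values for an inclusive valid range and classifies them, plus the mid-set values 3 and 7.  `is_valid` is a hypothetical stand-in for the function under test, written to match the spec of 5 to 10 inclusive.

```python
def boundary_values(lower: int, upper: int) -> dict:
    """Values either side of, and on, each edge of the valid range [lower, upper]."""
    return {
        "lower": [lower - 1, lower, lower + 1],
        "upper": [upper - 1, upper, upper + 1],
    }


def is_valid(n: int, lower: int = 5, upper: int = 10) -> bool:
    """Hypothetical implementation under test: accepts integers 5 to 10 inclusive."""
    return lower <= n <= upper


values = boundary_values(5, 10)
print(values)  # {'lower': [4, 5, 6], 'upper': [9, 10, 11]}
for v in values["lower"] + values["upper"] + [3, 7]:
    print(v, "valid" if is_valid(v) else "invalid")
```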

A credit card company has a system that generates card security code (CSC) numbers (the last 3 digits on the back of your card) and a system that checks that the CSC is valid during a card-not-present transaction (like when you buy something online).  A CSC has the following properties:

  • 3 digits long
  • has a min value of 001
  • has a max value of 999
Is a CSC with the value 000 valid or invalid?  Not sure?  Well, that is a test case you should try.
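A sketch of how a boundary test at the 000 edge would expose two implementations of the same spec that disagree.  Both validators are hypothetical.  The "generator" accepts any 3-digit string, and the "checker" enforces the 001 to 999 range from the spec.  The test cases cover the value and length boundaries.

```python
DIGITS = "0123456789"


def is_three_digits(csc: str) -> bool:
    return len(csc) == 3 and all(ch in DIGITS for ch in csc)


def generator_accepts(csc: str) -> bool:
    """Hypothetical generating system: any 3-digit string, including 000."""
    return is_three_digits(csc)


def checker_accepts(csc: str) -> bool:
    """Hypothetical checking system: follows the spec's 001 to 999 range."""
    return is_three_digits(csc) and 1 <= int(csc) <= 999


# Boundaries around the spec's min (001), max (999) and 3-digit length.
cases = ["000", "001", "002", "998", "999", "99", "1000"]
for csc in cases:
    g, c = generator_accepts(csc), checker_accepts(csc)
    flag = "  <-- systems disagree" if g != c else ""
    print(f"{csc}: generator={g} checker={c}{flag}")
```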

I recently got new credit cards to replace my older ones, which were due to have an interest rate rise.  The cards arrive and one has a CSC of 000.  I think no more about it apart from: wow, that's going to be easy to remember.  Tonight my wife needed to make a purchase online.  Since it was for work, I thought we would buy it on the credit card so that when she was reimbursed the money could be applied directly to the credit card.

To my dismay, the transaction was declined.  I checked online that there was ample credit for the purchase.  There was.  I called the company and asked why the card was declined.  Sanjay (the call taker) advised me that I had mistyped the card security code.  I mentioned that there was NO chance of that, since it was so easy to remember (000).  He put me on hold.

Yes, Mr Yates.  There is a problem.  The CSC 000 is considered invalid by our system.  As a security precaution we have cancelled all of your cards.  We are sending you new ones in the post.
Excuse me! I replied.  Why is that number considered invalid when it was one of your systems that generated it and printed it onto a card?  Surely this is a boundary value that would have been tested?
Sanjay was very apologetic and credited me £25 by way of an apology (so that's the purchase paid for), and allowed me to use my wife's card to complete the purchase before he cancelled all of the cards to process the request for new cards to be issued.

We all make mistakes, but there is a compound failure here.  Should the value 000 being fed into the card processing system really cause all the cards associated with an account to be blocked?  I'm not saying who the company is, but I wonder: if someone else tried a card-not-present transaction using a CSC of 000, would all their cards be blocked as well?  I'm sure it was never tested, as the CSC generation system should never have issued a card with a CSC of 000.

So what have we got:
  • 2 systems that share the same spec of what is valid and invalid, but have different implementations.  One system considers the edge case 000 valid and the other invalid.
  • A system that doesn't recover from a card-not-present transaction having a CSC of 000, instead defaulting to the 'safest' behaviour of blocking all cards associated with the account.
  • Potential opening for a test consultant? 
So I am a bit annoyed and inconvenienced, and I acknowledge that the chance of a card being issued with a CSC of 000 is 1/1000, but if a simple test case had been written this wouldn't have been an issue.
It also means I now have a great example when teaching boundary value analysis.

Friday, 24 April 2015

Difference between load and stress - using a metaphor

Load testing and stress testing a component are two different test techniques that often get confused.  Here is an analogy which I have adapted from a conversation I had with James O'Grady.

A load test is driving the car for 874 miles at an average speed of 60mph, in 5th gear, while using the air conditioning, cruise control and CD player.  It uses lots of the car's capabilities at their expected limits for a sustained length of time.  During and at the end of the journey we would expect the car to still be operational and all the dials on the dashboard to be reading nominal values.  A stress test is a completely different type of test.

In a stress test we want to push the system beyond its limits.  Often the limits will not be clear, so the test becomes exploratory or iterative in nature as the tester pushes the system toward them.  If we reuse the driving analogy, we might start the same journey but now drive at 70mph in 3rd gear.  Initially we think this might be enough to stress the car.  After 60 minutes we increase the stress by draining some of the engine oil and deflating the tyres.  Now some of the dashboard lights are showing us that some of the car's components are stressed.  We then drain some of the coolant and remove a spark plug.  Now the car is seriously under stress.  All the lights are on, and eventually the car gracefully stops operating and we are forced to steer onto the hard shoulder.  Once safe, we refill all the fluid and oil, re-inflate the tyres and refit the spark plug.  Now we are able to restart the car and resume our journey, driving properly.

A stress test pushes the system beyond the limits it is designed to run at, either by restricting resources or by increasing the workload (or often both).  This continues until the system either gracefully shuts down or restricts further input until it is no longer under stress conditions.
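A toy sketch of the difference, modelling only the "increase the workload" side of a stress test.  `Component` is a hypothetical system with a fixed per-tick capacity that throttles excess requests instead of crashing.  The load test holds an expected rate for a long time and expects nominal behaviour.  The stress test keeps ramping the rate until throttling begins.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical component: handles up to `capacity` requests per tick
    and rejects (throttles) anything beyond that rather than crashing."""
    capacity: int
    rejected: int = 0

    def tick(self, requests: int) -> int:
        handled = min(requests, self.capacity)
        self.rejected += requests - handled
        return handled


def load_test(component, expected_rate, ticks):
    """Drive the component at its expected rate for a sustained period."""
    for _ in range(ticks):
        component.tick(expected_rate)
    return component.rejected == 0  # nominal: nothing throttled


def stress_test(component, start_rate, step):
    """Increase the workload until the component starts to throttle,
    and return the rate at which that first happened."""
    if step <= 0:
        raise ValueError("step must be positive or the ramp never ends")
    rate = start_rate
    while True:
        before = component.rejected
        component.tick(rate)
        if component.rejected > before:
            return rate
        rate += step


print(load_test(Component(capacity=100), expected_rate=80, ticks=1000))  # True
print(stress_test(Component(capacity=100), start_rate=80, step=10))     # 110
```

A real stress test would also restrict resources and then confirm the system recovers once the pressure is removed, like refilling the oil and resuming the journey.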

Both tests are heavily contextual, as they rely on a deep understanding of how the software will be used in the wild.  Will a customer use the software for a long period of time under load, or do they just use it in short bursts?  This question is even more important when you consider software built in the cloud.

If your software is built in the cloud and you are redeploying every 2 weeks, then your view of load and stress testing will be different from testing an on-prem application, as the operational realities of using that software are contextually different.


Wednesday, 1 April 2015

I am a runner ...

I am still running.  I had a few weeks off while I recovered from chesty coughs and had to repeat a few weeks to regain fitness after the cough, BUT...

I am a runner!

I know because Laura (the 'voice' of the NHS Couch to 5K podcasts) said I was.  She gives you this reward at the end of the last run of week 6 (25 mins of continuous running), and it brings such a wave of emotion.  6 (or in my case a few more) weeks of hard physical exercise and mental fortitude to keep going finally pay off.  I am a runner.  I can run.

I still have my goal of a 30 min 5K to achieve and I know to do that I have to build more pace and more stamina into each run.  However I have another 9 runs to do that in and it all feels achievable.  

Earlier in the course there are certain runs that fill you with horror as the length of time spent running is cruelly ramped up: the 3 min run in week 3, the 5 min run in week 4 and the largest of them all, the first 20 min run at the end of week 5.  But I have conquered them and there are no more left, just a gentle increase in duration till we hit 30 mins.  I feel a little like Frodo after throwing the Ring into Mount Doom.  The main obstacles have been completed.  Sure, it is still a long way home to the Shire, but I've got this far; I can keep going to the end.

I have also found out that sharing my progress has inspired two others to start running.  One is about to start week 4 and the other week 1.  I never EVER thought that me doing exercise would inspire someone else.  I was always the one that needed the inspiration to do anything physical.  

I've lost weight, my belts are all on the last hole and I need new trousers.  My shirts fasten around the collar and suit jackets are no longer straining the buttons.  This feels terrific.

If I ever meet Laura I owe her a drink.  The podcasts have tangibly changed my life and made me healthier where gyms and diets have failed.

I just need to keep running, finish what I started and do a sub-30-minute 5K run.  But this is no longer a pipe dream.  It is realistic; the hard work is over, and I just need to keep going.