Ever run into those impossible-to-repro bugs that fester in the "monitor" category on your bug tracker? Or gone live only to have a bunch of gnarly non-domain bugs bite you in the bum?
I'm increasingly interested in finding ways of catching these blighters. I'm currently re-purposing functional test automation to a) hammer away at the same test literally thousands of times in succession, and b) roll the dice, pick a test, execute it, repeat - in both cases until I get a crash or otherwise unexpected outcome.
In my current AUT, I have in the region of 100 transactions to test. I've just completed 10,000 iterations of the first one, and am finding a number of very interesting hard-to-repro (short of running the framework for another few hours/days, that is) bugs. In addition, it is yielding some useful data with regard to the probability of failure per transaction-execution and per unit time.
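For anyone wanting to try the "roll the dice, pick a test, execute it, repeat" idea, here is a minimal sketch in Python. It is not the framework described above: `soak`, `login` and `transfer` are hypothetical names, and the transactions are stand-ins that fail at random. It shows one way to record the execution order and derive a failure rate per execution and per unit time.

```python
import random
import time


def soak(tests, iterations, seed=0):
    """Pick a test at random, run it, repeat; tally failures per test."""
    rng = random.Random(seed)      # fixed seed so a failing run can be replayed
    names = sorted(tests)
    runs = {name: 0 for name in names}
    failures = {name: 0 for name in names}
    history = []                   # execution order, useful for sequence-dependent bugs
    start = time.monotonic()
    for _ in range(iterations):
        name = rng.choice(names)
        history.append(name)
        runs[name] += 1
        try:
            tests[name](rng)
        except Exception:
            failures[name] += 1
    elapsed = time.monotonic() - start
    total_failures = sum(failures.values())
    return {
        "per_execution": {n: failures[n] / runs[n] for n in names if runs[n]},
        "failures_per_second": total_failures / elapsed if elapsed > 0 else 0.0,
        "history": history,
    }


# Hypothetical stand-ins for real transactions.
def login(rng):
    pass


def transfer(rng):
    if rng.random() < 0.01:        # simulated intermittent failure
        raise RuntimeError("intermittent failure")


report = soak({"login": login, "transfer": transfer}, iterations=10_000, seed=42)
```

Keeping the seed and the history means a run that ends in a failure can be replayed in the same order, which helps when the bug turns out to depend on sequence.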
Anyone else ever tried a similar approach? I'd be interested in comparing notes...
I meant to reply sooner, but never had the time :-)
This is something I struggled with for years on the product I used to work with. With a very generous set of system requirements, we had to cater for most, if not all, of the PCs on the market. The diversity of configurations and environments led to many such defects. My environment was client/server whereas yours seems to be database, but I guess we still face the same issue.
I would categorize them in 3 main areas.
1. Sequence dependent - The defect appears only when you perform a task in a specific sequence.
2. Data dependent - Sensitive to the data used
3. Environment and timing dependent - Depend on the timing of the task and other unrelated events
For the first two types, if it happens in a documented or automated test, it is easy to reproduce. But most of the time it happens while you are doing something else, so it is not easy to remember every step and input used. Once you have found the cause it is usually very simple - something like using a file from a specific location - but getting there through hundreds of other possibilities is not fun! In these cases, we brainstormed possibilities and tried to re-think what we would have done to get that result. Sometimes having the same person start a test from the same initial conditions leads to the same defect, since people usually follow the same path for familiar tasks. We also used to write quick automation scripts and bombard the application with different sequences and data values. All these approaches were effective for us.
The hardest was when things failed because of environment conditions - such as timing differences introduced by another process. In this case, automated tests may fail once in a while but not often enough to identify the reason. If this happened during manual tests, we tried to get a relevant automated test and loop it, just like you have done. If that was not successful, we used memory/CPU load tools to increase the load and change the timing. Usually we ran a debugger on the machine so that developers could drill into the system to find the cause.
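Looping a test while something else loads the machine can be sketched roughly as below. This is only an illustration in Python with hypothetical names (`loop_under_load`, `_burn`), not the tooling we used. Note the assumption: in CPython the background threads share the interpreter lock with the test, so they mostly perturb scheduling rather than saturate every core; a separate load tool or separate processes would be closer to the real thing.

```python
import threading


def _burn(stop):
    # Busy loop that competes for CPU time until asked to stop.
    while not stop.is_set():
        sum(i * i for i in range(1000))


def loop_under_load(test, iterations, load_threads=4):
    """Run `test` repeatedly while background threads generate load."""
    stop = threading.Event()
    burners = [
        threading.Thread(target=_burn, args=(stop,), daemon=True)
        for _ in range(load_threads)
    ]
    for t in burners:
        t.start()
    failures = []
    try:
        for i in range(iterations):
            try:
                test()
            except Exception as exc:
                failures.append((i, repr(exc)))   # keep iteration number and error
    finally:
        stop.set()                                # always stop the load generators
        for t in burners:
            t.join()
    return failures
```

Varying `load_threads` between runs changes the timing, which is the point: a failure that shows up once in a thousand iterations when idle may show up far more often under load.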
By the way, I am surprised to find that this is an uncommon situation (there would have been more responses if it were common) - I thought this was a nightmare for every tester!
The app we're testing is an enterprise web app. Whilst this testing has identified a few data dependent bugs (I am using random data and that has kicked out a few combinations we didn't dream up in ET) this is not the focus of the testing.
We're looking for bugs that are more along the lines of your 1st and 3rd categories, and we have identified a number of failures that look to be of this nature.
A by-product (but far from my main objective) is some interesting data with regard to probability of failure...had an interesting conversation with a tester-friend who sees such numbers as the primary reason (rather than bug finding) for reliability testing...but that's a whole different debate...
I have run "the same test thousands of times in succession" for all the unit tests we testers created. I can only remember finding one regression bug (a memory leak) this way. Maybe our developers were just great at writing reliable code :) or perhaps my other tests caught the other bugs faster. It really took quite some time to figure out which one of hundreds of unit tests was causing the leak.
I prefer doing more focused reliability tests - attack one risk at a time. This way, when something fails or goes wrong, I already have a good idea of what the reason could be. In case you are interested in the details, here are some notes from one particular project. But it's project- and technology-specific.
However, I could imagine a situation (a context) where doing exactly what you have done would be the best thing to do. For example, if the unit under test is a system that relies heavily on 3rd party services which are not 100% reliable (and there is a chance that, once in a while, such a service will give us garbage data instead of the expected response).
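One way to narrow down which test leaks is to measure memory growth per test rather than for the whole suite. The sketch below is a Python illustration with hypothetical names (`memory_growth`, `leaky`, `clean`), not what we actually used; it assumes the code under test allocates through the Python allocator, which is all `tracemalloc` can see.

```python
import tracemalloc


def memory_growth(test, repeats=200):
    """Return net bytes still allocated after running `test` `repeats` times."""
    test()                               # warm-up, so one-off caches don't count
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            test()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_cache = []


def leaky():
    _cache.append(bytearray(1024))       # hypothetical leak: grows on every call


def clean():
    bytearray(1024)                      # allocated and released immediately


growth = {t.__name__: memory_growth(t) for t in (leaky, clean)}
suspects = sorted(growth, key=growth.get, reverse=True)   # biggest growth first
```

Sorting by growth puts the most likely culprit at the top, which beats reading through hundreds of tests by hand.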
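To make that context concrete, here is a small Python sketch. Everything in it is hypothetical (`FlakyService`, `unit_under_test`, the 2% garbage rate): a stand-in service occasionally returns something that is not the expected JSON, and the caller is run thousands of times to see how it copes.

```python
import json
import random


class FlakyService:
    """Hypothetical stand-in for a 3rd party service that sometimes returns garbage."""

    def __init__(self, garbage_rate, seed=0):
        self.garbage_rate = garbage_rate
        self.rng = random.Random(seed)   # seeded so a run can be repeated

    def fetch(self):
        if self.rng.random() < self.garbage_rate:
            return "<html>502 Bad Gateway</html>"   # not the JSON the caller expects
        return json.dumps({"status": "ok"})


def unit_under_test(service):
    # Hypothetical caller that handles a malformed response instead of crashing.
    try:
        return json.loads(service.fetch())["status"]
    except (ValueError, KeyError):
        return "degraded"


service = FlakyService(garbage_rate=0.02, seed=1)
outcomes = [unit_under_test(service) for _ in range(10_000)]
degraded = outcomes.count("degraded")
```

With a rare failure like this, a handful of runs would most likely never hit the garbage path, which is why the brute-force repetition earns its keep here.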
Interesting that you did something similar at the unit level. I'm doing this via the GUI as client-side issues are part of my test scope.
SIDEBAR: to my amazement am finding Selenium RC exceptionally reliable - until recently I did not believe that a GUI test tool would give me the stability I needed to run such tests.
And thanks for your blog post, interesting read - I'm looking at fault injection under a different test level - though I tend to see these more as a test of how gracefully the app handles depletion of particular resources - whereas this testing is helping us to identify issues that could lead to such resources being depleted etc.
The big challenge I am finding with this type of testing is that - once a failure has been identified - the cause can be hard for the relevant folks to isolate. A potential by-product here is enhanced logging, enhanced monitoring etc.