Software Testing Club - An Online Software Testing Community

I've noticed the article "Software Engineering Metrics" by Cem Kaner & Walter P. Bond has been mentioned a number of times on this site.

I was wondering if anyone has implemented what the article suggests: a qualitative analysis of each testing task, displayed as a table with a column for each task and a score for each tester against it. If you have:
What have you found to be the benefits/drawbacks?
Is the table/info made available to management and/or all testers?
Do you still use quantitative test metrics as well? Which ones?

If you have read the article, but decided against implementing the suggestion and/or continued to use quantitative metrics:
How did you come to that decision?

Tags: metrics

Replies to This Discussion

Don't have a ton of time to respond, but, offhand, I'd suggest management by walking around and listening:


It doesn't appear as rigorous, but you can take extensive notes for a qualitative analysis.

Reply to This

If this relates to the dashboard approach, then yes, I have used it, and yes, I made the information available to everyone. Interestingly, I currently have an objective which states something like "define a way to report coverage, consistently, across projects", and the aim suggests I need to provide a set of metrics to achieve this. I'm not going to. Instead I'm going to provide a dashboard with a summary of effort, depth of testing and smiley faces to denote quality. I'm not a fan of metrics; they often require too many caveats.

Reply to This

Have you asked "... coverage of what?"

Reply to This

It's coverage as an indication of how much of the product we've tested, and how that looks versus what we've not tested so far. We're typically very good at showing how we're doing against our planned test cycle target, and what defects we've found, but what we're not so good at is showing how much of the product has been tested in a given time.

Reply to This

Thanks for the feedback - I really like the video (great speaker), I actually made some notes!
Similarly to Simon, our team have been asked to provide metrics on testing performance. I'm not keen on using the typical quantitative metrics so I'm trying to find other ways of doing it.
In the last couple of weeks though, I've asked a team member to use a dashboard for their testing (one James Bach designed). It looks like it's going to be very useful for the tester but also for the project team - it's an easy and great way to communicate progress and the perceived quality of the software.

Reply to This

Excellent paper that has had me thinking for weeks. The main point for me is the question of whether the metric is actually related to what we think it is, and therefore whether the measure is measuring what we think it is.

One example that came up in a training course I recently delivered was 'number of bugs found (by customers) in live'. The assumption is this means 'the number of bugs missed by testing'.

Yet, almost straight away, the statement made was 'well, they're not always actually bugs'.

An additional takeaway was around taxonomies, mainly bug taxonomies that I use more and more these days.

I also developed another taxonomy called the 'Taxonomy of the Testing Domain'. Rather grand, but it stemmed from me trying to place all the test levels, groups and types in some exclusive order. It still needs finishing. It's survived a number of 'truth tests', but I'm sure it can be enhanced; critical contribution is so hard to find these days. The link is to the Club's Testing Book Wiki, by the way.

Reply to This

"number of bugs found (by customers) in live"

I saw a company essentially go out of business because when you installed the software on Windows 98, it bluescreened and failed on reboot. In other words, it "hosed your box."

A single bug released in prod.

By a single-dimensional quantitative assessment, the software was "good", because very few bugs were found in production.

But that's 'cuz they pulled the plug very shortly after go-live.

In order to evaluate bugs in production, you'd need some sort of severity multiplier, which is subjective and qualitative. hmm ..
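To make the severity-multiplier idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the weights, the severity names and the two example releases are made up, and choosing the weights is exactly the subjective, qualitative step described above.

```python
# Hypothetical severity weights: picking these numbers is a judgment call,
# not something the raw bug count can tell you.
SEVERITY_WEIGHTS = {"critical": 100, "major": 10, "minor": 1}


def weighted_escape_score(escaped_bugs: dict[str, int]) -> int:
    """Sum production bug counts, each multiplied by its severity weight."""
    return sum(SEVERITY_WEIGHTS[severity] * count
               for severity, count in escaped_bugs.items())


# Made-up data: one release with a single box-hosing bug,
# another with many small annoyances.
release_a = {"critical": 1}
release_b = {"major": 2, "minor": 12}

for name, bugs in (("A", release_a), ("B", release_b)):
    raw = sum(bugs.values())
    print(f"Release {name}: raw count = {raw}, "
          f"weighted score = {weighted_escape_score(bugs)}")
# Release A: raw count = 1, weighted score = 100
# Release B: raw count = 14, weighted score = 32
```

By raw count release A looks far better (1 vs. 14). With the multiplier it looks far worse, but only because somebody decided a critical bug is worth 100 minor ones.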

Reply to This

RAMBLY POST WARNING

I dunno, I kind of like "number of bugs found (by customers) in live," given the right context. I think it's a way to have qualitative discussions of projects and test effectiveness spurred by a simple metric.

I completely agree with Matt that it has problems, and he's identified one of them: severity can be important.

And there's separating field bug reports from things that are really more like questions, or setup/configuration/proper use issues.

And you'd need to make a ruling on something that a customer reports as a bug that's really an interpretation of a requirement (though there's a valid debate over whether that makes the report any less a bug).

And there might be an interest in making sure the bug found in production is counted against the correct release, which isn't always straightforward (e.g., if the shipping version is 8.6, but we don't know if a textbook bug just reported in production was directly related to code additions or changes in 8.6, how do we attribute it?).

Measurement dysfunction problems are a risk. You'd probably wind up putting a semi-controversial box around the pre- and post-release counts, and the box might never be perfect.

And for an environment with a more robust metrics program in place, this measurement might be too trivial to bother with.

But I like the metric anyway, in the context of an environment that is in the early stages of metrics-gathering, because I think it's a low-cost metric that begs for context and nearly forces questions and discussion. It's simple feedback that almost demands further study, explanation, context and reflection, which can be a good place to start for a shop that's nervous about betting the farm on an admin-intensive new metrics program.

- It can be expressed as a ratio (e.g. Defect Removal Efficiency ratio), which makes it easy to set targets (e.g. 90% efficiency or better). Quantifiable targets give people something to shoot for, and results can be tracked over time (first week, month or quarter in production).

- It makes releases of different sizes easily comparable on one level.

- It does raise the question of severity, but mostly when the "escapees" are show-stoppingly bad and the overall detection efficiency is otherwise not too poor. (If everything's going badly, the ratio will be the least of everyone's worries.)

- Over time it can encourage root cause analysis on the escapees: how might we catch that kind or class of defect next time, since we seem to get burned by it regularly?

- It can generate dialogue on the difference between releases: what characteristics made the detection efficiency noticeably improve or worsen between two different releases that were similar on the surface? Between two releases that were very different?
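For anyone who hasn't met the ratio from the first bullet, here is a rough sketch of how it is commonly calculated: defects found before release divided by all defects found (before plus after). The function name, release numbers and counts below are hypothetical, and the 90% target is just the example figure from the bullet.

```python
def defect_removal_efficiency(found_before_release: int,
                              found_after_release: int) -> float:
    """Fraction of all known defects that were caught before release."""
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded, so the ratio is undefined")
    return found_before_release / total


TARGET = 0.90  # example target taken from the bullet above

# Made-up counts: (found in test, found by customers in the tracking window)
releases = {"8.5": (180, 20), "8.6": (45, 15)}

for version, (pre, post) in releases.items():
    dre = defect_removal_efficiency(pre, post)
    verdict = "meets" if dre >= TARGET else "misses"
    print(f"Release {version}: DRE = {dre:.0%} ({verdict} target)")
# Release 8.5: DRE = 90% (meets target)
# Release 8.6: DRE = 75% (misses target)
```

Note that the ratio makes the two releases comparable even though one had more than three times as many defects overall. It says nothing about severity or about how long the post-release window was, which is where the discussion has to start.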

Things that are measured get attention paid to them. So for environments that don't already have a lot going in the way of quality metrics, I think this metric is relatively simple and has the kind of open-to-interpretation quality that can get people thinking and talking. It can help people notice patterns which can lead to the discovery of things really worth studying and tracking.

(I guess the implied environmental requirement here is having people who ask questions around these kinds of things, or who aren't afraid to explore the answers. Your mileage may vary....)

To conclude a waaaaay too long post, I think this simple metric can deliver some value.

Reply to This

Rick - I think if you are doing a one-time metric to understand what's going on, that's probably fine. When you take the metric on every project, every month, for evaluation purposes, that's going to introduce dysfunction.

So I guess I'm more in favor of metrics as part of an investigative process than as a defined procedure. hmm. That sounds familiar ...

Reply to This

Actually, that metric, taken every time, spawns discussion every time and is useful for plotting trends and focusing some attention on problem areas.

Not everyone that uses metrics is an idiot, Matt. Metrics are only one of many management tools and are useless unless you really analyze them. Why was X higher this time? Why was it lower? Why is X function always buggier than Y? For example, my group looks at every bug we "missed". We have no problems with figuring out which ones are "real" and which ones are questions, OR which were major and which were minor - it's right on the defect report. There have been times we've found consistent problems with some area of functionality - it's allowed us to plug those holes. Thanks to metrics reports, my company has successfully improved their unit testing efforts, lowered error rates across the board in production, and recently improved problems with late deliveries. That wasn't done by just looking at a number. That was done by looking at those numbers both individually and as a trend over time and by sitting down and talking about them; coming up with strategies to improve various issues and then continuing to monitor to see if those strategies made a difference.

The only type of metrics that are inherently evil are those that are used to measure human beings. Human beings are smart enough to figure out how to fudge the numbers and such measurements encourage unprofitable behaviors.

Managers generally don't manage by walking around and listening, unless you are going to evaluate performance based on what you overhear, which might be the latest football scores. And if you've got 42 people geographically dispersed, that's not possible regardless. Normally, your human resources department is going to have a pre-defined list of areas to evaluate and you, as the manager, have to find a way to measure those things with some level of objectivity. I WAS going to say that managers manage by asking the right questions at the right times, but actually, there's so much more to management than that, it would be a disservice to suggest it. It is important to know what to ask, when, and how to ask, however.

But back to topic, overall, I think fear of metrics is self-defeating and downright foolish if you're a manager. The key, from my perspective, is to choose (and help executive management choose) intelligently what measurements will be useful and to continue to monitor said usefulness; tossing away anything that doesn't provide valuable information to (someone).

In my organization, metrics always spawn discussions and I have yet to see them used in order to either punish (or reward) individuals. They're a tool used to help us strategize issues and work on better ways to serve each other.

- Linda

Reply to This

It is true that not everyone that uses metrics is an idiot. Yet not everyone uses them as wisely as your organization does, according to your report. Even organizations that measure thoroughly and conscientiously make mistakes, for exactly the same reasons that bugs exist: it can be easy to overlook something, especially when there are disconnects between departments and nodes on the org chart. I suspect that it is this kind of organization that concerns Matt.

For example, one of my newer clients has been collecting data on the ratio of bugs found in production, versus the number of bugs found in release. They don't just track this information in the first few weeks; they track it for a full six months after the product has shipped. Sounds reasonable—until you notice that their enterprise-level application is sold into big organizations. Big companies are often slow to change. In this case, they don't even consider testing the new version of the application until it's been shipping for a year, and don't begin serious deployment until six months after that.

I think the key to the success you report is right here: "my group looks at every bug we 'missed'. We have no problems with figuring out which ones are 'real' and which ones are questions, OR which were major and which were minor - it's right on the defect report." Every bug has a story associated with it, and when you look at the story for each bug, you avoid the informational loss that is a frequent byproduct of aggregation. It might be interesting to know more specifically about the kinds of data that you've collected and the kinds of trend analysis that you've done, especially if you've found it very helpful or very unhelpful.

I suspect that another key to your success is your distinction between inquiry metrics (metrics that are used to prompt questions) and control metrics (metrics that are used to trigger some action other than the asking of questions).

---Michael B.

Reply to This


"The only type of metrics that are inherently evil are those that are used to measure human beings. Human beings are smart enough to figure out how to fudge the numbers and such measurements encourage unprofitable behaviors."


I'm not sure I said any metric was inherently evil. I said metrics used to evaluate (people) introduce dysfunction. That sounds a lot like your position, Linda.

"Managers generally don't manage by walking around and listening, unless you are going to evaluate performance based on what you overhear, which might be the latest football scores. And if you've got 42 people geographically dispersed, that's not possible regardless."

I work at a massively geographically distributed agile team. Literally every member of the engineering staff works remotely. I agree, to "manage by walking around" on a geographically distributed team, you need different tools - like VOIP, constant IRC, core hours, e-mail, wikis, triggers on commits, etc. But you can still do (something like MBWA).

I've said several times here that informal metrics to understand can be very helpful. It sounds to me like you're protesting a little bit too much, and ignoring a great deal of the Silicon Valley success and dysfunction literature (e.g. Doug Hoffman, etc.).

Reply to This

© 2010   Created by Rosie Sherry