Scrum metrics: when to use them?
Metrics are just numbers; they cannot represent reality, but they can be triggers that help in making decisions.
Metrics, in software development, are usually requested by managers who want to know how their teams are performing.
Agile teams usually know very well how they are performing and how much quality they are putting in. They also know quite well the magic formula, different for every team, that balances performance and quality effectively.
One day, though, the pressure starts to grow.
The Product Owner asks for more functionality, usually because of new ‘absurd’ deadlines. Naturally, the team looks for alternative solutions or compromises, engaging both the Product Owner and the management, but business is business, they say.
The team struggles to balance the higher level of performance being requested against the high quality standards that agile prescribes, whether you are using XP, Scrum, Kanban or any other approach. Healthy practices such as pair programming, test-driven development, refactoring and code inspection are slowly but progressively set aside.
This situation is damaging for quality as well as for the team.
What I’ve noticed, however, is that providing the right metrics to management can be the key factor that helps teams face situations like the one just described.
Let’s see how.
First: your metrics will obviously report performance data such as velocity (the user story points finished within each iteration).
Then remember to pair velocity with the ‘remaining story points’ metric (the story points of stories not finished within the sprint). That metric can mean at least two major things:
- The team wrongly selected too many stories (e.g. due to estimation errors)
- The team was forced to select (consciously or unconsciously) too many stories (e.g. the CEO decides on a new deadline and the PO pushes for more stories)
Usually, if the problem lies in how the team estimates, this number will decrease progressively over the next iterations until it reaches zero.
If, on the other hand, the PO and/or the managers are pushing to release more stories, that number won’t decrease. This situation also shows up in other quality indicators.
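To make the two readings of ‘remaining story points’ concrete, here is a minimal Python sketch. The Sprint class, the remaining_trend function and all the sample numbers are hypothetical, made up for illustration only; the rule used to call a trend ‘decreasing’ (it never grows from one sprint to the next and ends lower than it started) is just one possible assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sprint:
    name: str
    committed_points: int  # story points selected at sprint planning
    finished_points: int   # story points of the stories actually finished

    @property
    def velocity(self) -> int:
        # Velocity: story points finished within the iteration.
        return self.finished_points

    @property
    def remaining_points(self) -> int:
        # Story points of the stories not finished within the sprint.
        return self.committed_points - self.finished_points


def remaining_trend(sprints: Sequence[Sprint]) -> str:
    """Classify how the remaining story points evolve across sprints."""
    remaining = [s.remaining_points for s in sprints]
    if len(remaining) < 2:
        return "not enough data"
    never_grows = all(
        later <= earlier for earlier, later in zip(remaining, remaining[1:])
    )
    if never_grows and remaining[-1] < remaining[0]:
        return "decreasing"      # estimation is probably improving
    return "not decreasing"      # possibly a sign of external pressure


# Hypothetical data: a team that is learning to estimate better...
learning = [Sprint("S1", 40, 30), Sprint("S2", 38, 32), Sprint("S3", 35, 33)]
# ...and a team that keeps being pushed to take on more stories.
pushed = [Sprint("S1", 40, 30), Sprint("S2", 45, 31), Sprint("S3", 48, 30)]

print(remaining_trend(learning))  # decreasing
print(remaining_trend(pushed))    # not decreasing
```

A label like this is only a trigger for a conversation: a ‘not decreasing’ result does not prove that the team is under pressure, it just suggests looking at the other indicators.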
Gathering other quality data, such as the unit test coverage, the number of tests passed/failed, the number of issues closed vs. new ones, and the number of broken builds, helps to study the quality trend of the product.
This data should be collected automatically by the continuous build and integration systems and, again automatically, aggregated in a database, so that the team can remain completely focused on development.
Once I have that data, I usually create two different graphs: a radar graph and a line graph.
The radar graph represents the results of each sprint, combining the different indexes. This kind of graph also gives a visual representation of the covered area: the larger, the better. The line graph shows the current quality trend, making it possible to verify, for example, whether corrective actions taken in the past have produced any results.
So let’s first concentrate on the radar graph. What I do is transform the data collected above into qualitative indexes (e.g. from 1 to 5: 1 poor … 5 excellent). Then I calculate a Quality Index (QI), which is simply the sum of the indexes.
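As a rough illustration of this step, the sketch below maps raw measurements onto the 1 to 5 scale and sums the results into a QI. The to_index helper, the metric names, the threshold values and the sample measurements are all assumptions of mine; every team would pick its own cut points.

```python
from typing import Sequence


def to_index(value: float, thresholds: Sequence[float],
             higher_is_better: bool = True) -> int:
    """Map a raw measurement to an index from 1 (poor) to 5 (excellent).

    `thresholds` holds four ascending cut points that split the scale
    into five levels. For metrics where a lower value is better
    (e.g. broken builds), the scale is reversed.
    """
    if len(thresholds) != 4 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected four ascending thresholds")
    level = 1 + sum(value >= t for t in thresholds)
    return level if higher_is_better else 6 - level


# Hypothetical raw data for one sprint and hypothetical cut points.
indexes = {
    "coverage": to_index(72, [40, 55, 70, 85]),            # % unit test coverage
    "tests_passed": to_index(0.97, [0.80, 0.90, 0.95, 0.99]),  # pass rate
    "broken_builds": to_index(4, [1, 3, 5, 8], higher_is_better=False),
    "issues_closed_vs_new": to_index(1.2, [0.5, 0.8, 1.0, 1.5]),  # ratio
}

# Unweighted Quality Index: simply the sum of the indexes.
quality_index = sum(indexes.values())

print(indexes)
print(quality_index)  # 15
```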
But I’m not satisfied yet.
I do not want to assign the same weight to all indexes. Indexes such as the number of successful tests, the unit test coverage or the number of broken builds must have different weights.
The success of the tests is something I more or less take for granted. When a developer writes a unit test, I assume that she also writes the code correctly.
Indexes such as the broken builds or the coverage, on the other hand, I consider much more important, because breaking a build or having low unit test coverage is something really bad.
This is why I assign a weight to each of the metrics I’m studying, like the ones reported below.
Now it’s time to recalculate the indexes according to the weight assigned to each.
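The recalculation can be sketched as follows. The weights in this example are purely hypothetical (they are not the ones from my table); they only follow the reasoning above, giving coverage and broken builds more importance than successful tests. The index values and metric names are invented sample data as well.

```python
# Hypothetical 1-to-5 indexes for one sprint.
indexes = {
    "coverage": 4,
    "tests_passed": 4,
    "broken_builds": 3,
    "issues_closed_vs_new": 4,
}

# Hypothetical weights: coverage and broken builds count more,
# successful tests are mostly taken for granted.
weights = {
    "coverage": 3,
    "tests_passed": 1,
    "broken_builds": 3,
    "issues_closed_vs_new": 2,
}

# Recalculate each index according to its weight.
weighted_indexes = {name: indexes[name] * weights[name] for name in indexes}

# The weighted Quality Index is the sum of all the weighted indexes.
weighted_qi = sum(weighted_indexes.values())

print(weighted_indexes)
print(weighted_qi)  # 33
```

With weights like these, dropping one level on broken builds costs three QI points, while dropping one level on successful tests costs only one.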
And finally, my graphs magically appear!
Here is the radar graph, where every axis represents an index.
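If you want a number for the ‘covered area’ of the radar, one possible approach (my own assumption, not something a charting tool gives you for free) is to compute the area of the polygon drawn on evenly spaced axes. Each pair of neighbouring axes forms a triangle with area 0.5 * r1 * r2 * sin(2π/n), so the total is the sum over all neighbouring pairs. The radar_area function name is hypothetical.

```python
import math
from typing import Sequence


def radar_area(values: Sequence[float]) -> float:
    """Area of the polygon drawn on a radar graph with evenly spaced axes.

    `values` are the values plotted on each axis, in the order in which
    the axes appear around the graph.
    """
    n = len(values)
    if n < 3:
        raise ValueError("a radar graph needs at least three axes")
    angle = 2 * math.pi / n  # angle between two neighbouring axes
    return 0.5 * math.sin(angle) * sum(
        values[i] * values[(i + 1) % n] for i in range(n)
    )


# Four axes, all at the maximum value 5: a square with diagonals of 10.
print(round(radar_area([5, 5, 5, 5]), 6))  # 50.0
```

Keep in mind that this area depends on the order of the axes, so compare sprints only when the axes are laid out in the same order.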
The line trend graph displays the QI, that is, the sum of all the weighted indexes.
Remember, metrics are numbers.
Do not rely only on them when it’s time to make decisions: use this data to back up and confirm your feelings and impressions.
Often your gut feeling is better than any metric!
Have fun
http://www.inspearit.it/it/