Meaningfulness in Measurement

Scale types

Understanding scale types helps us to decide when statements about measurement make sense. For example, computing ratios is meaningless if the scale is nominal, ordinal or interval.

Meaningfulness of statements regarding measurement

Consider the following statements.
Meaningful

  • The number of errors discovered during testing was 100.
  • A semantic error takes twice as long to fix as a syntactic error.

Not meaningful

  • The cost of fixing an error in program X is at least 100.
  • A semantic error is twice as complex as a syntactic error.

Definition of Meaningful

"A statement involving measurement is meaningful if its truth value is invariant of the transformations of allowable scales".
Examples:

  • "Our prime minister is 150 years old".
    • Meaningful but…
    • Not true.
  • "Fred is twice as tall as Jane".
    • Meaningful; the statement implies that the measures are on the ratio scale (since it uses scalar multiplication)
    • The truth value of the statement holds irrespective of the unit of measurement (cm, inches, feet, etc.)
  • "The temperature in Tokyo today is twice that in London".
    • The statement implies a ratio scale.
    • If Tokyo's temperature is 40°C and London's is 20°C then, on the Celsius scale, the statement is true. However, on the Fahrenheit scale (104°F and 68°F) the statement is not true.
    • Hence the statement is not meaningful.
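
The invariance test in the definition can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names are illustrative and the heights of 180 cm and 90 cm are assumed values. It re-expresses both operands of a "twice as" statement in another unit and checks whether the truth value survives.

```python
import math

def is_twice(a, b):
    """Return True if a is twice b, within floating-point tolerance."""
    return math.isclose(a, 2 * b)

def cm_to_inch(cm):
    # Ratio-scale rescaling: M' = a * M with a > 0.
    return cm / 2.54

def celsius_to_fahrenheit(c):
    # Interval-scale transformation: M' = a * M + b with b != 0.
    return c * 9 / 5 + 32

# Hypothetical heights in cm, chosen for illustration only.
fred_cm, jane_cm = 180.0, 90.0
height_cm = is_twice(fred_cm, jane_cm)
height_inch = is_twice(cm_to_inch(fred_cm), cm_to_inch(jane_cm))

# Temperatures from the Tokyo/London example.
tokyo_c, london_c = 40.0, 20.0
temp_c = is_twice(tokyo_c, london_c)
temp_f = is_twice(celsius_to_fahrenheit(tokyo_c),
                  celsius_to_fahrenheit(london_c))

print(height_cm, height_inch)  # truth value preserved under rescaling
print(temp_c, temp_f)          # truth value changes between scales
```

The height statement stays true under the change of unit, while the temperature statement is true in Celsius and false in Fahrenheit, which is why only the first is meaningful.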

Statistical operations on measures

The scale type of a measure affects the types of operations and statistical analyses that can be sensibly applied to the data.
A measure of central tendency tells us something about where the middle of the set is likely to be.

  • Mean: average value of the data set
  • Median: value of the middle-ranked item
  • Mode: the value of the most commonly occurring item

Example

Suppose we have a data set {x1,…,xn} where each data point xi is a measurement of understandability for module i in a system X.
We also have {y1,…,ym} representing the understandability of the modules in a system Y.

  • Which of the two systems has the higher average understandability?
    • We can answer this by comparing means, provided the mean is meaningful for the measure.
  • Suppose we assess a module's understandability according to the classification trivial, simple, moderate, complex and incomprehensible.
      Trivial  Simple  Moderate  Complex  Incomprehensible
M     1        2       3         4        5
  • This is an ordinal scale.
  • Suppose X has five modules with understandability rated as:
    • U(x1) = 1, U(x2) = 2, U(x3) = 2, U(x4) = 3, U(x5) = 5
    • Mean(X) = 2.6
  • Suppose Y has seven modules with understandability:
    • U(y1) = 1, U(y2) = 3, U(y3) = 3, U(y4) = 3, U(y5) = 4, U(y6) = 4, U(y7) = 4
    • Mean(Y) = 3.1
  • Is it meaningful to say that, on average, the understandability of system X is better than that of system Y?
  • No. If we define another valid understandability measure M', which assigns different values to the classes while preserving their order, we can reach the opposite conclusion.
    • e.g. M'(trivial) = 0.5, M'(simple) = 0.7, M'(moderate) = 0.9, M'(complex) = 4, M'(incomprehensible) = 10
    • Now Mean(X) = 2.56 and Mean(Y) = 2.2, so the ordering of the two means is reversed.

The mean is not a meaningful measure of central tendency for ordinal-scale data

The median is a valid measure of central tendency on an ordinal scale
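
The example can be reproduced with a few lines of Python. This is a sketch using the ratings given above; the names `M_PRIME` and `relabel` are illustrative and do not come from the notes. It compares the two systems by mean and by median, first with the labels 1 to 5 and then with the alternative values of M'.

```python
from statistics import mean, median

# Ratings from the example, using the labels 1..5 (measure M).
X = [1, 2, 2, 3, 5]
Y = [1, 3, 3, 3, 4, 4, 4]

# M' from the example: different values, same ordering of the classes.
M_PRIME = {1: 0.5, 2: 0.7, 3: 0.9, 4: 4, 5: 10}

def relabel(ratings, mapping):
    """Apply an order-preserving relabelling to a list of ratings."""
    return [mapping[r] for r in ratings]

X2, Y2 = relabel(X, M_PRIME), relabel(Y, M_PRIME)

# The comparison of means flips between M and M'.
mean_order_m = mean(X) < mean(Y)
mean_order_m_prime = mean(X2) < mean(Y2)

# The comparison of medians is the same under M and M'.
median_order_m = median(X) < median(Y)
median_order_m_prime = median(X2) < median(Y2)

print(mean_order_m, mean_order_m_prime)
print(median_order_m, median_order_m_prime)
```

The median picks out the middle-ranked item, and ranks are exactly what an order-preserving relabelling keeps intact, so the comparison of medians cannot flip.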

Example (2)

Let X = {x1,…,xn} and Y = {y1,…,ym} be two sets of entities for which some attribute can be measured on a ratio scale, and let M and M' be two measures of that attribute.
Is the following statement meaningful?

  • The mean of the measure over set X is greater than the mean of the measure over set Y.

In mathematical terms, the statement is meaningful if the following equivalence holds for every pair of admissible measures M and M':

(1)
\begin{align} \frac{1}{n} \sum_{i=1}^{n} M( x_{i} ) > \frac{1}{m} \sum_{j=1}^{m} M( y_{j} ) \Longleftrightarrow \frac{1}{n} \sum_{i=1}^{n} M'( x_{i} ) > \frac{1}{m} \sum_{j=1}^{m} M'( y_{j} ) \end{align}
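  • Because the attribute is measured on a ratio scale, M = aM' for some a > 0. Substituting aM' for M multiplies both sides of the left-hand inequality by a, and since a > 0 the direction of the inequality is unchanged. The equivalence therefore always holds.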

Mean is meaningful for ratio and interval scales
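
The substitution can be written out in full. The derivation below assumes the usual admissible transformations: M = aM' with a > 0 for a ratio scale, and M = aM' + b with a > 0 for an interval scale. The constant b is introduced here for illustration and is not named in the text above.

\begin{align} \frac{1}{n} \sum_{i=1}^{n} M( x_{i} ) > \frac{1}{m} \sum_{j=1}^{m} M( y_{j} ) &\Longleftrightarrow \frac{1}{n} \sum_{i=1}^{n} \left( a M'( x_{i} ) + b \right) > \frac{1}{m} \sum_{j=1}^{m} \left( a M'( y_{j} ) + b \right) \\ &\Longleftrightarrow a \cdot \frac{1}{n} \sum_{i=1}^{n} M'( x_{i} ) + b > a \cdot \frac{1}{m} \sum_{j=1}^{m} M'( y_{j} ) + b \\ &\Longleftrightarrow \frac{1}{n} \sum_{i=1}^{n} M'( x_{i} ) > \frac{1}{m} \sum_{j=1}^{m} M'( y_{j} ) \end{align}

The last step subtracts b from both sides and divides by a > 0, neither of which changes the direction of the inequality. Setting b = 0 gives the ratio-scale case.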

The same investigation (as above) can be done for any statistical technique by using scale and transformation properties to verify that a certain analysis is valid for a given scale type.


Objective and Subjective measures

When performing statistical analysis we strive to keep measurements objective, because subjective measures are likely to vary with the person doing the measuring.
Objective measures are primarily used for analysis; however, subjective measures are also useful.

  • In many situations we cannot measure an attribute directly.
    • Instead we measure the sub-attributes of which the original attribute is composed (for example, the quality of a compiler can be broken down into speed and resource sub-attributes).

Example

We wish to assess the quality of the different types of transport available for travelling from our home to another city.

  • Two significant sub-attributes, journey time (JT) and cost per mile (CPM), are selected
  • For any two transport types, A and B, A is of higher quality than B if JT(A) < JT(B) AND CPM(A) < CPM(B).
Transport   JT (hours)   CPM
Car         3            1.5
Train       5            2.0
Plane       3.5          3.5
Coach       7            4.0

We define a measure M that takes a transport type into a pair of elements…

  • M(transport type) = (JT,CPM)
  • Then we have M(car) = (3, 1.5), M(train) = (5, 2), …

The numerical relation that corresponds to the empirical relation can be defined as:

  • (x, y) is superior to (x', y') if x < x' and y < y'
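
The numerical relation can be applied directly to the table. The sketch below uses the values given above; the function and variable names are illustrative. It lists every pair in which one transport type is superior to another, together with the pairs that the relation leaves unranked.

```python
from itertools import permutations

# (journey time in hours, cost per mile), taken from the table above.
M = {
    "car": (3.0, 1.5),
    "train": (5.0, 2.0),
    "plane": (3.5, 3.5),
    "coach": (7.0, 4.0),
}

def is_superior(p, q):
    """(x, y) is superior to (x', y') if x < x' and y < y'."""
    return p[0] < q[0] and p[1] < q[1]

# Every ordered pair (a, b) for which a is of higher quality than b.
superior_pairs = [
    (a, b) for a, b in permutations(M, 2) if is_superior(M[a], M[b])
]

# Unordered pairs where neither transport type is superior to the other.
incomparable = sorted(
    {tuple(sorted((a, b)))
     for a, b in permutations(M, 2)
     if not is_superior(M[a], M[b]) and not is_superior(M[b], M[a])}
)

print(superior_pairs)
print(incomparable)
```

The car is superior to every other option and the coach is inferior to every other option. Train and plane are left unranked because each wins on one sub-attribute, so the relation is only a partial order.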

Multi-criteria Decision

When a decision takes multiple attributes into consideration, we need to assign weights to the sub-attributes. For example, we may give cost a heavy weight when purchasing a word processor for home use, but give reliability and usability heavier weights when purchasing a database package for a critical air traffic control system.
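
As a rough sketch of how weights change a decision, the code below scores the train and the plane from the transport example with a weighted sum. The two weightings are hypothetical and not taken from the notes, and the raw values are combined without normalisation, which is a simplification.

```python
# Sub-attribute values from the transport example: (JT, CPM).
OPTIONS = {"train": (5.0, 2.0), "plane": (3.5, 3.5)}

def weighted_score(values, weights):
    """Weighted sum of sub-attribute values. Lower is better here,
    because both journey time and cost per mile are to be minimised."""
    return sum(w * v for w, v in zip(weights, values))

def best_option(options, weights):
    """Return the name of the option with the lowest weighted score."""
    return min(options, key=lambda name: weighted_score(options[name], weights))

# Hypothetical weightings (w_JT, w_CPM), for illustration only.
time_sensitive = (0.7, 0.3)
cost_sensitive = (0.3, 0.7)

print(best_option(OPTIONS, time_sensitive))  # plane
print(best_option(OPTIONS, cost_sensitive))  # train
```

The same two options are ranked differently depending on which sub-attribute carries more weight, which is the point of choosing weights to suit the purpose of the decision.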

Indirect measurement and meaningfulness

Density d is an indirect measure defined in terms of mass m and volume v ($d = \frac{m}{v}$), where m and v are both ratio scale measures.

  • We can show that the indirect measure d is also a ratio scale measure.
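
One way to see this, assuming the usual admissible transformations for ratio scales (the constants a and b are introduced here for illustration), is to rescale mass by $m' = am$ and volume by $v' = bv$ with a, b > 0:

\begin{align} d' = \frac{m'}{v'} = \frac{a m}{b v} = \frac{a}{b} \cdot \frac{m}{v} = \frac{a}{b} d \end{align}

Since $\frac{a}{b} > 0$, every admissible change of units for m and v rescales d by a positive constant, which is exactly the admissible transformation of a ratio scale.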

An indirect measure of testing efficiency T is obtained from D (number of defects, absolute scale) and E (effort in person months, ratio scale): $T = \frac{D}{E}$

  • Since the absolute scale is stronger than the ratio scale, T is on a ratio scale

Let M be an indirect measure involving the component measures Ci (i = 1, 2, …).
The scale type of M will not be stronger than the weakest of the scale types of the Ci.
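
This rule can be sketched in code by ordering the scale types from weakest to strongest and taking the minimum. The `Scale` enumeration and the function name are illustrative and not part of the notes. The result is an upper bound on the scale type of the indirect measure, not a proof that the measure attains it.

```python
from enum import IntEnum

class Scale(IntEnum):
    """Scale types ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4
    ABSOLUTE = 5

def strongest_possible_scale(*component_scales):
    """Upper bound on the scale type of an indirect measure:
    the weakest scale type among its components."""
    return min(component_scales)

# T = D / E: D is on an absolute scale, E on a ratio scale.
testing_efficiency = strongest_possible_scale(Scale.ABSOLUTE, Scale.RATIO)

# d = m / v: both components are on a ratio scale.
density = strongest_possible_scale(Scale.RATIO, Scale.RATIO)

print(testing_efficiency.name, density.name)  # RATIO RATIO
```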


A goal-based framework for Software Measurement

The first obligation of any software measurement activity is identifying the entities and attributes we wish to measure…

  • Processes are collections of software related activities
  • Products are any artefacts, deliverables or documents that result from a process activity
  • Resources are entities required by a process activity

There are two kinds of attributes: internal attributes (those measurable by examining the process, product or resource itself) and external attributes (those measurable only with respect to how the process, product or resource relates to its environment).

Goal-Question-Metric (GQM) framework

GQM is an effective approach for selecting and implementing metrics (proposed by Victor Basili). It avoids the common fallacy of doing the measurement first and only then deciding what to do with the values.
The GQM framework

  1. Classify the entities to be examined
  2. Express the overall goals of your organisation
  3. Generate the questions whose answers you must know to determine if your goals are met
  4. Analyse each question to determine what measurements you need to do in order to answer it
  5. Check whether it is possible to do those measurements

Example

[Figure: gqm_01.png]

Templates for goal definition

  • Purpose: to {characterise | evaluate | predict | motivate} the {process | product | model | metric} in order to {understand | assess | manage | engineer | learn | improve} it.
  • Perspective: examine the {cost | effectiveness | correctness | defects | changes | product measures} from the viewpoint of the {developer | manager | customer}.
  • Environment: consists of the following: process factors, people factors, problem factors, methods, tools, constraints
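
The Purpose and Perspective templates can be treated as sentences with a fixed set of slots. The sketch below is one possible encoding, with illustrative function names that are not part of GQM itself. It fills in a template and rejects any value that is not one of the listed choices.

```python
# Allowed choices, copied from the templates above.
PURPOSE_VERBS = {"characterise", "evaluate", "predict", "motivate"}
OBJECTS = {"process", "product", "model", "metric"}
AIMS = {"understand", "assess", "manage", "engineer", "learn", "improve"}
FOCI = {"cost", "effectiveness", "correctness", "defects", "changes",
        "product measures"}
VIEWPOINTS = {"developer", "manager", "customer"}

def _check(value, allowed, slot):
    """Raise ValueError if value is not an allowed choice for the slot."""
    if value not in allowed:
        raise ValueError(f"{value!r} is not a valid {slot}")

def purpose(verb, obj, aim):
    """Fill in the Purpose template."""
    _check(verb, PURPOSE_VERBS, "purpose verb")
    _check(obj, OBJECTS, "object")
    _check(aim, AIMS, "aim")
    return f"To {verb} the {obj} in order to {aim} it."

def perspective(focus, viewpoint):
    """Fill in the Perspective template."""
    _check(focus, FOCI, "focus")
    _check(viewpoint, VIEWPOINTS, "viewpoint")
    return f"Examine the {focus} from the viewpoint of the {viewpoint}."

print(purpose("evaluate", "product", "improve"))
print(perspective("defects", "customer"))
```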