
Measures of performance

In PREHAB, the performance of models was measured using external validation, i.e. with a test data set, and with the statistics described below.

To measure the performance of different modelling techniques for different types of biological response variables we used three conventional indices: the area under the receiver operating characteristic curve (AUC) for classification models; and, for regression models, the coefficient of determination (R²) and the normalised root mean square error (NRMSE).

Area under the receiver operating characteristic curve (AUC)
The AUC is a commonly used measure of the performance of classification models. It is based on the ability of models to find “true positives” (sensitivity) and “true negatives” (specificity). In terms of modelling occurrence of vegetation, high sensitivity means correctly identifying plots with vegetation and high specificity means correctly identifying plots without vegetation. The AUC is estimated as the area under a plot of sensitivity against 1 - specificity (Fig. 1).
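
As an illustration of the two concepts, the sketch below computes sensitivity and specificity for presence/absence data at a single classification threshold. The function name, the threshold of 0.5 and the example values are assumptions made for illustration only; they are not taken from PREHAB.

```python
def sensitivity_specificity(observed, predicted_prob, threshold=0.5):
    """Return (sensitivity, specificity) for presence/absence data.

    observed: 1 = vegetation present, 0 = vegetation absent.
    predicted_prob: predicted probability of presence for each plot.
    threshold: hypothetical cut-off above which a plot is classed as present.
    """
    tp = fn = tn = fp = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted_present = prob >= threshold
        if obs == 1:
            if predicted_present:
                tp += 1  # plot with vegetation, correctly identified
            else:
                fn += 1  # plot with vegetation, missed
        else:
            if predicted_present:
                fp += 1  # plot without vegetation, wrongly flagged
            else:
                tn += 1  # plot without vegetation, correctly identified
    return tp / (tp + fn), tn / (tn + fp)


# Made-up test data: eight plots, four with and four without vegetation.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
sens, spec = sensitivity_specificity(observed, predicted, threshold=0.5)
print(sens, spec)
```

Repeating this calculation over a range of thresholds gives the points that make up a ROC curve.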


Figure 1. Explanation of concepts of ROC and AUC using a real-life example from PREHAB (Halidrys siliquosa at Vinga).

The meaning of the AUC is slightly elusive, but it may be interpreted as an estimate of the probability that a model will rank a randomly chosen “true positive” higher than a randomly chosen “true negative”. As rules of thumb, AUC>0.8 is considered “good” and AUC>0.9 is considered “excellent”. For various technical reasons, the use of AUC as the sole measure of model performance has been criticised. Nevertheless, it is commonly used, and for the purposes of PREHAB (comparisons among techniques and variables) it was deemed appropriate.
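
The ranking interpretation can be made concrete with a minimal sketch that compares every presence plot with every absence plot and counts how often the presence plot receives the higher predicted probability. The function name and the data are hypothetical, and counting ties as one half is a common convention assumed here rather than something stated for PREHAB.

```python
def auc_by_ranking(observed, predicted_prob):
    """Estimate AUC as the fraction of (presence, absence) pairs
    in which the presence plot gets the higher predicted probability."""
    positives = [p for o, p in zip(observed, predicted_prob) if o == 1]
    negatives = [p for o, p in zip(observed, predicted_prob) if o == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # presence ranked above absence
            elif pos == neg:
                wins += 0.5   # assumed convention: a tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up test data: 1 = vegetation present, 0 = absent.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
auc = auc_by_ranking(observed, predicted)
print(auc)  # 15 of the 16 pairs are ranked correctly: 0.9375
```

The pairwise loop is quadratic in the number of plots, which is fine for illustration; statistical packages typically obtain the same value from ranks or from the area under the ROC curve.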



Coefficient of determination (R²) and normalised root mean square error (NRMSE)
Two different aspects of performance were evaluated for quantitative regression models within PREHAB: (1) the amount of variability explained by models and (2) the deviations of individual observations from what was predicted by the model.

The first of these two was measured using the coefficient of determination (R²) between observed data (test data) and the predictions from models fitted using GAM, Random forest or MARS. The R² is a standard statistic for estimating and statistically testing the strength of predictive relationships. Testing the statistical significance of R² gives an indication of whether the model performs better than a random guess, while its interpretation as the proportion of the total variability explained by the model is a useful and fairly intuitive measure.
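
A minimal sketch of such a calculation is given below. It assumes that R² is taken as the squared Pearson correlation between observed test data and model predictions; the text does not state the exact formula used in PREHAB, and the alternative definition 1 - SS_residual/SS_total can give a different value when predictions are biased. The function name and data are invented for illustration.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observations and predictions.

    Assumption: this is one common definition of R² for external
    validation; it is not confirmed as the exact PREHAB calculation.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov ** 2 / (ss_obs * ss_pred)


# Made-up test data, e.g. per cent cover observed and predicted in five plots.
observed = [10, 20, 30, 40, 50]
predicted = [12, 18, 33, 37, 52]
r2 = r_squared(observed, predicted)
print(round(r2, 3))
```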

The root mean square error (RMSE) is a measure of how much individual observations deviate, on average, from the value predicted by the model. Because the modelling within PREHAB was done on a very diverse set of variables, i.e. number of individuals per square metre for invertebrates (ranging from 0 to 1000s) and per cent cover of vegetation (ranging from 0 to 100), we chose to standardise all estimates of RMSE. More specifically, we used the normalised RMSE (NRMSE), which relates the RMSE to the observed range of the variable. Thus, the NRMSE can be interpreted as the typical prediction error expressed as a fraction of the overall observed range (Fig. 2).
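
The calculation can be sketched as follows, assuming that the normalisation divides the RMSE by the difference between the largest and smallest observed value in the test data. The function name and the example values are hypothetical.

```python
import math


def nrmse(observed, predicted):
    """RMSE divided by the observed range (max - min) of the variable.

    Assumption: range normalisation uses the observed test data only.
    """
    n = len(observed)
    mean_sq_error = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(mean_sq_error)
    return rmse / (max(observed) - min(observed))


# Made-up per cent cover data: every prediction is off by 10 units
# and the observations span the full 0 to 100 range.
observed = [0, 20, 50, 80, 100]
predicted = [10, 30, 40, 70, 90]
value = nrmse(observed, predicted)
print(value)  # RMSE of 10 over a range of 100
```

Because the result is unitless, values for variables on very different scales, such as invertebrate densities and per cent cover, can be compared directly.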

Figure 2. The predictive power of a Random forest model of total algal cover at Vinga, Kattegat. The model explains 68% of the variability in the test data, and the average deviation is ±17% of the range.


© University of Gothenburg, Sweden Box 100, S-405 30 Gothenburg