# Important assumptions


THIS TEXT IS WRITTEN FOR MODELLING VEGETATION DISTRIBUTION. IS IT APPLICABLE TO ALL RESPONSE VARIABLES, BOTH DISTRIBUTION AND ABUNDANCE?

**The modelling methods evaluated in PREHAB (GAM, MARS, RF and MaxEnt) rest on three important assumptions, each of which must be tested:**

**1. Is the environment of the modelled vegetation strongly disturbed, either naturally (e.g. storms, succession, invasive species) or by human impact?**

Models assume that vegetation is in equilibrium, or at least in quasi-equilibrium, with the environment; that is, environmental change is slow relative to the life span of the biota (Guisan and Zimmermann, 2000). This holds, for example, for species or communities that are relatively persistent or that react slowly to variability in environmental conditions (e.g. arctic and alpine communities). In contrast, invasive species are not in equilibrium with the environment in the invaded range and should therefore preferably be modelled using their distribution in the native range (Guisan and Thuiller, 2005). Where this assumption fails, there are several possibilities:

- to make a large-scale prediction, since this requires less detailed knowledge of the physiology and behaviour of the species involved.
- to include environmental changes as predictor variables in the model. For example, sediments within a dumping area can be treated as a separate substrate type (a proxy for a constantly disturbed environment); substrate is usually one of the most important predictor variables in benthic species distribution models.
- to use dynamic simulation models (Guisan and Zimmermann, 2000), although only a few have been developed and tested so far (e.g. Axis-Arroyo and Mateu, 2004).

**2. Are there strong correlations among environmental variables (multi-collinearity)?**

Ideally, the predictors explain as much of the variance in species distribution as possible and are not correlated with one another (no multi-collinearity among predictors). The practical consequences of multi-collinearity in a model are described in more detail by Sokal & Rohlf (1995).

Multi-collinearity can be assessed using:

- correlation matrices and tests,
- principal component analysis,
- variance inflation factor (VIF).

As a general guideline, one predictor from each pair with a correlation coefficient |r| above 0.5 or 0.7, or any predictor with a VIF above 5 or 10, should be excluded (Zuur et al., 2007). After removing one highly correlated predictor, recompute the correlation coefficients or VIFs and check that no serious collinearity remains among the environmental variables. Standardization of predictors (subtracting the mean and dividing by the standard deviation) may also help to reduce multi-collinearity, especially when the model includes interactions between environmental variables.
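The screening procedure described above can be sketched in a few lines of Python. This is a minimal illustration using only NumPy, not a PREHAB tool: the function names `vif` and `drop_collinear`, the simulated predictors (`depth`, `light`, `exposure`) and the threshold of 5 are all assumptions chosen for the example. Each VIF is computed as 1 / (1 - R²), where R² comes from regressing one predictor on all the others.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (samples x predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

def drop_collinear(X, names, threshold=5.0):
    """Remove the predictor with the highest VIF, recompute, and repeat
    until every remaining VIF is at or below the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

# Simulated (hypothetical) predictors: light is almost a linear function of depth.
rng = np.random.default_rng(0)
depth = rng.uniform(0, 30, 200)
light = -0.9 * depth + rng.normal(0, 1.0, 200)
exposure = rng.normal(0, 1.0, 200)

X = np.column_stack([depth, light, exposure])
names = ["depth", "light", "exposure"]

corr = np.corrcoef(X, rowvar=False)   # pairwise correlation matrix
vifs = vif(X)                         # VIF of each predictor
X_kept, kept = drop_collinear(X, names, threshold=5.0)
print(np.round(corr, 2))
print(dict(zip(names, np.round(vifs, 1))))
print("kept:", kept)
```

Which member of a correlated pair is dropped here depends only on the VIF values; in practice the choice would more sensibly rest on ecological relevance or data quality.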

**3. Is there a spatial pattern in the residuals (observed value minus the value predicted by the model)?**

Ideally, uncorrelated environmental variables explain as much of the variance in species distribution as possible and consequently leave no unexplained information in the residuals. No spatial pattern should therefore remain in the residuals after model fitting; this can be inspected graphically (Dormann et al., 2007) using:

- Moran’s I plots (also termed Moran’s I correlogram),
- Geary’s C correlograms,
- semi-variograms.

A statistically significant (p < 0.05) and relatively high Moran’s I (absolute value above 0.5 or 0.7), or a Geary’s C that departs markedly from 1 (its value when there is no autocorrelation), indicates a serious spatial pattern (autocorrelation) in the residuals. In other words, an important environmental variable that should have explained the species distribution has probably been left out of the model. Graphical inspection of the spatial structure of the residuals using semi-variograms may help you find this missing predictor.

Suppose you want to predict macroalgae distribution using depth as the only environmental variable in the model. You will then find relatively large residuals (big differences between observed and predicted values) with a spatial pattern: the model predicts macroalgae to be present throughout the species’ optimal depth range, even though there are observations without macroalgae within this depth zone. Why? Most probably these wrong predictions are due to other limiting factors, such as available substrate, wave exposure or grazers, which have not been included as predictors in the model. Once you have identified a potentially important missing predictor and added it to the model, recompute Moran’s I or Geary’s C and check that no evident spatial pattern remains in the residuals.
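As a rough illustration of the residual check described above, the sketch below computes a global Moran’s I with NumPy and attaches a one-sided permutation p-value for positive autocorrelation. Everything in it is an assumption made for the example: the function names, the binary distance-based neighbour weights, the 10 x 10 sampling grid and the two simulated residual sets (one with a smooth spatial trend, one purely random). It computes a single value for one neighbourhood distance rather than a full correlogram.

```python
import numpy as np

def distance_weights(coords, max_dist):
    """Binary spatial weights: 1 for pairs of distinct points within max_dist."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return ((d > 0) & (d <= max_dist)).astype(float)

def morans_i(values, w):
    """Global Moran's I of `values` under the weight matrix `w`."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

def morans_i_test(values, w, n_perm=199, seed=0):
    """Observed Moran's I and a one-sided permutation p-value
    (probability of an I at least this large if values were placed at random)."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    perms = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    p = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p

# Hypothetical sampling grid: 10 x 10 stations, one unit apart.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([xs.ravel(), ys.ravel()])
w = distance_weights(coords, max_dist=1.5)  # neighbours incl. diagonals

rng = np.random.default_rng(1)
# Residuals with a smooth trend along x, as if a predictor were missing.
resid_patterned = np.sin(coords[:, 0] / 3.0) + rng.normal(0, 0.05, len(coords))
# Residuals with no spatial structure.
resid_random = rng.normal(0, 1.0, len(coords))

i_pat, p_pat = morans_i_test(resid_patterned, w)
i_rand, p_rand = morans_i_test(resid_random, w)
print(f"patterned residuals: I = {i_pat:.2f}, p = {p_pat:.3f}")
print(f"random residuals:    I = {i_rand:.2f}, p = {p_rand:.3f}")
```

Repeating the calculation for a series of increasing distance classes would give the points of a Moran’s I correlogram.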

If your data are based on transect sampling (samples taken successively along a transect line), the model residuals of neighbouring samples will most probably be more similar than those of samples farther apart. This violates the assumption. One way to address it is to remove the samples that lie closest to each other. Alternatively, some methods, such as GAM, can account for autocorrelation in the model errors when extended to generalized additive mixed models (Zuur et al., 2009).
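Removing the closest samples can be done in several ways; the sketch below shows one simple greedy approach, offered as an illustration rather than a recommended procedure. The function name `thin_by_distance`, the 5 m sample spacing and the 20 m minimum distance are assumptions for the example; in practice the minimum distance would be chosen from the range over which the residuals are autocorrelated.

```python
import numpy as np

def thin_by_distance(coords, min_dist):
    """Greedy thinning: walk through the samples in order and keep a sample
    only if it lies at least min_dist from every sample already kept."""
    coords = np.asarray(coords, dtype=float)
    kept = []
    for i, point in enumerate(coords):
        if all(np.linalg.norm(point - coords[j]) >= min_dist for j in kept):
            kept.append(i)
    return np.array(kept, dtype=int)

# Hypothetical transect: 20 samples taken every 5 m along a straight line.
transect = np.column_stack([np.arange(0.0, 100.0, 5.0), np.zeros(20)])

kept = thin_by_distance(transect, min_dist=20.0)
print(kept)                  # indices of retained samples
print(transect[kept, 0])     # their positions along the transect
```

Thinning discards data, so the retained subset should be checked again for residual autocorrelation before it is used for the final model.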

**Table 1. Important properties and assumptions of four selected methods:** *Generalized additive models (GAM), Multivariate adaptive regression splines (MARS), Random forest (RF) and Maximum entropy modelling (MaxEnt).*


Author: Martynas