Pros and cons of evaluated methods

All four methods are implemented in freely available software: GAM, MARS and RF are available in several R packages, and MaxEnt can be downloaded free of charge from http://www.cs.princeton.edu/~schapire/maxent/ and used for research purposes.

These methods share several properties (Table x): they make no assumptions about the form of the relationship between the response and the independent variables, they accept both categorical and continuous independent variables and their interactions, they have their own model selection methods and measures of the relative importance of independent variables, and they can handle large datasets.

GAM, MARS, RF and MaxEnt can all plot the empirical relationships between the response and the independent variables as partial response curves (Fig. xx). These plots are important for the biological interpretation and validation of a model. For example, when a model contains many independent variables, some of them are often collinear, and the fitted relationships between the response and these variables can be opposite to what is expected and difficult to interpret biologically.

Figure xx. Partial response curves of the probability of red alga occurrence against orbital wave velocity (ORBITALBV) for four classification methods: generalized additive models (GAM), multivariate adaptive regression splines (MARS), random forest (RF) and maximum entropy modelling (MaxEnt).
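
The collinearity problem mentioned above can be screened for before the partial response curves are interpreted. The following is a minimal sketch in plain Python, not part of the evaluated methods: it flags pairs of independent variables whose absolute Pearson correlation exceeds a threshold. The predictor names EXPOSURE and DEPTH, the data values and the 0.7 threshold are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for predictor pairs with |r| > threshold.

    The default threshold of 0.7 is an illustrative assumption.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(predictors.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictor values at six sites (illustrative numbers only).
predictors = {
    "ORBITALBV": [0.10, 0.25, 0.40, 0.55, 0.70, 0.90],
    "EXPOSURE": [1.0, 2.4, 4.1, 5.4, 7.2, 8.9],  # nearly linear in ORBITALBV
    "DEPTH": [12.0, 3.0, 9.0, 5.0, 14.0, 7.0],
}
flagged = collinear_pairs(predictors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are strongly correlated (r = {r:.2f})")
```

Response curves for variables flagged in this way should be read with caution, since the model may attribute the effect of one variable to the other.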

On the other hand, the methods differ in several respects. GAM and MARS assume a binomial or quasibinomial error distribution, and therefore overdispersion and the variance of the residuals must be checked, whereas RF and MaxEnt do not assume any particular error distribution.
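
One common way to check for overdispersion under a binomial assumption is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom. The sketch below is an illustration under stated assumptions rather than the procedure used in this study. The function name, the counts and the fitted probabilities are hypothetical, and in practice the fitted probabilities would come from the fitted model. The statistic is informative for grouped data with several trials per observation; for single presence/absence records (0/1) it is not a reliable indicator of overdispersion.

```python
def pearson_dispersion(successes, trials, fitted_probs, n_params):
    """Pearson chi-square divided by the residual degrees of freedom.

    Values close to 1 are consistent with binomial variance;
    values well above 1 suggest overdispersion.
    """
    chi_sq = 0.0
    for y, n, p in zip(successes, trials, fitted_probs):
        expected = n * p
        variance = n * p * (1.0 - p)  # variance implied by the binomial model
        chi_sq += (y - expected) ** 2 / variance
    residual_df = len(successes) - n_params
    return chi_sq / residual_df


# Hypothetical grouped data: occurrences out of 10 samples at six stations,
# with a constant fitted probability from an intercept-only model (1 parameter).
successes = [2, 9, 1, 10, 0, 8]
trials = [10, 10, 10, 10, 10, 10]
fitted_probs = [0.5] * 6

dispersion = pearson_dispersion(successes, trials, fitted_probs, n_params=1)
print(f"Pearson dispersion statistic: {dispersion:.2f}")
```

When the statistic is well above 1, a quasibinomial error distribution, which estimates the dispersion from the data, is the usual alternative to the binomial.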

Missing values are also treated differently by the methods: in GAM and RF the user can specify how missing values are handled, whereas MARS and MaxEnt do not allow missing values.
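
For methods that do not allow missing values, one simple option is to remove incomplete records before fitting (complete-case analysis). This is an assumption about a possible workflow, not a step described in this study. The sketch below shows the idea in plain Python; the field names and values are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(records):
    """Keep only the records (dicts) in which no field is missing."""
    return [
        row for row in records
        if not any(is_missing(v) for v in row.values())
    ]


# Hypothetical survey records; two of the four contain a missing value.
records = [
    {"presence": 1, "ORBITALBV": 0.42, "DEPTH": 5.0},
    {"presence": 0, "ORBITALBV": None, "DEPTH": 12.0},
    {"presence": 1, "ORBITALBV": 0.31, "DEPTH": float("nan")},
    {"presence": 0, "ORBITALBV": 0.77, "DEPTH": 8.5},
]

cleaned = complete_cases(records)
print(f"{len(cleaned)} of {len(records)} records are complete")
```

Dropping records reduces the sample size and can bias the results if the values are not missing at random, so the number of removed records should be reported.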

