Predictive machine learning methods for species ecology

The study of species' responses is key to understanding the ecology of a species (e.g. critical habitat requirements and biological invasion processes) and to designing better conservation and management plans (e.g. problem identification, priority assessment and risk analysis). Predictive machine learning methods can be used to model species distributions and to describe the important variables and specific habitat conditions required by a target species. This study aims (1) to demonstrate how habitat information such as species response curves can be retrieved from a species distribution model (SDM), (2) to assess the effects of data prevalence on model accuracy and on the habitat information retrieved from SDMs, and (3) to illustrate the differences between three data-driven methods, namely a fuzzy habitat suitability model (FHSM), random forests (RF) and support vector machines (SVMs). Nineteen sets of virtual species data with different data prevalences were generated using field-observed habitat conditions and hypothetical habitat suitability curves under four interaction scenarios governing the species-environment relationship for a virtual species. The effects of data prevalence on species distribution modeling were evaluated based on model accuracy and habitat information such as species response curves. Data prevalence affected both model accuracy and the assessment of species' responses, with a stronger influence on the latter. The effects of data prevalence on model accuracy were less pronounced for RF and SVMs, which showed higher performance. While the response curves were similar among the three models, data prevalence markedly affected their shapes. Specifically, response curves obtained from data sets with higher prevalence showed higher tolerance to unsuitable habitat conditions, emphasizing the importance of accounting for data prevalence in the assessment of species-environment relationships. In a practical implementation of an SDM, data prevalence should be taken into account when interpreting the model results. (C) 2016 Elsevier B.V. All rights reserved.
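
The prevalence effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a single habitat variable scaled to [0, 1], a hypothetical Gaussian suitability curve with made-up optimum and width, and a simple binned presence frequency as a stand-in for a model-derived response curve (the study itself used FHSM, RF and SVMs with four interaction scenarios). All function names and parameter values below are illustrative.

```python
import math
import random


def suitability(x, optimum=0.5, width=0.15):
    """Hypothetical Gaussian habitat suitability on a variable scaled to [0, 1]."""
    return math.exp(-((x - optimum) ** 2) / (2 * width ** 2))


def virtual_species(n_sites, prevalence, rng):
    """Draw (x, presence) pairs so that the share of presences equals `prevalence`
    (up to rounding), using rejection sampling against the suitability curve."""
    n_pres = round(n_sites * prevalence)
    n_abs = n_sites - n_pres
    presences, absences = [], []
    while len(presences) < n_pres or len(absences) < n_abs:
        x = rng.random()
        present = rng.random() < suitability(x)
        if present and len(presences) < n_pres:
            presences.append((x, 1))
        elif not present and len(absences) < n_abs:
            absences.append((x, 0))
    return presences + absences


def response_curve(data, n_bins=10):
    """Empirical response curve: fraction of presences in each bin of x.
    Empty bins are reported as NaN."""
    counts = [0] * n_bins
    hits = [0] * n_bins
    for x, y in data:
        b = min(int(x * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += y
    return [h / c if c else float("nan") for h, c in zip(hits, counts)]


if __name__ == "__main__":
    rng = random.Random(42)
    for prevalence in (0.1, 0.5, 0.9):
        data = virtual_species(2000, prevalence, rng)
        curve = response_curve(data)
        print(prevalence, [round(v, 2) for v in curve])
```

Under these assumptions, the same underlying suitability curve yields visibly broader empirical response curves as prevalence increases, which is the kind of apparent "tolerance to unsuitable conditions" the abstract warns about.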

Fukuda, Shinji; De Baets, Bernard. Data prevalence matters when assessing species' responses using data-driven species distribution models. Ecological Informatics, 32: 69-78, March 2016. doi:10.1016/j.ecoinf.2016.01.005