Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction

Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, doing so may dramatically increase the computational resources needed to analyze large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework: it exploits a sparse matrix formalism where appropriate and uses distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles may also lead to the discovery of quantitative trait loci (QTL) that contribute strongly in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.
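
To illustrate why a sparse formalism suits marker-by-environment interaction, the following minimal Python sketch builds a toy interaction design matrix in which each record contributes nonzero entries only in the column block of the environment where it was observed. It is not taken from Needles: the function names, the column layout (one block of marker columns per environment), the toy genotypes, and the effect values are all assumptions made for illustration.

```python
# Illustrative sketch only; names, layout and data are hypothetical,
# not part of the Needles code base.

def interaction_row(markers, env, n_env):
    """Sparse marker-by-environment design row for one record.

    Returns {column: value}. Column env * p + j holds the score of marker j
    in environment env; the blocks of all other environments are zero and
    are therefore not stored.
    """
    p = len(markers)
    if not 0 <= env < n_env:
        raise ValueError("environment index out of range")
    return {env * p + j: x for j, x in enumerate(markers) if x != 0}


def sparse_dot(row, effects):
    """Dot product of a sparse row with a dense effect vector."""
    return sum(value * effects[col] for col, value in row.items())


# Toy data: 3 records, 4 markers coded 0/1/2, 2 environments.
n_env, n_markers = 2, 4
records = [
    ([0, 1, 2, 1], 0),  # (marker scores, environment index)
    ([2, 0, 1, 1], 1),
    ([1, 1, 0, 2], 1),
]
rows = [interaction_row(m, e, n_env) for m, e in records]

# Fraction of entries that actually need to be stored.
stored = sum(len(r) for r in rows)
density = stored / (len(records) * n_markers * n_env)

# Hypothetical environment-specific marker effects (length p * n_env):
# first four values belong to environment 0, last four to environment 1.
effects = [0.5, -0.25, 0.0, 1.0, 0.0, 0.25, -0.5, 0.75]
predictions = [sparse_dot(r, effects) for r in rows]

print(f"stored entries: {stored}, density: {density:.3f}")
print("interaction contributions:", predictions)
```

Under this layout each row has at most as many nonzeros as there are markers, so the share of stored entries cannot exceed one over the number of environments. Storage therefore grows with the number of records rather than with the number of records times environments, which is the kind of structure a sparse formalism can exploit.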

The Needles source code is freely available on GitHub (https://github.com/arnedc/Needles).

De Coninck, Arne; De Baets, Bernard; Kourounis, Drosos; Verbosio, Fabio; Schenk, Olaf; Maenhout, Steven; Fostier, Jan. Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction. Genetics 203(1): 543-+, May 2016. doi:10.1534/genetics.115.179887.