We propose a statistical explorative method for data integration. It is developed in the context of early drug development for which it enables the detection of chemical substructures and the identification of genes that mediate their association with the bioactivity (BA). The core of the method is a sparse singular value decomposition for the identification of the gene set and a permutation-based method for the control of the false discovery rate. The method is illustrated using a real dataset, and its properties are empirically evaluated by means of a simulation study. Quantitative Structure Transcriptional Activity Relationship (QSTAR, www.qstar-consortium.org) is a new paradigm in early drug development that extends QSAR by not only considering data on the chemical structure of the compounds and on the compound-induced BA, but by simultaneously using transcriptomics data (gene expression). This approach enables, for example, the detection of chemical substructures that are associated with BA, while at the same time a gene set is correlated with both these substructures and the BA. Although causal associations cannot be formally concluded, these associations may suggest that the compounds act on the BA through a particular genomic pathway.
Principal bicorrelation analysis: Unraveling associations between three data sources. Mattiello, Federico; Thas, Olivier; Verbist, Bie. JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 26 (3):534-551; 10.1080/10543406.2015.1052491 2016.