text mining

Cell line name recognition in support of the identification of synthetic lethality in cancer from text

Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating, for instance, the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature.


EVEX is a text mining resource built on top of PubMed abstracts and PubMed Central full texts. It contains over 40 million bio-molecular events among more than 76 million automatically extracted gene/protein name mentions. The text mining data has also been enriched with gene normalization results, allowing straightforward integration with external resources. In addition, gene families from Ensembl and HomoloGene provide homology-based event generalizations.
