Classification and analysis of a large collection of in vivo bioassay descriptions

Testing potential drug treatments in animal disease models is a decisive step of all preclinical drug discovery programs. Yet, despite the importance of such experiments for translational medicine, there have been relatively few efforts to comprehensively and consistently analyze the data produced b...

Bibliographic Details
Main authors: Zwierzyna, Magdalena, Overington, John P.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5517062/
https://www.ncbi.nlm.nih.gov/pubmed/28678787
http://dx.doi.org/10.1371/journal.pcbi.1005641
author Zwierzyna, Magdalena
Overington, John P.
collection PubMed
description Testing potential drug treatments in animal disease models is a decisive step of all preclinical drug discovery programs. Yet, despite the importance of such experiments for translational medicine, there have been relatively few efforts to comprehensively and consistently analyze the data produced by in vivo bioassays. This is partly due to their complexity and lack of accepted reporting standards: publicly available animal screening data are only accessible in unstructured free-text format, which hinders computational analysis. In this study, we use text mining to extract information from the descriptions of over 100,000 drug screening-related assays in rats and mice. We retrieve our dataset from ChEMBL, an open-source literature-based database focused on preclinical drug discovery. We show that in vivo assay descriptions can be effectively mined for relevant information, including experimental factors that might influence the outcome and reproducibility of animal research: genetic strains, experimental treatments, and phenotypic readouts used in the experiments. We further systematize extracted information using an unsupervised language model (Word2Vec), which learns semantic similarities between terms and phrases, allowing identification of related animal models and classification of entire assay descriptions. In addition, we show that random forest models trained on features generated by Word2Vec can predict the class of drugs tested in different in vivo assays with high accuracy. Finally, we combine information mined from text with curated annotations stored in ChEMBL to investigate the patterns of usage of different animal models across a range of experiments, drug classes, and disease areas.
format Online
Article
Text
id pubmed-5517062
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5517062 2017-08-07 Classification and analysis of a large collection of in vivo bioassay descriptions Zwierzyna, Magdalena Overington, John P. PLoS Comput Biol Research Article Testing potential drug treatments in animal disease models is a decisive step of all preclinical drug discovery programs. Yet, despite the importance of such experiments for translational medicine, there have been relatively few efforts to comprehensively and consistently analyze the data produced by in vivo bioassays. This is partly due to their complexity and lack of accepted reporting standards: publicly available animal screening data are only accessible in unstructured free-text format, which hinders computational analysis. In this study, we use text mining to extract information from the descriptions of over 100,000 drug screening-related assays in rats and mice. We retrieve our dataset from ChEMBL, an open-source literature-based database focused on preclinical drug discovery. We show that in vivo assay descriptions can be effectively mined for relevant information, including experimental factors that might influence the outcome and reproducibility of animal research: genetic strains, experimental treatments, and phenotypic readouts used in the experiments. We further systematize extracted information using an unsupervised language model (Word2Vec), which learns semantic similarities between terms and phrases, allowing identification of related animal models and classification of entire assay descriptions. In addition, we show that random forest models trained on features generated by Word2Vec can predict the class of drugs tested in different in vivo assays with high accuracy. Finally, we combine information mined from text with curated annotations stored in ChEMBL to investigate the patterns of usage of different animal models across a range of experiments, drug classes, and disease areas.
Public Library of Science 2017-07-05 /pmc/articles/PMC5517062/ /pubmed/28678787 http://dx.doi.org/10.1371/journal.pcbi.1005641 Text en © 2017 Zwierzyna, Overington http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Classification and analysis of a large collection of in vivo bioassay descriptions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5517062/
https://www.ncbi.nlm.nih.gov/pubmed/28678787
http://dx.doi.org/10.1371/journal.pcbi.1005641