AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline...

Bibliographic Details
Main authors: Minnich, Amanda J.; McLoughlin, Kevin; Tse, Margaret; Deng, Jason; Weber, Andrew; Murad, Neha; Madej, Benjamin D.; Ramsundar, Bharath; Rush, Tom; Calad-Thomson, Stacie; Brase, Jim; Allen, Jonathan E.
Format: Online Article Text
Language: English
Published: American Chemical Society 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7189366/
https://www.ncbi.nlm.nih.gov/pubmed/32243153
http://dx.doi.org/10.1021/acs.jcim.9b01053
collection PubMed
description One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing ML models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of ML and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical data sets covering a wide range of parameters. Our key findings indicate that traditional molecular fingerprints underperform other feature representation methods. We also find that data set size correlates directly with prediction performance, which points to the need to expand public data sets. Uncertainty quantification can help predict model error, but correlation with error varies considerably between data sets and model types. Our findings point to the need for an extensible pipeline that can be shared to make model building more widely accessible and reproducible. This software is open source and available at: https://github.com/ATOMconsortium/AMPL.
format Online Article Text
id pubmed-7189366
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-7189366 2020-04-29 J Chem Inf Model American Chemical Society 2020-04-03 2020-04-27 Text en
Copyright © 2020 American Chemical Society. This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.