AMPL: A Data-Driven Modeling Pipeline for Drug Discovery
One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline...
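Main authors: | Minnich, Amanda J.; McLoughlin, Kevin; Tse, Margaret; Deng, Jason; Weber, Andrew; Murad, Neha; Madej, Benjamin D.; Ramsundar, Bharath; Rush, Tom; Calad-Thomson, Stacie; Brase, Jim; Allen, Jonathan E. |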
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7189366/ https://www.ncbi.nlm.nih.gov/pubmed/32243153 http://dx.doi.org/10.1021/acs.jcim.9b01053 |
_version_ | 1783527482262028288 |
---|---|
author | Minnich, Amanda J. McLoughlin, Kevin Tse, Margaret Deng, Jason Weber, Andrew Murad, Neha Madej, Benjamin D. Ramsundar, Bharath Rush, Tom Calad-Thomson, Stacie Brase, Jim Allen, Jonathan E. |
author_facet | Minnich, Amanda J. McLoughlin, Kevin Tse, Margaret Deng, Jason Weber, Andrew Murad, Neha Madej, Benjamin D. Ramsundar, Bharath Rush, Tom Calad-Thomson, Stacie Brase, Jim Allen, Jonathan E. |
author_sort | Minnich, Amanda J. |
collection | PubMed |
description | One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing ML models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of ML and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical data sets covering a wide range of parameters. Our key findings indicate that traditional molecular fingerprints underperform other feature representation methods. We also find that data set size correlates directly with prediction performance, which points to the need to expand public data sets. Uncertainty quantification can help predict model error, but correlation with error varies considerably between data sets and model types. Our findings point to the need for an extensible pipeline that can be shared to make model building more widely accessible and reproducible. This software is open source and available at: https://github.com/ATOMconsortium/AMPL. |
format | Online Article Text |
id | pubmed-7189366 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-7189366 2020-04-29 AMPL: A Data-Driven Modeling Pipeline for Drug Discovery Minnich, Amanda J. McLoughlin, Kevin Tse, Margaret Deng, Jason Weber, Andrew Murad, Neha Madej, Benjamin D. Ramsundar, Bharath Rush, Tom Calad-Thomson, Stacie Brase, Jim Allen, Jonathan E. J Chem Inf Model One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing ML models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of ML and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical data sets covering a wide range of parameters. Our key findings indicate that traditional molecular fingerprints underperform other feature representation methods. We also find that data set size correlates directly with prediction performance, which points to the need to expand public data sets. Uncertainty quantification can help predict model error, but correlation with error varies considerably between data sets and model types. Our findings point to the need for an extensible pipeline that can be shared to make model building more widely accessible and reproducible. This software is open source and available at: https://github.com/ATOMconsortium/AMPL.
American Chemical Society 2020-04-03 2020-04-27 /pmc/articles/PMC7189366/ /pubmed/32243153 http://dx.doi.org/10.1021/acs.jcim.9b01053 Text en Copyright © 2020 American Chemical Society This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html) , which permits copying and redistribution of the article or any adaptations for non-commercial purposes. |
spellingShingle | Minnich, Amanda J. McLoughlin, Kevin Tse, Margaret Deng, Jason Weber, Andrew Murad, Neha Madej, Benjamin D. Ramsundar, Bharath Rush, Tom Calad-Thomson, Stacie Brase, Jim Allen, Jonathan E. AMPL: A Data-Driven Modeling Pipeline for Drug Discovery |
title | AMPL: A Data-Driven Modeling Pipeline for Drug Discovery |
title_sort | ampl: a data-driven modeling pipeline for drug discovery |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7189366/ https://www.ncbi.nlm.nih.gov/pubmed/32243153 http://dx.doi.org/10.1021/acs.jcim.9b01053 |