
Predicting partition coefficients for the SAMPL7 physical property challenge using the ClassicalGSG method

The prediction of log P values is one part of the statistical assessment of the modeling of proteins and ligands (SAMPL) blind challenges. Here, we use a molecular graph representation method called Geometric Scattering for Graphs (GSG) to transform atomic attributes to molecular features. The atomi...


Bibliographic Details
Main authors: Donyapour, Nazanin; Dickson, Alex
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8295205/
https://www.ncbi.nlm.nih.gov/pubmed/34181200
http://dx.doi.org/10.1007/s10822-021-00400-x
author Donyapour, Nazanin
Dickson, Alex
collection PubMed
description The prediction of log P values is one part of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) blind challenges. Here, we use a molecular graph representation method called Geometric Scattering for Graphs (GSG) to transform atomic attributes into molecular features. The atomic attributes used here are parameters from classical molecular force fields, including partial charges and Lennard-Jones interaction parameters. The molecular features from GSG are used as inputs to neural networks that are trained using a “master” dataset comprising over 41,000 unique log P values. The specific molecular targets in the SAMPL7 log P prediction challenge were unique in that they all contained a sulfonyl moiety. This motivated a set of ClassicalGSG submissions where predictors were trained on different subsets of the master dataset that are filtered according to chemical types and/or the presence of the sulfonyl moiety. We find that our ranked prediction obtained 5th place with an RMSE of 0.77 log P units and an MAE of 0.62, while one of our non-ranked predictions achieved first place among all submissions with an RMSE of 0.55 and an MAE of 0.44. After the conclusion of the challenge, we also examined the performance of open-source force field parameters that allow for an end-to-end log P predictor model: General AMBER Force Field (GAFF), Universal Force Field (UFF), Merck Molecular Force Field 94 (MMFF94) and Ghemical. We find that ClassicalGSG models trained with atomic attributes from MMFF94 can yield more accurate predictions than those trained with CGenFF atomic attributes.
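The abstract reports results as RMSE (root-mean-square error) and MAE (mean absolute error) in log P units. As a minimal sketch of how these two metrics are commonly computed, the Python below uses standard definitions; the function names and the sample values are hypothetical illustrations, not data or code from the article.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, reference):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(absolute) / len(absolute)


# Hypothetical log P values, made up purely for illustration.
predicted_logp = [1.0, 2.0, 3.0, 4.0]
reference_logp = [1.5, 2.0, 2.0, 4.5]

print("RMSE:", rmse(predicted_logp, reference_logp))
print("MAE:", mae(predicted_logp, reference_logp))
```

RMSE weights large individual errors more heavily than MAE, so for the same predictions RMSE is never smaller than MAE, which is consistent with the pairs of values quoted in the abstract.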
format Online
Article
Text
id pubmed-8295205
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-8295205 2022-07-01 Predicting partition coefficients for the SAMPL7 physical property challenge using the ClassicalGSG method Donyapour, Nazanin Dickson, Alex J Comput Aided Mol Des Article
2021-06-28 2021-07 /pmc/articles/PMC8295205/ /pubmed/34181200 http://dx.doi.org/10.1007/s10822-021-00400-x Text en https://creativecommons.org/licenses/by/4.0/ This AM is a PDF file of the manuscript accepted for publication after peer review, when applicable, but does not reflect post-acceptance improvements, or any corrections. Use of this AM is subject to the publisher’s embargo period and AM terms of use. Under no circumstances may this AM be shared or distributed under a Creative Commons or other form of open access license, nor may it be reformatted or enhanced, whether by the Author or third parties. See here for Springer Nature’s terms of use for AM versions of subscription articles: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms
title Predicting partition coefficients for the SAMPL7 physical property challenge using the ClassicalGSG method
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8295205/
https://www.ncbi.nlm.nih.gov/pubmed/34181200
http://dx.doi.org/10.1007/s10822-021-00400-x