Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening


Bibliographic Details
Main Authors: Cang, Zixuan, Mu, Lin, Wei, Guo-Wei
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774846/
https://www.ncbi.nlm.nih.gov/pubmed/29309403
http://dx.doi.org/10.1371/journal.pcbi.1005929
_version_ 1783293825155858432
author Cang, Zixuan
Mu, Lin
Wei, Guo-Wei
author_facet Cang, Zixuan
Mu, Lin
Wei, Guo-Wei
author_sort Cang, Zixuan
collection PubMed
description This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
format Online
Article
Text
id pubmed-5774846
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5774846 2018-02-05 Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. Cang, Zixuan; Mu, Lin; Wei, Guo-Wei. PLoS Comput Biol. Research Article.
Public Library of Science 2018-01-08 /pmc/articles/PMC5774846/ /pubmed/29309403 http://dx.doi.org/10.1371/journal.pcbi.1005929 Text en © 2018 Cang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774846/
https://www.ncbi.nlm.nih.gov/pubmed/29309403
http://dx.doi.org/10.1371/journal.pcbi.1005929