Accurate Sequence-Based Prediction of Deleterious nsSNPs with Multiple Sequence Profiles and Putative Binding Residues

Non-synonymous single nucleotide polymorphisms (nsSNPs) may result in pathogenic changes that are associated with human diseases. Accurate prediction of these deleterious nsSNPs is in high demand. The existing predictors of deleterious nsSNPs secure modest levels of predictive performance, leaving r...

Bibliographic Details
Main authors: Song, Ruiyang, Cao, Baixin, Peng, Zhenling, Oldfield, Christopher J., Kurgan, Lukasz, Wong, Ka-Chun, Yang, Jianyi
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8469993/
https://www.ncbi.nlm.nih.gov/pubmed/34572550
http://dx.doi.org/10.3390/biom11091337
author Song, Ruiyang
Cao, Baixin
Peng, Zhenling
Oldfield, Christopher J.
Kurgan, Lukasz
Wong, Ka-Chun
Yang, Jianyi
author_sort Song, Ruiyang
collection PubMed
description Non-synonymous single nucleotide polymorphisms (nsSNPs) may result in pathogenic changes that are associated with human diseases. Accurate prediction of these deleterious nsSNPs is in high demand. The existing predictors of deleterious nsSNPs secure modest levels of predictive performance, leaving room for improvement. We propose a new sequence-based predictor, DMBS, which addresses the need to improve the predictive quality. The design of DMBS relies on the observation that deleterious mutations are likely to occur at highly conserved and functionally important positions in the protein sequence. Correspondingly, we introduce two innovative components. First, we improve the estimates of the conservation computed from the multiple sequence profiles based on two complementary databases and two complementary alignment algorithms. Second, we utilize putative annotations of functional/binding residues produced by two state-of-the-art sequence-based methods. These inputs are processed by a random forest model that provides favorable predictive performance when empirically compared against five other machine-learning algorithms. Empirical results on four benchmark datasets reveal that DMBS achieves AUC > 0.94, outperforming current methods, including protein structure-based approaches. In particular, DMBS secures AUC = 0.97 for the SNPdbe and ExoVar datasets, compared to AUC = 0.70 and 0.88, respectively, obtained by the best available methods. Further tests on the independent HumVar dataset show that our method significantly outperforms the state-of-the-art method SNPdryad. We conclude that DMBS provides accurate predictions that can effectively guide wet-lab experiments in a high-throughput manner.
format Online
Article
Text
id pubmed-8469993
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8469993 2021-09-27 Accurate Sequence-Based Prediction of Deleterious nsSNPs with Multiple Sequence Profiles and Putative Binding Residues Song, Ruiyang Cao, Baixin Peng, Zhenling Oldfield, Christopher J. Kurgan, Lukasz Wong, Ka-Chun Yang, Jianyi Biomolecules Article MDPI 2021-09-09 /pmc/articles/PMC8469993/ /pubmed/34572550 http://dx.doi.org/10.3390/biom11091337 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Accurate Sequence-Based Prediction of Deleterious nsSNPs with Multiple Sequence Profiles and Putative Binding Residues
topic Article