
Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors

Bibliographic Details
Main authors: Zadorozhny, Anton; Smirnov, Anton; Filimonov, Dmitry; Lagunin, Alexey
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10435372/
https://www.ncbi.nlm.nih.gov/pubmed/37535750
http://dx.doi.org/10.1093/bioinformatics/btad484
_version_ 1785092080711761920
author Zadorozhny, Anton
Smirnov, Anton
Filimonov, Dmitry
Lagunin, Alexey
author_sort Zadorozhny, Anton
collection PubMed
description MOTIVATION: Next-generation sequencing technologies make it possible to detect rare genetic variants in individual patients. More than a dozen software tools and web services have been created to predict the pathogenicity of variants that change amino acid residues. Despite considerable efforts in this area, there is currently no ideal method for classifying pathogenic and harmless variants, and pathogenicity assessments are often contradictory. In this article, we propose describing amino acid residue substitutions by the structural formulas of protein peptides rather than by a single-letter code. This allowed us to investigate the effectiveness of a chemoinformatics approach for assessing the pathogenicity of variants associated with amino acid substitutions. RESULTS: Structure-activity relationship analysis relying on protein-specific data and atom-centric substructural multilevel neighborhoods of atoms (MNA) descriptors of molecular fragments appeared to be suitable for predicting the pathogenic effect of single amino acid variants. An MNA-based Naïve Bayes classifier algorithm and ClinVar and humsavar data were used to create structure-activity relationship models for 10 proteins. The performance of the models was compared with 11 different prediction tools: 8 individual (SIFT 4G, Polyphen2 HDIV, MutationAssessor, PROVEAN, FATHMM, MVP, LIST-S2, MutPred) and 3 consensus (M-CAP, MetaSVM, MetaLR). The accuracy of the MNA-based method varies across proteins (AUC: 0.631–0.993; MCC: 0.191–0.891). It was similar in comparisons with both the other individual predictors and third-party protein-specific predictors. For several proteins (BRCA1, BRCA2, COL1A2, and RYR1), the performance of the MNA-based method was outstanding, capturing the pathogenic effect of structural changes in amino acid substitutions.
AVAILABILITY AND IMPLEMENTATION: The datasets are available as supplemental data at Bioinformatics online. A Python script to convert amino acid and nucleotide sequences from single-letter codes to SD files is available at https://github.com/SmirnygaTotoshka/SequenceToSDF. The authors provide trial licenses for MultiPASS software to interested readers upon request.
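The abstract reports performance as AUC and MCC ranges. As a reading aid, the sketch below shows how these two metrics are commonly computed for a binary pathogenic/benign classifier. It is not the authors' evaluation code; the function names `mcc` and `auc` and the toy labels and scores are illustrative assumptions, with 1 standing for pathogenic and 0 for benign.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from binary labels (1 = pathogenic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the article.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    calls = [1 if s >= 0.6 else 0 for s in scores]  # assumed decision threshold
    print("MCC:", mcc(labels, calls))
    print("AUC:", auc(labels, scores))
```

MCC depends on the chosen decision threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason the two ranges in the abstract differ.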
format Online
Article
Text
id pubmed-10435372
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10435372 2023-08-19 Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors Zadorozhny, Anton Smirnov, Anton Filimonov, Dmitry Lagunin, Alexey Bioinformatics Original Paper Oxford University Press 2023-08-03 /pmc/articles/PMC10435372/ /pubmed/37535750 http://dx.doi.org/10.1093/bioinformatics/btad484 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction of pathogenic single amino acid substitutions using molecular fragment descriptors
title_sort prediction of pathogenic single amino acid substitutions using molecular fragment descriptors
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10435372/
https://www.ncbi.nlm.nih.gov/pubmed/37535750
http://dx.doi.org/10.1093/bioinformatics/btad484