Machine learning classifier for identification of damaging missense mutations exclusive to human mitochondrial DNA-encoded polypeptides

BACKGROUND: Several methods have been developed to predict the pathogenicity of missense mutations, but none has been specifically designed to classify variants in mtDNA-encoded polypeptides. Moreover, no curated dataset of neutral and damaging mtDNA missense variants is available to...

Bibliographic Details
Main authors: Martín-Navarro, Antonio; Gaudioso-Simón, Andrés; Álvarez-Jarreta, Jorge; Montoya, Julio; Mayordomo, Elvira; Ruiz-Pesini, Eduardo
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5341421/
https://www.ncbi.nlm.nih.gov/pubmed/28270093
http://dx.doi.org/10.1186/s12859-017-1562-7
author Martín-Navarro, Antonio
Gaudioso-Simón, Andrés
Álvarez-Jarreta, Jorge
Montoya, Julio
Mayordomo, Elvira
Ruiz-Pesini, Eduardo
author_sort Martín-Navarro, Antonio
collection PubMed
description BACKGROUND: Several methods have been developed to predict the pathogenicity of missense mutations, but none has been specifically designed to classify variants in mtDNA-encoded polypeptides. Moreover, no curated dataset of neutral and damaging mtDNA missense variants is available to test the accuracy of predictors. Because mtDNA sequencing of patients suffering from mitochondrial diseases is revealing many missense mutations, candidate substitutions need to be prioritized for further confirmation. Predictors can be useful as screening tools, but their performance must be improved. RESULTS: We have developed an SVM classifier (Mitoclass.1) specific for mtDNA missense variants. The model was trained and validated with 2,835 damaging and neutral mtDNA amino acid substitutions, previously curated using a set of rigorous pathogenicity criteria with high specificity. Each instance is described by a set of three attributes based on evolutionary conservation in Eukaryota of wildtype and mutant amino acids as well as coevolution and a novel evolutionary analysis of specific substitutions belonging to the same domain of mitochondrial polypeptides. Our classifier performed better than the other web-available predictors tested. We checked the performance of three widely used predictors on all the mutations in our curated dataset. PolyPhen-2 showed the best results for screening purposes, with good sensitivity. Nevertheless, its number of false positive predictions was too high. Our method has improved sensitivity and better specificity relative to PolyPhen-2. We also publish predictions for the complete set of 24,201 possible missense variants in the 13 human mtDNA-encoded polypeptides. CONCLUSIONS: Mitoclass.1 allows a better selection of candidate damaging missense variants from mtDNA. A careful search for discriminatory attributes and a training step based on a curated dataset of amino acid substitutions belonging exclusively to human mtDNA genes allow improved performance. The accuracy of Mitoclass.1 could be improved in the future, when more mtDNA missense substitutions become available for updating the attributes and retraining the model. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1562-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5341421
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5341421 2017-03-10 BMC Bioinformatics Research Article BioMed Central 2017-03-07 /pmc/articles/PMC5341421/ /pubmed/28270093 http://dx.doi.org/10.1186/s12859-017-1562-7 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Machine learning classifier for identification of damaging missense mutations exclusive to human mitochondrial DNA-encoded polypeptides
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5341421/
https://www.ncbi.nlm.nih.gov/pubmed/28270093
http://dx.doi.org/10.1186/s12859-017-1562-7