A Pipeline for Classifying Deleterious Coding Mutations in Agricultural Plants
The impact of deleterious variation on both plant fitness and crop productivity is not completely understood and remains actively debated. Deleterious mutations in plants have so far been predicted solely with sequence conservation methods rather than function-based classifiers due to lack of well-an...
Main Authors: | Kovalev, Maxim S.; Igolkina, Anna A.; Samsonova, Maria G.; Nuzhdin, Sergey V. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2018 |
Subjects: | Plant Science |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6279870/ https://www.ncbi.nlm.nih.gov/pubmed/30546376 http://dx.doi.org/10.3389/fpls.2018.01734 |
_version_ | 1783378556933373952 |
---|---|
author | Kovalev, Maxim S.; Igolkina, Anna A.; Samsonova, Maria G.; Nuzhdin, Sergey V. |
author_facet | Kovalev, Maxim S.; Igolkina, Anna A.; Samsonova, Maria G.; Nuzhdin, Sergey V. |
author_sort | Kovalev, Maxim S. |
collection | PubMed |
description | The impact of deleterious variation on both plant fitness and crop productivity is not completely understood and remains actively debated. Deleterious mutations in plants have so far been predicted solely with sequence conservation methods rather than function-based classifiers, owing to the lack of well-annotated mutational datasets in these organisms. Here, we developed a machine learning classifier based on a dataset of deleterious and neutral mutations in Arabidopsis thaliana by extracting 18 informative features that discriminate deleterious from neutral mutations, including 9 novel features not used in previous studies. We examined linear SVM, Gaussian SVM, and Random Forest classifiers, with the latter performing best. The Random Forest classifier exhibited markedly higher accuracy than the popular PolyPhen-2 tool on the Arabidopsis dataset. Additionally, we tested whether the Random Forest, trained on the Arabidopsis dataset, accurately predicts deleterious mutations in Oryza sativa and Pisum sativum and observed satisfactory accuracy (87% and 93%, respectively), higher than that obtained with PolyPhen-2. Applying transfer learning to the classifiers did not improve their performance. To further test the performance of the Random Forest classifier across different angiosperm species, we applied it to annotate deleterious mutations in Cicer arietinum and validated the annotations using population frequency data. Overall, we devised a classifier with the potential to improve the annotation of putative functional mutations in QTL and GWAS hit regions, as well as the evolutionary analysis of the proliferation of deleterious mutations during plant domestication, thus optimizing breeding improvement and development of new cultivars. |
format | Online Article Text |
id | pubmed-6279870 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6279870 2018-12-13 A Pipeline for Classifying Deleterious Coding Mutations in Agricultural Plants Kovalev, Maxim S. Igolkina, Anna A. Samsonova, Maria G. Nuzhdin, Sergey V. Front Plant Sci Plant Science Frontiers Media S.A. 2018-11-28 /pmc/articles/PMC6279870/ /pubmed/30546376 http://dx.doi.org/10.3389/fpls.2018.01734 Text en Copyright © 2018 Kovalev, Igolkina, Samsonova and Nuzhdin. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Plant Science Kovalev, Maxim S. Igolkina, Anna A. Samsonova, Maria G. Nuzhdin, Sergey V. A Pipeline for Classifying Deleterious Coding Mutations in Agricultural Plants |
title | A Pipeline for Classifying Deleterious Coding Mutations in Agricultural Plants |
title_sort | pipeline for classifying deleterious coding mutations in agricultural plants |
topic | Plant Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6279870/ https://www.ncbi.nlm.nih.gov/pubmed/30546376 http://dx.doi.org/10.3389/fpls.2018.01734 |
work_keys_str_mv | AT kovalevmaxims apipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants AT igolkinaannaa apipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants AT samsonovamariag apipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants AT nuzhdinsergeyv apipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants AT kovalevmaxims pipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants AT igolkinaannaa pipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants AT samsonovamariag pipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants AT nuzhdinsergeyv pipelineforclassifyingdeleteriouscodingmutationsinagriculturalplants |
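
The description above reports two kinds of evaluation: accuracy on species with annotated mutations, and a check against population frequency data where no annotations exist. The block below is a minimal sketch of both ideas, not the authors' pipeline. The function names (`accuracy`, `mean_frequency_by_class`) and all data values are hypothetical and chosen only for illustration. The frequency check assumes that variants predicted to be deleterious should, on average, segregate at lower population frequencies than variants predicted to be neutral.

```python
from statistics import mean


def accuracy(true_labels, predicted_labels):
    """Fraction of mutations whose predicted class matches the annotated class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def mean_frequency_by_class(predicted_labels, allele_frequencies):
    """Mean population allele frequency for each predicted class."""
    grouped = {}
    for label, freq in zip(predicted_labels, allele_frequencies):
        grouped.setdefault(label, []).append(freq)
    return {label: mean(freqs) for label, freqs in grouped.items()}


# Toy, made-up values for illustration only (NOT data from the article).
true_labels = ["deleterious", "neutral", "neutral", "deleterious", "neutral"]
predicted = ["deleterious", "neutral", "deleterious", "deleterious", "neutral"]
frequencies = [0.02, 0.40, 0.10, 0.01, 0.35]

# Labeled case: compare predictions with annotations.
acc = accuracy(true_labels, predicted)

# Unlabeled case: compare mean allele frequency between predicted classes.
freq_summary = mean_frequency_by_class(predicted, frequencies)

print(f"accuracy: {acc:.2f}")
for label, value in sorted(freq_summary.items()):
    print(f"mean frequency ({label}): {value:.3f}")
```

A lower mean frequency in the predicted-deleterious class is consistent with, but does not prove, correct classification. A real analysis would use a statistical test on the full frequency distributions rather than a comparison of means.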