
Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction

Tobacco mosaic virus (TMV) is widely distributed in the global tobacco industry and has a significant impact on tobacco production. It can reduce tobacco yield by 50–70%. In this study, we aimed to identify tobacco mosaic virus proteins and healthy tobacco leaf pr...


Bibliographic Details
Main Authors: Chen, Yu-Miao, Zu, Xin-Ping, Li, Dan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7581905/
https://www.ncbi.nlm.nih.gov/pubmed/33193664
http://dx.doi.org/10.3389/fgene.2020.569100
_version_ 1783599074098806784
author Chen, Yu-Miao
Zu, Xin-Ping
Li, Dan
author_facet Chen, Yu-Miao
Zu, Xin-Ping
Li, Dan
author_sort Chen, Yu-Miao
collection PubMed
description Tobacco mosaic virus (TMV) is widely distributed in the global tobacco industry and has a significant impact on tobacco production. It can reduce tobacco yield by 50–70%. In this study, we aimed to identify tobacco mosaic virus proteins and healthy tobacco leaf proteins using machine learning approaches. The experimental results showed that the support vector machine (SVM) algorithm achieved high accuracy across different feature extraction methods, and that the 188-dimensional feature extraction method improved classification accuracy. The SVM algorithm and the 188-dimensional feature extraction method were therefore selected as the final experimental methods. With 10-fold cross-validation, the SVM combined with the 188-dimensional features achieved 93.5% accuracy on the training set and 92.7% accuracy on the independent validation set. In addition, the evaluation metrics indicate that the method we developed is valid and robust.
format Online
Article
Text
id pubmed-7581905
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-75819052020-11-13 Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction Chen, Yu-Miao Zu, Xin-Ping Li, Dan Front Genet Genetics Tobacco mosaic virus (TMV) is widely distributed in the global tobacco industry and has a significant impact on tobacco production. It can reduce tobacco yield by 50–70%. In this study, we aimed to identify tobacco mosaic virus proteins and healthy tobacco leaf proteins using machine learning approaches. The experimental results showed that the support vector machine (SVM) algorithm achieved high accuracy across different feature extraction methods, and that the 188-dimensional feature extraction method improved classification accuracy. The SVM algorithm and the 188-dimensional feature extraction method were therefore selected as the final experimental methods. With 10-fold cross-validation, the SVM combined with the 188-dimensional features achieved 93.5% accuracy on the training set and 92.7% accuracy on the independent validation set. In addition, the evaluation metrics indicate that the method we developed is valid and robust. Frontiers Media S.A. 2020-10-09 /pmc/articles/PMC7581905/ /pubmed/33193664 http://dx.doi.org/10.3389/fgene.2020.569100 Text en Copyright © 2020 Chen, Zu and Li. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Chen, Yu-Miao
Zu, Xin-Ping
Li, Dan
Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction
title Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction
title_full Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction
title_fullStr Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction
title_full_unstemmed Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction
title_short Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction
title_sort identification of proteins of tobacco mosaic virus by using a method of feature extraction
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7581905/
https://www.ncbi.nlm.nih.gov/pubmed/33193664
http://dx.doi.org/10.3389/fgene.2020.569100
work_keys_str_mv AT chenyumiao identificationofproteinsoftobaccomosaicvirusbyusingamethodoffeatureextraction
AT zuxinping identificationofproteinsoftobaccomosaicvirusbyusingamethodoffeatureextraction
AT lidan identificationofproteinsoftobaccomosaicvirusbyusingamethodoffeatureextraction
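
The description field in this record reports accuracy obtained with 10-fold cross-validation. As a rough illustration of that evaluation procedure only, the Python sketch below splits labelled feature vectors into k folds and averages the held-out accuracy. It is not the authors' code: the function names are hypothetical, and the classifier is a simple nearest-centroid stand-in rather than the support vector machine the article describes. The 188-dimensional feature extraction is not reproduced here; any fixed-length numeric vectors would do as input.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence

Vector = Sequence[float]


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X: Sequence[Vector], y: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Stand-in classifier (NOT an SVM): compute one mean vector per class."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model: Dict[Hashable, List[float]], x: Vector) -> Hashable:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def cross_val_accuracy(
    X: Sequence[Vector],
    y: Sequence[Hashable],
    fit: Callable,
    predict: Callable,
    k: int = 10,
    seed: int = 0,
) -> float:
    """Train on k-1 folds, score on the held-out fold, and average over all k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k
```

Because `fit` and `predict` are passed in as arguments, the nearest-centroid stand-in could be swapped for a wrapper around an SVM implementation (for example scikit-learn's `SVC`) without changing the cross-validation loop.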