
Classification of genomic components and prediction of genes of Begomovirus based on subsequence natural vector and support vector machine

BACKGROUND: Begomoviruses are widely distributed and cause devastating diseases in many crops. According to the number of genomic components, a begomovirus is classified as either monopartite or bipartite. Both monopartite and bipartite begomoviruses have the DNA-A component, which encod...


Bibliographic Details
Main Authors: Pei, Shaojun; Dong, Rui; Bao, Yiming; He, Rong Lucy; Yau, Stephen S.-T.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409808/
https://www.ncbi.nlm.nih.gov/pubmed/32832270
http://dx.doi.org/10.7717/peerj.9625
author Pei, Shaojun
Dong, Rui
Bao, Yiming
He, Rong Lucy
Yau, Stephen S.-T.
collection PubMed
description BACKGROUND: Begomoviruses are widely distributed and cause devastating diseases in many crops. According to the number of genomic components, a begomovirus is classified as either monopartite or bipartite. Both monopartite and bipartite begomoviruses have the DNA-A component, which encodes all proteins essential for virus functions, while bipartite begomoviruses additionally contain the DNA-B component. Satellite molecules, known as betasatellites, alphasatellites, or deltasatellites, are sometimes present in begomoviruses. The genomic components of begomoviruses are therefore complex and varied, and different components have different gene structures and functions. Classifying the components of begomoviruses is important for studying the origin and pathogenic mechanism of the virus. METHODS: We propose a model that combines the Subsequence Natural Vector (SNV) method with the Support Vector Machine (SVM) algorithm to classify the genomic components of begomoviruses and predict their genes. First, each genome sequence is represented numerically as a vector by the SNV method. Then, SVM is applied to the datasets to build the classification model. Finally, recursive feature elimination (RFE) is used to select the essential features of the subsequence natural vectors based on feature importance. RESULTS: DNA-A, DNA-B, and different satellite DNAs are selected to build the model. To evaluate the model, we compare it with the homology-based method BLAST and two machine learning algorithms, Random Forest and Naive Bayes. The results show that our classification model can classify DNA-A, DNA-B, and different satellites with high accuracy. In particular, it can distinguish whether a DNA-A component is from a monopartite or a bipartite begomovirus. Based on the classification results, we can also predict the genes of different genomic components. According to the selected features, the contents of the four nucleotides in the second and tenth segments (approximately 150–350 bp and 1,450–1,650 bp) differ the most between DNA-A components of monopartite and bipartite begomoviruses, which may be related to the pre-coat protein (AV2) and the transcriptional activator protein (AC2) genes. Our results advance the understanding of the unique structures of the genomic components of begomoviruses.
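The first step of the METHODS summary, turning a sequence into a numeric vector segment by segment, can be sketched as follows. This is an illustrative sketch only, not the authors' code. It assumes the commonly used natural vector features per nucleotide (count, mean position, and normalized second central moment) and equal-length segments; the function names and the `n_segments` parameter are hypothetical, and the record does not state how the authors choose segment boundaries.

```python
from typing import List

NUCLEOTIDES = "ACGT"


def natural_vector(seq: str) -> List[float]:
    """Return [n_A..n_T, mu_A..mu_T, D2_A..D2_T] for one sequence.

    n_k  : number of occurrences of nucleotide k
    mu_k : mean 1-based position of nucleotide k (0.0 if absent)
    D2_k : sum((pos - mu_k)^2) / (n_k * len(seq))  (0.0 if absent)
    """
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for nt in NUCLEOTIDES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == nt]
        n_k = len(positions)
        mu_k = sum(positions) / n_k if n_k else 0.0
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def subsequence_natural_vector(seq: str, n_segments: int) -> List[float]:
    """Concatenate the natural vectors of n_segments near-equal segments.

    Assumption: segments are contiguous and of (almost) equal length.
    """
    if n_segments < 1 or len(seq) < n_segments:
        raise ValueError("need at least one base per segment")
    # Integer boundaries 0 = b_0 < b_1 < ... < b_m = len(seq)
    bounds = [i * len(seq) // n_segments for i in range(n_segments + 1)]
    features: List[float] = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(natural_vector(seq[start:end]))
    return features


if __name__ == "__main__":
    vec = subsequence_natural_vector("AACCGGTT", n_segments=2)
    print(len(vec))  # 24: 2 segments x 12 features each
```

Vectors built this way could then be passed to a classifier. For example, scikit-learn's `SVC(kernel="linear")` wrapped in `sklearn.feature_selection.RFE` would be one way to mirror the SVM and RFE steps, although the record does not say which implementation or kernel the authors used.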
format Online
Article
Text
id pubmed-7409808
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7409808 2020-08-21 Classification of genomic components and prediction of genes of Begomovirus based on subsequence natural vector and support vector machine Pei, Shaojun Dong, Rui Bao, Yiming He, Rong Lucy Yau, Stephen S.-T. PeerJ Bioinformatics PeerJ Inc. 2020-08-03 /pmc/articles/PMC7409808/ /pubmed/32832270 http://dx.doi.org/10.7717/peerj.9625 Text en ©2020 Pei et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Classification of genomic components and prediction of genes of Begomovirus based on subsequence natural vector and support vector machine
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409808/
https://www.ncbi.nlm.nih.gov/pubmed/32832270
http://dx.doi.org/10.7717/peerj.9625