
A new efficient method for analyzing fungi species using correlations between nucleotides

BACKGROUND: In recent years, DNA barcoding has become an important tool for biologists to identify species and understand their natural biodiversity. The complexity of barcode data makes it difficult to analyze quickly and effectively. Manual classification of this data cannot keep up with the rate of...


Bibliographic Details
Main authors: Zhao, Xin; Tian, Kun; Yau, Stephen S.-T.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6307163/
https://www.ncbi.nlm.nih.gov/pubmed/30587116
http://dx.doi.org/10.1186/s12862-018-1330-y
author Zhao, Xin
Tian, Kun
Yau, Stephen S.-T.
collection PubMed
description BACKGROUND: In recent years, DNA barcoding has become an important tool for biologists to identify species and understand their natural biodiversity. The complexity of barcode data makes it difficult to analyze quickly and effectively. Manual classification of this data cannot keep up with the rate of increase of available data. RESULTS: In this study, we propose a new method for DNA barcode classification based on the distribution of nucleotides within the sequence. By adding the covariance of nucleotides to the original natural vector, this augmented 18-dimensional natural vector makes good use of the available information in the DNA sequence. The accurate classification results we obtained demonstrate that this new 18-dimensional natural vector method, together with the random forest classifier algorithm, can serve as a computationally efficient identification tool for DNA barcodes. We performed phylogenetic analysis on the genus Megacollybia to validate our method. We also studied how effective our method was in determining the genetic distance within and between species in our barcoding dataset. CONCLUSIONS: The classification performs well on the fungi barcode dataset with high and robust accuracy. The reasonable phylogenetic trees we obtained further validate our methods. This method is alignment-free and does not depend on any model assumption, and it will become a powerful tool for classification and evolutionary analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12862-018-1330-y) contains supplementary material, which is available to authorized users.
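The abstract describes the feature vector only in words, so the following is a minimal sketch of how an 18-dimensional vector of this kind might be computed, not the authors' implementation. The first 12 entries follow the usual natural vector layout (count, mean position, and normalized second central moment for each of A, C, G, T). The six covariance terms use an assumed definition based on indicator-weighted positions, because the record does not give the formula. The function name `natural_vector_18` and the 1-based position convention are also assumptions made for illustration.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def natural_vector_18(seq):
    """Return an 18-entry feature list for a DNA sequence (illustrative sketch)."""
    seq = seq.upper()
    n = len(seq)

    # 1-based positions at which each nucleotide occurs (assumed convention).
    positions = {k: [i for i, c in enumerate(seq, start=1) if c == k]
                 for k in NUCLEOTIDES}
    counts = {k: len(positions[k]) for k in NUCLEOTIDES}

    # Mean position of each nucleotide; 0.0 when the nucleotide is absent.
    means = {k: sum(positions[k]) / counts[k] if counts[k] else 0.0
             for k in NUCLEOTIDES}

    # Second central moment of the positions, normalized by n_k * n.
    second = {k: (sum((p - means[k]) ** 2 for p in positions[k])
                  / (counts[k] * n)) if counts[k] else 0.0
              for k in NUCLEOTIDES}

    # ASSUMPTION: covariance between nucleotides k and l, computed from
    # indicator-weighted positions (position i if the base at i matches,
    # otherwise 0) and normalized by n_k * n_l. The paper's exact formula
    # may differ.
    covariances = []
    for k, l in combinations(NUCLEOTIDES, 2):
        if counts[k] == 0 or counts[l] == 0:
            covariances.append(0.0)
            continue
        total = 0.0
        for i, c in enumerate(seq, start=1):
            t_k = i if c == k else 0
            t_l = i if c == l else 0
            total += (t_k - means[k]) * (t_l - means[l])
        covariances.append(total / (counts[k] * counts[l]))

    return ([counts[k] for k in NUCLEOTIDES]
            + [means[k] for k in NUCLEOTIDES]
            + [second[k] for k in NUCLEOTIDES]
            + covariances)
```

Because every entry is computed directly from nucleotide positions, no sequence alignment is needed. Vectors built this way could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`; the record does not say which implementation the authors used.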
format Online
Article
Text
id pubmed-6307163
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6307163 2019-01-02 A new efficient method for analyzing fungi species using correlations between nucleotides Zhao, Xin; Tian, Kun; Yau, Stephen S.-T. BMC Evol Biol Research Article BioMed Central 2018-12-27 /pmc/articles/PMC6307163/ /pubmed/30587116 http://dx.doi.org/10.1186/s12862-018-1330-y Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A new efficient method for analyzing fungi species using correlations between nucleotides
title_sort new efficient method for analyzing fungi species using correlations between nucleotides
topic Research Article