
Rapid DNA barcoding analysis of large datasets using the composition vector method

BACKGROUND: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). H...


Bibliographic Details
Main Authors: Chu, Ka Hou; Xu, Minli; Li, Chi Pang
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775154/
https://www.ncbi.nlm.nih.gov/pubmed/19900304
http://dx.doi.org/10.1186/1471-2105-10-S14-S8
_version_ 1782173992959868928
author Chu, Ka Hou
Xu, Minli
Li, Chi Pang
author_facet Chu, Ka Hou
Xu, Minli
Li, Chi Pang
author_sort Chu, Ka Hou
collection PubMed
description BACKGROUND: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans. RESULTS: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer. CONCLUSION: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes.
format Text
id pubmed-2775154
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2775154 2009-11-10 Rapid DNA barcoding analysis of large datasets using the composition vector method Chu, Ka Hou Xu, Minli Li, Chi Pang BMC Bioinformatics Research BioMed Central 2009-11-10 /pmc/articles/PMC2775154/ /pubmed/19900304 http://dx.doi.org/10.1186/1471-2105-10-S14-S8 Text en Copyright © 2009 Chu et al; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Chu, Ka Hou
Xu, Minli
Li, Chi Pang
Rapid DNA barcoding analysis of large datasets using the composition vector method
title Rapid DNA barcoding analysis of large datasets using the composition vector method
title_full Rapid DNA barcoding analysis of large datasets using the composition vector method
title_fullStr Rapid DNA barcoding analysis of large datasets using the composition vector method
title_full_unstemmed Rapid DNA barcoding analysis of large datasets using the composition vector method
title_short Rapid DNA barcoding analysis of large datasets using the composition vector method
title_sort rapid dna barcoding analysis of large datasets using the composition vector method
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775154/
https://www.ncbi.nlm.nih.gov/pubmed/19900304
http://dx.doi.org/10.1186/1471-2105-10-S14-S8
work_keys_str_mv AT chukahou rapiddnabarcodinganalysisoflargedatasetsusingthecompositionvectormethod
AT xuminli rapiddnabarcodinganalysisoflargedatasetsusingthecompositionvectormethod
AT lichipang rapiddnabarcodinganalysisoflargedatasetsusingthecompositionvectormethod
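
Illustrative note on the abstract: the record describes building distance-based trees from unaligned sequences using composition vectors, but it does not give the formulas. The Python sketch below shows only the general idea of alignment-free comparison, under loudly stated assumptions: each sequence is reduced to a vector of normalized k-mer frequencies, and two sequences are compared with a cosine-based distance. The function names (`composition_vector`, `cv_distance`, `distance_matrix`), the choice `k=3`, and the omission of any background-subtraction step are all assumptions of this sketch, not details taken from the paper.

```python
from collections import Counter
from itertools import product
import math


def composition_vector(seq, k=3):
    """Return normalized k-mer frequencies over the alphabet ACGT.

    Simplified stand-in for a composition vector: k-mers containing
    other symbols (e.g. N) are ignored, and no background model is
    subtracted.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def cv_distance(u, v):
    """Cosine-based distance, (1 - cos(u, v)) / 2, between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return (1.0 - dot / norm) / 2.0


def distance_matrix(seqs, k=3):
    """Pairwise distance matrix for a list of unaligned sequences."""
    vectors = [composition_vector(s, k) for s in seqs]
    n = len(vectors)
    return [
        [0.0 if i == j else cv_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTTTGGGGGG"]
    for row in distance_matrix(toy, k=2):
        print(["%.3f" % x for x in row])
```

A matrix produced this way could then be handed to any neighbor-joining (NJ) implementation to obtain a tree, which corresponds to the CV/NJ pipeline named in the abstract. Because no alignment is computed, the cost is dominated by k-mer counting and the pairwise vector comparisons.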