Rapid DNA barcoding analysis of large datasets using the composition vector method

Bibliographic details
Main authors:	Chu, Ka Hou; Xu, Minli; Li, Chi Pang
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775154/ https://www.ncbi.nlm.nih.gov/pubmed/19900304 http://dx.doi.org/10.1186/1471-2105-10-S14-S8

collection	PubMed
description	BACKGROUND: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans.
RESULTS: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer.
CONCLUSION: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes.
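The abstract contrasts two distance measures: an alignment-free composition vector (CV) distance and the Kimura two-parameter (K2P) distance computed on already-aligned sequences. The Python sketch below illustrates both. The CV part follows the formulation popularized by Hao and colleagues' CVTree (K-mer frequencies with a (K-2)-order Markov background subtracted, then a correlation-based distance); whether Chu et al. used exactly this variant, normalization, or string length K is an assumption, not something stated in this record, and the default `k=5` is purely illustrative. The function names `composition_vector`, `cv_distance` and `k2p_distance` are hypothetical, and the NJ tree-building step is left to an existing phylogenetics package.

```python
import math
from collections import Counter
from itertools import product

PURINES = set("AG")
PYRIMIDINES = set("CT")


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers (normalized by L - k + 1)."""
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def composition_vector(seq, k, alphabet="ACGT"):
    """CV sketch: k-mer frequencies minus a (k-2)-order Markov expectation."""
    if k < 3:
        raise ValueError("k must be at least 3")
    seq = seq.upper()
    fk, fk1, fk2 = (kmer_freqs(seq, j) for j in (k, k - 1, k - 2))
    vec = []
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        mid = fk2.get(w[1:-1], 0.0)
        # Expected frequency predicted from the (k-1)- and (k-2)-mer frequencies.
        f0 = fk1.get(w[:-1], 0.0) * fk1.get(w[1:], 0.0) / mid if mid else 0.0
        # Relative deviation from the background; 0 where no expectation exists.
        vec.append((fk.get(w, 0.0) - f0) / f0 if f0 else 0.0)
    return vec


def cv_distance(seq_a, seq_b, k=5):
    """Alignment-free distance (1 - correlation) / 2 between two raw sequences."""
    a = composition_vector(seq_a, k)
    b = composition_vector(seq_b, k)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    corr = dot / norm if norm else 0.0
    return (1.0 - corr) / 2.0


def k2p_distance(aln_a, aln_b):
    """Kimura 2-parameter distance between two rows of an existing alignment.

    Gaps and ambiguous bases are skipped. math.log raises ValueError if the
    pair is too divergent for the correction (saturation).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    sites = transitions = transversions = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    # Two toy sequences of equal length with no indels, so they double as an alignment.
    a = "ATGCGTACGTTAGCCGATAGGCTTACGATCGA"
    b = "ATGCGTACGTTAGCTGATAGGCTTACGATCGA"
    print("CV distance :", cv_distance(a, b, k=3))
    print("K2P distance:", k2p_distance(a, b))
```

The contrast the abstract relies on is visible in the signatures: `cv_distance` needs only the raw sequences, whereas `k2p_distance` presupposes that both inputs are rows of a multiple alignment, the step the authors report as the bottleneck. A pairwise matrix from either function could then be passed to a neighbor-joining implementation.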
id	pubmed-2775154
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics (BioMed Central), published 2009-11-10
rights	Copyright © 2009 Chu et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.