Rapid DNA barcoding analysis of large datasets using the composition vector method
BACKGROUND: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). H...
Main authors: | Chu, Ka Hou; Xu, Minli; Li, Chi Pang |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775154/ https://www.ncbi.nlm.nih.gov/pubmed/19900304 http://dx.doi.org/10.1186/1471-2105-10-S14-S8 |
_version_ | 1782173992959868928 |
---|---|
author | Chu, Ka Hou Xu, Minli Li, Chi Pang |
author_facet | Chu, Ka Hou Xu, Minli Li, Chi Pang |
author_sort | Chu, Ka Hou |
collection | PubMed |
description | BACKGROUND: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans. RESULTS: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer. CONCLUSION: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes. |
format | Text |
id | pubmed-2775154 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-27751542009-11-10 Rapid DNA barcoding analysis of large datasets using the composition vector method Chu, Ka Hou Xu, Minli Li, Chi Pang BMC Bioinformatics Research BACKGROUND: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans. RESULTS: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer. CONCLUSION: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes. BioMed Central 2009-11-10 /pmc/articles/PMC2775154/ /pubmed/19900304 http://dx.doi.org/10.1186/1471-2105-10-S14-S8 Text en Copyright © 2009 Chu et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Chu, Ka Hou Xu, Minli Li, Chi Pang Rapid DNA barcoding analysis of large datasets using the composition vector method |
title | Rapid DNA barcoding analysis of large datasets using the composition vector method |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775154/ https://www.ncbi.nlm.nih.gov/pubmed/19900304 http://dx.doi.org/10.1186/1471-2105-10-S14-S8 |
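
The abstract in this record describes comparing unaligned sequences through composition vectors and then building a neighbor-joining (NJ) tree from the resulting distances. The record does not give the algorithm itself, so the following is only a minimal sketch of the general idea under stated assumptions: each sequence is reduced to normalized k-mer frequencies, and the distance between two sequences is taken as (1 - cosine similarity) / 2. The function names, the choice of k = 3, and the omission of any background-frequency correction (which published CV implementations may apply) are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
import math


def kmer_vector(seq, k=3):
    """Return normalized k-mer frequencies of `seq` as a sparse dict.

    Hypothetical helper: windows containing characters other than
    A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid k-mers of length %d" % k)
    return {kmer: n / total for kmer, n in counts.items()}


def cv_distance(u, v):
    """Distance (1 - cosine similarity) / 2 between two sparse vectors.

    Assumed distance form; it ranges from 0 (same composition)
    to 0.5 (no k-mers in common) for non-negative vectors.
    """
    dot = sum(x * v.get(kmer, 0.0) for kmer, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    cosine = min(1.0, dot / (norm_u * norm_v))  # clamp rounding error
    return (1.0 - cosine) / 2.0


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict of name -> sequence.

    No alignment is performed; each sequence is vectorized once.
    Returns a dict keyed by (name_a, name_b) for every ordered pair.
    """
    vectors = {name: kmer_vector(s, k) for name, s in seqs.items()}
    return {
        (a, b): cv_distance(vectors[a], vectors[b])
        for a in vectors
        for b in vectors
    }


if __name__ == "__main__":
    toy = {
        "a": "ACGTACGTAC",
        "b": "ACGTACGTAC",
        "c": "TTTTTTTTTT",
    }
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 4))
```

A matrix like this could then be passed to any NJ implementation. Because each sequence is vectorized once and no alignment step is involved, the cost is dominated by the pairwise vector comparisons, which is the kind of saving the abstract attributes to the CV/NJ approach.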