
Efficient alignment-free DNA barcode analytics

BACKGROUND: In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences...


Bibliographic Details
Main authors: Kuksa, Pavel, Pavlovic, Vladimir
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775155/
https://www.ncbi.nlm.nih.gov/pubmed/19900305
http://dx.doi.org/10.1186/1471-2105-10-S14-S9
_version_ 1782173993197895680
author Kuksa, Pavel
Pavlovic, Vladimir
author_facet Kuksa, Pavel
Pavlovic, Vladimir
author_sort Kuksa, Pavel
collection PubMed
description BACKGROUND: In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens the possibility of accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. RESULTS: New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. CONCLUSION: Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only a few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding.
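The spectrum idea in the description can be illustrated with a minimal sketch. It assumes the "spectrum" is a count of overlapping k-mers and uses cosine similarity with a nearest-reference rule. These are illustrative choices, not necessarily the authors' exact features, kernel, or classifier. The function names, the value k=3, and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, k=3):
    """Count every overlapping k-mer in seq (a simple k-mer spectrum)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has an empty spectrum
    return dot / (norm_a * norm_b)


def classify(query, references, k=3):
    """Return the label whose best-matching reference barcode is most similar.

    references maps a species label to a list of barcode strings
    (an assumed, illustrative data layout).
    """
    q = spectrum(query, k)

    def best_score(label):
        return max(cosine_similarity(q, spectrum(s, k)) for s in references[label])

    return max(references, key=best_score)


# Toy data, not real barcodes.
toy_references = {
    "species_A": ["AAAAAAAAAC"],
    "species_B": ["GCGCGCGCGC"],
}
predicted = classify("AAAAAAAC", toy_references)
```

No alignment is computed at any point: each sequence is reduced to a count vector indexed by k-mer, and comparison is a vector operation, which is why this style of method scales well.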
format Text
id pubmed-2775155
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2775155 2009-11-10 Efficient alignment-free DNA barcode analytics Kuksa, Pavel Pavlovic, Vladimir BMC Bioinformatics Research
BioMed Central 2009-11-10 /pmc/articles/PMC2775155/ /pubmed/19900305 http://dx.doi.org/10.1186/1471-2105-10-S14-S9 Text en Copyright © 2009 Kuksa and Pavlovic; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.