
A Scalable Method for Analysis and Display of DNA Sequences

BACKGROUND: Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques and the vertical hierarchical classification may ob...


Bibliographic Details
Main Authors: Sirovich, Lawrence; Stoeckle, Mark Y.; Zhang, Yu
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2749217/
https://www.ncbi.nlm.nih.gov/pubmed/19798412
http://dx.doi.org/10.1371/journal.pone.0007051
author Sirovich, Lawrence
Stoeckle, Mark Y.
Zhang, Yu
author_sort Sirovich, Lawrence
collection PubMed
description BACKGROUND: Comparative DNA sequence analysis provides insight into evolution and helps construct a natural classification reflecting the Tree of Life. The growing numbers of organisms represented in DNA databases challenge tree-building techniques and the vertical hierarchical classification may obscure relationships among some groups. Approaches that can incorporate sequence data from large numbers of taxa and enable visualization of affinities across groups are desirable. METHODOLOGY/PRINCIPAL FINDINGS: Toward this end, we developed a procedure for extracting diagnostic patterns in the form of indicator vectors from DNA sequences of taxonomic groups. In the present instance the indicator vectors were derived from mitochondrial cytochrome c oxidase I (COI) sequences of those groups and further analyzed on this basis. In the first example, indicator vectors for birds, fish, and butterflies were constructed from a training set of COI sequences, then correlations with test sequences not used to construct the indicator vector were determined. In all cases, correlation with the indicator vector correctly assigned test sequences to their proper group. In the second example, this approach was explored at the species level within the bird grouping; this also gave correct assignment, suggesting the possibility of automated procedures for classification at various taxonomic levels. A false-color matrix of vector correlations displayed affinities among species consistent with higher-order taxonomy. CONCLUSIONS/SIGNIFICANCE: The indicator vectors preserved DNA character information and provided quantitative measures of correlations among taxonomic groups. This method is scalable to the largest datasets envisioned in this field, provides a visually-intuitive display that captures relational affinities derived from sequence data across a diversity of life forms, and is potentially a useful complement to current tree-building techniques for studying evolutionary processes based on DNA sequence data.
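The abstract describes building one indicator vector per group from training sequences, assigning test sequences by correlation, and displaying correlations among vectors as a matrix. This record does not say how the indicator vectors are computed, so the following is only a minimal sketch under stated assumptions: sequences are aligned and one-hot encoded over A, C, G, T; each indicator vector is taken to be the group mean minus the overall mean; and "correlation" is taken to be the cosine between vectors. All function names and the toy sequences are hypothetical and are not real COI data or the authors' procedure.

```python
from math import sqrt

BASES = "ACGT"


def encode(seq):
    """One-hot encode an aligned DNA sequence into a flat list of length 4 * len(seq)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def correlation(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def indicator_vectors(training):
    """ASSUMPTION: indicator vector = group mean minus overall mean.

    training maps a group name to a list of aligned sequences.
    Returns (overall_mean, {group: indicator_vector}).
    """
    encoded = {g: [encode(s) for s in seqs] for g, seqs in training.items()}
    overall = mean_vector([v for vecs in encoded.values() for v in vecs])
    indicators = {
        g: [m - o for m, o in zip(mean_vector(vecs), overall)]
        for g, vecs in encoded.items()
    }
    return overall, indicators


def classify(seq, overall, indicators):
    """Assign seq to the group whose indicator vector it correlates with best."""
    centered = [x - o for x, o in zip(encode(seq), overall)]
    scores = {g: correlation(centered, ind) for g, ind in indicators.items()}
    return max(scores, key=scores.get), scores


def correlation_matrix(indicators):
    """Pairwise correlations among indicator vectors, in sorted group order."""
    groups = sorted(indicators)
    rows = [[correlation(indicators[a], indicators[b]) for b in groups] for a in groups]
    return groups, rows


# Invented toy training set (not real COI sequences).
training = {
    "birds": ["ACGTAC", "ACGTAA"],
    "fish": ["TTGCAC", "TTGCAA"],
    "butterflies": ["GGATCC", "GGATCA"],
}
overall, indicators = indicator_vectors(training)
label, scores = classify("ACGTAT", overall, indicators)  # sequence not in training
groups, matrix = correlation_matrix(indicators)
print(label)
for name, row in zip(groups, matrix):
    print(name, [round(x, 2) for x in row])
```

The matrix returned by `correlation_matrix` is the kind of array that could be rendered as a false-color image with a plotting library; that rendering step is left out here to keep the sketch dependency-free.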
format Text
id pubmed-2749217
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2749217 2009-10-02 A Scalable Method for Analysis and Display of DNA Sequences Sirovich, Lawrence Stoeckle, Mark Y. Zhang, Yu PLoS One Research Article Public Library of Science 2009-10-02 /pmc/articles/PMC2749217/ /pubmed/19798412 http://dx.doi.org/10.1371/journal.pone.0007051 Text en Sirovich et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Sirovich, Lawrence
Stoeckle, Mark Y.
Zhang, Yu
A Scalable Method for Analysis and Display of DNA Sequences
title A Scalable Method for Analysis and Display of DNA Sequences
title_sort scalable method for analysis and display of dna sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2749217/
https://www.ncbi.nlm.nih.gov/pubmed/19798412
http://dx.doi.org/10.1371/journal.pone.0007051