
ANDES: Statistical tools for the ANalyses of DEep Sequencing

BACKGROUND: The advancements in DNA sequencing technologies have allowed researchers to progress from the analyses of a single organism towards the deep sequencing of a sample of organisms. With sufficient sequencing depth, it is now possible to detect subtle variations between members of the same s...


Bibliographic Details
Main Authors: Li, Kelvin; Venter, Eli; Yooseph, Shibu; Stockwell, Timothy B; Eckerle, Lance D; Denison, Mark R; Spiro, David J; Methé, Barbara A
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2921379/
https://www.ncbi.nlm.nih.gov/pubmed/20633290
http://dx.doi.org/10.1186/1756-0500-3-199
author Li, Kelvin
Venter, Eli
Yooseph, Shibu
Stockwell, Timothy B
Eckerle, Lance D
Denison, Mark R
Spiro, David J
Methé, Barbara A
author_sort Li, Kelvin
collection PubMed
description BACKGROUND: The advancements in DNA sequencing technologies have allowed researchers to progress from the analyses of a single organism towards the deep sequencing of a sample of organisms. With sufficient sequencing depth, it is now possible to detect subtle variations between members of the same species, or between mixed species with shared biomarkers, such as the 16S rRNA gene. However, traditional sequencing analyses of samples from largely homogeneous populations are often still based on multiple sequence alignments (MSA), where each sequence is placed along a separate row and similarities between aligned bases can be followed down each column. While this visual format is intuitive for a small set of aligned sequences, the representation quickly becomes cumbersome as sequencing depths cover loci hundreds or thousands of reads deep. FINDINGS: We have developed ANDES, a software library and a suite of applications, written in Perl and R, for the statistical ANalyses of DEep Sequencing. The fundamental data structure underlying ANDES is the position profile, which contains the nucleotide distributions for each genomic position resultant from a multiple sequence alignment (MSA). Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, and the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values. CONCLUSIONS: As new sequencing technologies evolve, deep sequencing will become increasingly cost-efficient and the inter- and intra-sample comparisons of largely homogeneous sequences will become more common. We have provided a software package and demonstrated its application on various empirically derived datasets. Investigators may download the software from SourceForge at https://sourceforge.net/projects/andestools.
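The position profile described in the abstract can be illustrated with a minimal sketch. ANDES itself is written in Perl and R; the Python below is not its code, and the function names (`position_profile`, `shannon_entropy`, `rmsd`), the five-symbol alphabet, and the RMSD normalization are all assumptions made for illustration. It builds a per-column nucleotide distribution from a toy alignment, then computes the Shannon entropy of one column and a root mean square deviation between the same column in two samples.

```python
import math
from collections import Counter

# Assumed symbol set (hypothetical): four bases plus the gap character.
# Symbols outside this set (e.g. "N") still count toward depth but are not reported.
ALPHABET = "ACGT-"


def position_profile(msa):
    """Return one dict of symbol frequencies per alignment column."""
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):  # iterate over columns of the alignment
        counts = Counter(column)
        depth = len(column)
        profile.append({s: counts.get(s, 0) / depth for s in ALPHABET})
    return profile


def shannon_entropy(dist):
    """Shannon entropy in bits of one column's distribution."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


def rmsd(dist_a, dist_b):
    """Root mean square deviation between two columns' distributions.

    Dividing by the alphabet size is an assumed normalization.
    """
    squared = sum((dist_a[s] - dist_b[s]) ** 2 for s in ALPHABET)
    return math.sqrt(squared / len(ALPHABET))


if __name__ == "__main__":
    sample_a = position_profile(["ACGT", "ACGT", "ACGA", "ACGA"])
    sample_b = position_profile(["ACGT", "ACGT"])
    for pos, (a, b) in enumerate(zip(sample_a, sample_b)):
        print(pos, round(shannon_entropy(a), 3), round(rmsd(a, b), 3))
```

Because each column is reduced to a fixed-size distribution, the cost of comparing two samples depends on alignment length rather than read depth, which is the property the abstract contrasts with row-per-sequence MSA views.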
format Text
id pubmed-2921379
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2921379 2010-08-16 ANDES: Statistical tools for the ANalyses of DEep Sequencing Li, Kelvin Venter, Eli Yooseph, Shibu Stockwell, Timothy B Eckerle, Lance D Denison, Mark R Spiro, David J Methé, Barbara A BMC Res Notes Technical Note BACKGROUND: The advancements in DNA sequencing technologies have allowed researchers to progress from the analyses of a single organism towards the deep sequencing of a sample of organisms. With sufficient sequencing depth, it is now possible to detect subtle variations between members of the same species, or between mixed species with shared biomarkers, such as the 16S rRNA gene. However, traditional sequencing analyses of samples from largely homogeneous populations are often still based on multiple sequence alignments (MSA), where each sequence is placed along a separate row and similarities between aligned bases can be followed down each column. While this visual format is intuitive for a small set of aligned sequences, the representation quickly becomes cumbersome as sequencing depths cover loci hundreds or thousands of reads deep. FINDINGS: We have developed ANDES, a software library and a suite of applications, written in Perl and R, for the statistical ANalyses of DEep Sequencing. The fundamental data structure underlying ANDES is the position profile, which contains the nucleotide distributions for each genomic position resultant from a multiple sequence alignment (MSA). Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, and the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values. CONCLUSIONS: As new sequencing technologies evolve, deep sequencing will become increasingly cost-efficient and the inter- and intra-sample comparisons of largely homogeneous sequences will become more common. We have provided a software package and demonstrated its application on various empirically derived datasets. Investigators may download the software from SourceForge at https://sourceforge.net/projects/andestools. BioMed Central 2010-07-15 /pmc/articles/PMC2921379/ /pubmed/20633290 http://dx.doi.org/10.1186/1756-0500-3-199 Text en Copyright ©2010 Li et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ANDES: Statistical tools for the ANalyses of DEep Sequencing
topic Technical Note