SEWAL: an open-source platform for next-generation sequence analysis and visualization

Next-generation DNA sequencing platforms provide exciting new possibilities for in vitro genetic analysis of functional nucleic acids. However, the size of the resulting data sets presents computational and analytical challenges. We present an open-source software package that employs a locality-sen...

Bibliographic Details
Main Authors: Pitt, Jason N., Rajapakse, Indika, Ferré-D’Amaré, Adrian R.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3001052/
https://www.ncbi.nlm.nih.gov/pubmed/20693400
http://dx.doi.org/10.1093/nar/gkq661
author Pitt, Jason N.
Rajapakse, Indika
Ferré-D’Amaré, Adrian R.
collection PubMed
description Next-generation DNA sequencing platforms provide exciting new possibilities for in vitro genetic analysis of functional nucleic acids. However, the size of the resulting data sets presents computational and analytical challenges. We present an open-source software package that employs a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run (∼10^8 sequences). The algorithm results in quasilinear time processing of entire Illumina lanes (∼10^7 sequences) on a desktop computer in minutes. To facilitate visual analysis of sequencing data, the software produces three-dimensional scatter plots similar in concept to Sewall Wright and John Maynard Smith’s adaptive or fitness landscape. The software also contains functions that are particularly useful for doped selections such as mutation frequency analysis, information content calculation, multivariate statistical functions (including principal component analysis), sequence distance metrics, sequence searches and sequence comparisons across multiple Illumina data sets. Source code, executable files and links to sample data sets are available at http://www.sourceforge.net/projects/sewal.
format Text
id pubmed-3001052
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3001052 2010-12-13 SEWAL: an open-source platform for next-generation sequence analysis and visualization Pitt, Jason N. Rajapakse, Indika Ferré-D’Amaré, Adrian R. Nucleic Acids Res Computational Biology Next-generation DNA sequencing platforms provide exciting new possibilities for in vitro genetic analysis of functional nucleic acids. However, the size of the resulting data sets presents computational and analytical challenges. We present an open-source software package that employs a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run (∼10^8 sequences). The algorithm results in quasilinear time processing of entire Illumina lanes (∼10^7 sequences) on a desktop computer in minutes. To facilitate visual analysis of sequencing data, the software produces three-dimensional scatter plots similar in concept to Sewall Wright and John Maynard Smith’s adaptive or fitness landscape. The software also contains functions that are particularly useful for doped selections such as mutation frequency analysis, information content calculation, multivariate statistical functions (including principal component analysis), sequence distance metrics, sequence searches and sequence comparisons across multiple Illumina data sets. Source code, executable files and links to sample data sets are available at http://www.sourceforge.net/projects/sewal. Oxford University Press 2010-12 2010-08-06 /pmc/articles/PMC3001052/ /pubmed/20693400 http://dx.doi.org/10.1093/nar/gkq661 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SEWAL: an open-source platform for next-generation sequence analysis and visualization
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3001052/
https://www.ncbi.nlm.nih.gov/pubmed/20693400
http://dx.doi.org/10.1093/nar/gkq661