SAMStat: monitoring biases in next generation sequencing data

Motivation: The sequence alignment/map format (SAM) is a commonly used format to store the alignments between millions of short reads and a reference genome. Often certain positions within the reads are inherently more likely to contain errors due to the protocols used to prepare the samples. Such b...

Bibliographic Details
Main Authors:	Lassmann, Timo; Hayashizaki, Yoshihide; Daub, Carsten O.
Format:	Text
Language:	English
Published:	Oxford University Press 2011
Subjects:	Applications Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3008642/ https://www.ncbi.nlm.nih.gov/pubmed/21088025 http://dx.doi.org/10.1093/bioinformatics/btq614

author	Lassmann, Timo; Hayashizaki, Yoshihide; Daub, Carsten O.
author_sort	Lassmann, Timo
collection	PubMed
description	Motivation: The sequence alignment/map format (SAM) is a commonly used format to store the alignments between millions of short reads and a reference genome. Often certain positions within the reads are inherently more likely to contain errors due to the protocols used to prepare the samples. Such biases can have adverse effects on both mapping rate and accuracy. To understand the relationship between potential protocol biases and poor mapping we wrote SAMstat, a simple C program plotting nucleotide overrepresentation and other statistics in mapped and unmapped reads in a concise html page. Collecting such statistics also makes it easy to highlight problems in the data processing and enables non-experts to track data quality over time. Results: We demonstrate that studying sequence features in mapped data can be used to identify biases particular to one sequencing protocol. Once identified, such biases can be considered in the downstream analysis or even be removed by read trimming or filtering techniques. Availability: SAMStat is open source and freely available as a C program running on all Unix-compatible platforms. The source code is available from http://samstat.sourceforge.net. Contact: timolassmann@gmail.com
format	Text
id	pubmed-3008642
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3008642 2010-12-29 SAMStat: monitoring biases in next generation sequencing data Lassmann, Timo; Hayashizaki, Yoshihide; Daub, Carsten O. Bioinformatics Applications Note Oxford University Press 2011-01-01 2010-11-18 /pmc/articles/PMC3008642/ /pubmed/21088025 http://dx.doi.org/10.1093/bioinformatics/btq614 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	SAMStat: monitoring biases in next generation sequencing data
topic	Applications Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3008642/ https://www.ncbi.nlm.nih.gov/pubmed/21088025 http://dx.doi.org/10.1093/bioinformatics/btq614
