TagDust2: a generic method to extract reads from sequencing data

BACKGROUND: Arguably the most basic step in the analysis of next generation sequencing data (NGS) involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial...

Bibliographic Details
Main author:	Lassmann, Timo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384298/ https://www.ncbi.nlm.nih.gov/pubmed/25627334 http://dx.doi.org/10.1186/s12859-015-0454-y

author	Lassmann, Timo
collection	PubMed
description	BACKGROUND: Arguably the most basic step in the analysis of next generation sequencing data (NGS) involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. RESULTS: Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single, paired end and libraries containing unique molecular identifiers is fully supported. Two additional post processing steps are included to exclude known contaminants and filter out low complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. CONCLUSION: Taken together TagDust2 is a feature rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0454-y) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4384298
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4384298 2015-04-04 TagDust2: a generic method to extract reads from sequencing data Lassmann, Timo BMC Bioinformatics Software BioMed Central 2015-01-28 /pmc/articles/PMC4384298/ /pubmed/25627334 http://dx.doi.org/10.1186/s12859-015-0454-y Text en © Lassmann; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	TagDust2: a generic method to extract reads from sequencing data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384298/ https://www.ncbi.nlm.nih.gov/pubmed/25627334 http://dx.doi.org/10.1186/s12859-015-0454-y