
TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data

BACKGROUND: In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in format...

Bibliographic Details
Main Authors:	Clark, Lindsay V., Sacks, Erik J.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4940913/ https://www.ncbi.nlm.nih.gov/pubmed/27408618 http://dx.doi.org/10.1186/s13029-016-0057-7

author	Clark, Lindsay V. Sacks, Erik J.
author_sort	Clark, Lindsay V.
collection	PubMed
description	BACKGROUND: In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously-mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult. RESULTS: We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output is in CSV format so that it can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files. CONCLUSIONS: TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-016-0057-7) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4940913
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4940913 2016-07-13 TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data Clark, Lindsay V. Sacks, Erik J. Source Code Biol Med Software BioMed Central 2016-07-11 /pmc/articles/PMC4940913/ /pubmed/27408618 http://dx.doi.org/10.1186/s13029-016-0057-7 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4940913/ https://www.ncbi.nlm.nih.gov/pubmed/27408618 http://dx.doi.org/10.1186/s13029-016-0057-7
