
TASSEL-GBS: A High Capacity Genotyping by Sequencing Analysis Pipeline

Genotyping by sequencing (GBS) is a next-generation sequencing-based method that takes advantage of reduced representation to enable high-throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol is...

Bibliographic Details
Main Authors: Glaubitz, Jeffrey C., Casstevens, Terry M., Lu, Fei, Harriman, James, Elshire, Robert J., Sun, Qi, Buckler, Edward S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3938676/
https://www.ncbi.nlm.nih.gov/pubmed/24587335
http://dx.doi.org/10.1371/journal.pone.0090346
author Glaubitz, Jeffrey C.
Casstevens, Terry M.
Lu, Fei
Harriman, James
Elshire, Robert J.
Sun, Qi
Buckler, Edward S.
author_sort Glaubitz, Jeffrey C.
collection PubMed
description Genotyping by sequencing (GBS) is a next-generation sequencing-based method that takes advantage of reduced representation to enable high-throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol is currently being applied in numerous species by a large number of researchers. Herein we describe a bioinformatics pipeline, TASSEL-GBS, designed for the efficient processing of raw GBS sequence data into SNP genotypes. The TASSEL-GBS pipeline fulfills the following key design criteria: (1) ability to run on the modest computing resources typically available to small breeding or ecological research programs, including desktop or laptop machines with only 8–16 GB of RAM; (2) scalability from small to extremely large studies, where hundreds of thousands or even millions of SNPs can be scored in up to 100,000 individuals (e.g., for large breeding programs or genetic surveys); and (3) applicability in an accelerated breeding context, requiring rapid turnover from tissue collection to genotypes. Although a reference genome is required, the pipeline can also be run with an unfinished “pseudo-reference” consisting of numerous contigs. We describe the TASSEL-GBS pipeline in detail and benchmark it based upon a large-scale, species-wide analysis in maize (Zea mays), where the average error rate was reduced to 0.0042 through application of population genetic-based SNP filters. Overall, the GBS assay and the TASSEL-GBS pipeline provide robust tools for studying genomic diversity.
format Online
Article
Text
id pubmed-3938676
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3938676 2014-03-04 TASSEL-GBS: A High Capacity Genotyping by Sequencing Analysis Pipeline Glaubitz, Jeffrey C. Casstevens, Terry M. Lu, Fei Harriman, James Elshire, Robert J. Sun, Qi Buckler, Edward S. PLoS One Research Article
Public Library of Science 2014-02-28 /pmc/articles/PMC3938676/ /pubmed/24587335 http://dx.doi.org/10.1371/journal.pone.0090346 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title TASSEL-GBS: A High Capacity Genotyping by Sequencing Analysis Pipeline
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3938676/
https://www.ncbi.nlm.nih.gov/pubmed/24587335
http://dx.doi.org/10.1371/journal.pone.0090346