
A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy

BACKGROUND: Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method pr...


Bibliographic Details
Main Authors:	Wickland, Daniel P., Battu, Gopal, Hudson, Karen A., Diers, Brian W., Hudson, Matthew E.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5745977/ https://www.ncbi.nlm.nih.gov/pubmed/29281959 http://dx.doi.org/10.1186/s12859-017-2000-6

author	Wickland, Daniel P. Battu, Gopal Hudson, Karen A. Diers, Brian W. Hudson, Matthew E.
collection	PubMed
description	BACKGROUND: Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools. RESULTS: We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed. CONCLUSIONS: We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain. While GB-eaSy outperformed other individual methods on the datasets analyzed, our findings suggest that a comprehensive approach integrating the results from multiple GBS bioinformatics pipelines may be the optimal strategy to obtain the largest, most highly accurate SNP yield possible from low-coverage polyploid sequence data.
format	Online Article Text
id	pubmed-5745977
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5745977 2018-01-03 BMC Bioinformatics Methodology Article BioMed Central 2017-12-28 /pmc/articles/PMC5745977/ /pubmed/29281959 http://dx.doi.org/10.1186/s12859-017-2000-6 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5745977/ https://www.ncbi.nlm.nih.gov/pubmed/29281959 http://dx.doi.org/10.1186/s12859-017-2000-6
