A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy

Bibliographic Details
Main Authors: Wickland, Daniel P.; Battu, Gopal; Hudson, Karen A.; Diers, Brian W.; Hudson, Matthew E.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5745977/
https://www.ncbi.nlm.nih.gov/pubmed/29281959
http://dx.doi.org/10.1186/s12859-017-2000-6
author Wickland, Daniel P.
Battu, Gopal
Hudson, Karen A.
Diers, Brian W.
Hudson, Matthew E.
collection PubMed
description BACKGROUND: Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools. RESULTS: We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed. CONCLUSIONS: We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. 
In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain. While GB-eaSy outperformed other individual methods on the datasets analyzed, our findings suggest that a comprehensive approach integrating the results from multiple GBS bioinformatics pipelines may be the optimal strategy to obtain the largest, most highly accurate SNP yield possible from low-coverage polyploid sequence data.
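The abstract's observation that pipelines converge poorly yet are individually accurate, and its suggestion to integrate results from several pipelines, can be illustrated with a minimal sketch. This is not the authors' method: the pipeline names, the toy SNP calls, the `(chromosome, position, alternate allele)` key, and the `min_support` threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pairwise_jaccard(calls):
    """Return {(name_a, name_b): Jaccard index} for every pair of call sets.

    `calls` maps a pipeline name to a set of SNP keys. A low index means
    the two pipelines share few of the SNPs they report.
    """
    result = {}
    for a, b in combinations(sorted(calls), 2):
        union = calls[a] | calls[b]
        shared = calls[a] & calls[b]
        # Two empty call sets are treated as identical.
        result[(a, b)] = len(shared) / len(union) if union else 1.0
    return result


def integrate(calls, min_support=1):
    """Keep SNPs reported by at least `min_support` pipelines.

    min_support=1 is the plain union (largest yield); higher values
    trade yield for agreement between pipelines.
    """
    counts = {}
    for snps in calls.values():
        for snp in snps:
            counts[snp] = counts.get(snp, 0) + 1
    return {snp for snp, n in counts.items() if n >= min_support}


# Hypothetical toy calls, keyed by (chromosome, position, alternate allele).
calls = {
    "pipeline_A": {("Gm01", 1050, "T"), ("Gm01", 2210, "G"), ("Gm02", 500, "A")},
    "pipeline_B": {("Gm01", 1050, "T"), ("Gm03", 910, "C")},
    "pipeline_C": {("Gm02", 500, "A"), ("Gm03", 910, "C"), ("Gm04", 77, "G")},
}

overlap = pairwise_jaccard(calls)
union_calls = integrate(calls, min_support=1)
supported_calls = integrate(calls, min_support=2)
```

In this toy case each pair of pipelines shares only one SNP, so the pairwise overlap is low even though the union contains five distinct calls. Whether a union or a stricter support threshold is appropriate for real data depends on validation against an independent truth set, such as the whole-genome sequencing comparison mentioned in the abstract.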
format Online
Article
Text
id pubmed-5745977
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5745977 2018-01-03 A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy Wickland, Daniel P. Battu, Gopal Hudson, Karen A. Diers, Brian W. Hudson, Matthew E. BMC Bioinformatics Methodology Article BioMed Central 2017-12-28 /pmc/articles/PMC5745977/ /pubmed/29281959 http://dx.doi.org/10.1186/s12859-017-2000-6 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy
topic Methodology Article