Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP

BACKGROUND: The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initi...

Bibliographic Details
Main authors:	Perea, Claudia; De La Hoz, Juan Fernando; Cruz, Daniel Felipe; Lobaton, Juan David; Izquierdo, Paulo; Quintero, Juan Camilo; Raatz, Bodo; Duitama, Jorge
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009557/ https://www.ncbi.nlm.nih.gov/pubmed/27585926 http://dx.doi.org/10.1186/s12864-016-2827-7

author	Perea, Claudia; De La Hoz, Juan Fernando; Cruz, Daniel Felipe; Lobaton, Juan David; Izquierdo, Paulo; Quintero, Juan Camilo; Raatz, Bodo; Duitama, Jorge
author_sort	Perea, Claudia
collection	PubMed
description	BACKGROUND: The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes to generate sequencing fragments at predictable and reproducible sites. This makes it possible to genotype thousands of genetic markers in populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high-throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. RESULTS: Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one-step wizard to perform parallel read alignment, variant identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls, as well as calculation of summary statistics overall and per sample, and diversity statistics per site. NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variant detection such as GATK, Samtools and Tassel.
CONCLUSIONS: NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2827-7) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5009557
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5009557 2016-09-08 Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP Perea, Claudia; De La Hoz, Juan Fernando; Cruz, Daniel Felipe; Lobaton, Juan David; Izquierdo, Paulo; Quintero, Juan Camilo; Raatz, Bodo; Duitama, Jorge BMC Genomics Research BioMed Central 2016-08-31 /pmc/articles/PMC5009557/ /pubmed/27585926 http://dx.doi.org/10.1186/s12864-016-2827-7 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009557/ https://www.ncbi.nlm.nih.gov/pubmed/27585926 http://dx.doi.org/10.1186/s12864-016-2827-7

Similar items