
Improving Imputation Quality in BEAGLE for Crop and Livestock Data


Bibliographic Details
Main Authors:	Pook, Torsten; Mayer, Manfred; Geibel, Johannes; Weigend, Steffen; Cavero, David; Schoen, Chris C.; Simianer, Henner
Format:	Online Article Text
Language:	English
Published:	Genetics Society of America, 2019
Subjects:	Investigations
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6945036/ https://www.ncbi.nlm.nih.gov/pubmed/31676508 http://dx.doi.org/10.1534/g3.119.400798

collection	PubMed
description	Imputation is one of the key steps in the preprocessing and quality control protocol of any genetic study. Most imputation algorithms were originally developed for use in human genetics and are therefore optimized for a high level of genetic diversity. Different versions of BEAGLE were evaluated on genetic datasets with different levels of genetic diversity and structure: doubled haploids of two European maize landraces, and a commercial breeding line and a diversity panel in chicken. These differences can be taken into account in BEAGLE by parameter tuning. Especially for phasing, BEAGLE 5.0 outperformed the newest version (5.1), which in turn also led to improved imputation. Earlier versions were far more dependent on the adaptation of parameters in all our tests. For all versions, the parameter ne (effective population size) had a major effect on the error rate for imputation of ungenotyped markers, reducing error rates by up to 98.5%. Further improvement was obtained by tuning the parameters that affect the structure of the haplotype cluster used to initialize BEAGLE's underlying Hidden Markov Model. For the maize datasets, the number of markers with extremely high error rates was more than halved by using a flint reference genome (F7, PE0075, etc.) instead of the commonly used B73. For the chicken diversity panel, excluding genetically distant individuals from the reference panel reduced error rates for imputation of ungenotyped markers by 8.5% on average. To optimize imputation accuracy, one has to balance representing as much of the genetic diversity as possible against introducing noise by including genetically distant individuals.
format	Online Article Text
id	pubmed-6945036
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Genetics Society of America
record_format	MEDLINE/PubMed
journal	G3 (Bethesda)
published	2019-11-01
rights	Copyright © 2020 Pook et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
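The error rates quoted in the description summarize a masking experiment: genotypes are hidden, imputed, and compared with the true calls. The record does not state the study's exact definitions, so the following is only a minimal sketch under stated assumptions. It assumes genotypes coded as 0/1/2 allele dosages, an error rate defined as the share of masked (held-out) genotype calls imputed incorrectly, and a reduction such as "up to 98.5%" read as a relative reduction, (baseline - tuned) / baseline. The function names are hypothetical and are not part of BEAGLE.

```python
from typing import Sequence


def imputation_error_rate(true_gt: Sequence[Sequence[int]],
                          imputed_gt: Sequence[Sequence[int]],
                          masked: Sequence[Sequence[bool]]) -> float:
    """Hypothetical helper: share of masked genotype calls imputed incorrectly.

    Assumes rows are individuals, columns are markers, genotypes are 0/1/2
    allele dosages, and `masked` flags the cells hidden before imputation.
    """
    if not (len(true_gt) == len(imputed_gt) == len(masked)):
        raise ValueError("inputs must have the same number of individuals")
    errors = 0
    total = 0
    for t_row, i_row, m_row in zip(true_gt, imputed_gt, masked):
        if not (len(t_row) == len(i_row) == len(m_row)):
            raise ValueError("inputs must have the same number of markers")
        for true_call, imputed_call, is_masked in zip(t_row, i_row, m_row):
            if is_masked:  # only held-out cells are scored
                total += 1
                if true_call != imputed_call:
                    errors += 1
    if total == 0:
        raise ValueError("no masked genotypes to score")
    return errors / total


def relative_reduction(baseline: float, tuned: float) -> float:
    """Relative drop in error rate; 0.985 corresponds to a 98.5% reduction."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - tuned) / baseline


if __name__ == "__main__":
    truth = [[0, 1, 2, 1], [2, 2, 0, 1]]
    imputed = [[0, 1, 1, 1], [2, 0, 0, 1]]
    mask = [[False, True, True, False], [True, True, False, False]]
    print(imputation_error_rate(truth, imputed, mask))  # 2 of 4 masked calls wrong
    print(relative_reduction(0.2, 0.003))  # illustrative numbers, not study data
```

In the usage example, two of the four masked calls are imputed incorrectly, giving an error rate of 0.5. The numbers passed to `relative_reduction` are made up purely to show how a 98.5% figure arises; they are not results from the study.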
