Benchmarking phasing software with a whole-genome sequenced cattle pedigree
BACKGROUND: Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). We herein took adva...
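Main authors: | Oget-Ebrad, Claire; Kadri, Naveen Kumar; Moreira, Gabriel Costa Monteiro; Karim, Latifa; Coppieters, Wouter; Georges, Michel; Druet, Tom |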
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8845340/ https://www.ncbi.nlm.nih.gov/pubmed/35164677 http://dx.doi.org/10.1186/s12864-022-08354-6 |
_version_ | 1784651654847528960 |
---|---|
author | Oget-Ebrad, Claire Kadri, Naveen Kumar Moreira, Gabriel Costa Monteiro Karim, Latifa Coppieters, Wouter Georges, Michel Druet, Tom |
author_facet | Oget-Ebrad, Claire Kadri, Naveen Kumar Moreira, Gabriel Costa Monteiro Karim, Latifa Coppieters, Wouter Georges, Michel Druet, Tom |
author_sort | Oget-Ebrad, Claire |
collection | PubMed |
description | BACKGROUND: Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). We herein took advantage of whole-genome sequence data available for a Holstein cattle pedigree containing 264 individuals, including 98 trios, to evaluate several population-based phasing methods. This data represents a typical example of a livestock population, with low effective population size, high levels of relatedness and long-range linkage disequilibrium. RESULTS: After stringent filtering of our sequence data, we evaluated several population-based phasing programs including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle and FImpute. To that end we used 98 individuals having both parents sequenced for validation. Their haplotypes reconstructed based on Mendelian segregation rules were considered the gold standard to assess the performance of population-based methods in two scenarios. In the first one, only these 98 individuals were phased, while in the second one, all the 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. We assessed phasing accuracy based on switch error counts (SEC) and rates (SER), lengths of correctly phased haplotypes and the probability that there is no phasing error between a pair of SNPs as a function of their distance. For most evaluated metrics or scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both methods resulting in particularly high phasing accuracies. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb (scenario 2). These statistics are remarkable since the methods were evaluated with a map of 8,400,000 SNPs, and this corresponds to only one switch error every 40,000 phased informative markers. 
When more relatives were included in the data (scenario 2), FImpute3.0 reconstructed extremely long segments without errors. CONCLUSIONS: We report extremely high phasing accuracies in a typical livestock sample. ShapeIT4.1 and Beagle5.2 proved to be the most accurate, particularly for phasing long segments and in the first scenario. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08354-6. |
format | Online Article Text |
id | pubmed-8845340 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8845340 2022-02-16 Benchmarking phasing software with a whole-genome sequenced cattle pedigree Oget-Ebrad, Claire Kadri, Naveen Kumar Moreira, Gabriel Costa Monteiro Karim, Latifa Coppieters, Wouter Georges, Michel Druet, Tom BMC Genomics Research BioMed Central 2022-02-15 /pmc/articles/PMC8845340/ /pubmed/35164677 http://dx.doi.org/10.1186/s12864-022-08354-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Benchmarking phasing software with a whole-genome sequenced cattle pedigree |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8845340/ https://www.ncbi.nlm.nih.gov/pubmed/35164677 http://dx.doi.org/10.1186/s12864-022-08354-6 |
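The description above names switch error count (SEC) and switch error rate (SER) as its main accuracy metrics. The following is a minimal sketch of how such metrics are commonly computed, not the authors' pipeline: the function name `switch_error_stats`, the tuple-based genotype encoding, and the choice of denominator (number of heterozygous sites minus one) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Genotype = Tuple[int, int]  # (allele on haplotype 1, allele on haplotype 2)


def switch_error_stats(
    truth: Sequence[Genotype], test: Sequence[Genotype]
) -> Tuple[int, int, float]:
    """Return (switch error count, switch opportunities, switch error rate).

    Hypothetical helper: `truth` holds reference haplotypes (for example,
    derived from Mendelian segregation in a trio) and `test` holds the
    haplotypes produced by a phasing program, one genotype per SNP.
    """
    if len(truth) != len(test):
        raise ValueError("truth and test must cover the same SNPs")

    # For each heterozygous site, record whether the test haplotype 1
    # carries the same allele as the truth haplotype 1.
    orientation: List[bool] = []
    for (t1, t2), (p1, p2) in zip(truth, test):
        if t1 == t2:
            continue  # homozygous sites carry no phase information
        if {t1, t2} != {p1, p2}:
            raise ValueError("genotypes differ between truth and test")
        orientation.append(t1 == p1)

    # A switch error is a change of orientation between two consecutive
    # heterozygous sites.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    opportunities = max(len(orientation) - 1, 0)
    rate = switches / opportunities if opportunities else 0.0
    return switches, opportunities, rate


if __name__ == "__main__":
    truth = [(0, 1), (1, 0), (0, 0), (0, 1), (1, 0), (0, 1)]
    test = [(0, 1), (1, 0), (0, 0), (1, 0), (0, 1), (0, 1)]
    print(switch_error_stats(truth, test))  # (2, 4, 0.5)
```

Under this convention, a test phasing that is flipped at every site relative to the truth counts zero switch errors, because only changes in relative orientation are penalized. As a rough consistency check on the reported figures, and assuming the median count and the overall rate describe comparable individuals, 50 switch errors at one error per 40,000 phased informative markers would imply on the order of 2,000,000 informative markers per individual.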