High-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing
BACKGROUND: In addition to heterogeneity and artificial selection, natural selection is one of the forces used to combat climate change and improve agrobiodiversity in evolutionary plant breeding. Accurate identification of the specific genomic effects of natural selection will likely accelerate tra...
Main authors: | Schneider, Michael; Shrestha, Asis; Ballvora, Agim; Léon, Jens |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8935755/ https://www.ncbi.nlm.nih.gov/pubmed/35313910 http://dx.doi.org/10.1186/s13007-022-00852-8 |
_version_ | 1784672095048826880 |
---|---|
author | Schneider, Michael Shrestha, Asis Ballvora, Agim Léon, Jens |
author_sort | Schneider, Michael |
collection | PubMed |
description | BACKGROUND: In addition to heterogeneity and artificial selection, natural selection is one of the forces used to combat climate change and improve agrobiodiversity in evolutionary plant breeding. Accurate identification of the specific genomic effects of natural selection will likely accelerate transfer between populations. Thus, insights into changes in allele frequency, adequate population size, gene flow and drift are essential. However, observing such effects often involves a trade-off between costs and resolution when a large sample of genotypes for many loci is analysed. Pool genotyping approaches achieve high resolution and precision in estimating allele frequency when sequence coverage is high. Nevertheless, high-coverage pool sequencing of large genomes is expensive. RESULTS: Three pool samples (n = 300, 300, 288) from a barley backcross population were generated to assess the population's allele frequency. The tested population (BC₂F₂₁) has undergone 18 generations of natural adaption to conventional farming practice. The accuracies of estimated pool-based allele frequencies and genome coverage yields were compared using three next-generation sequencing genotyping methods. To achieve accurate allele frequency estimates with low sequence coverage, we employed a haplotyping approach. Low coverage allele frequencies of closely located single polymorphisms were aggregated into a single haplotype allele frequency, yielding 2-to-271-times higher depth and increased precision. When we combined different haplotyping tactics, we found that gene and chip marker-based haplotype analyses performed equivalently or better compared with simple contig haplotype windows. Comparing multiple pool samples and referencing against an individual sequencing approach revealed that whole-genome pool re-sequencing (WGS) achieved the highest correlation with individual genotyping (≥ 0.97). In contrast, transcriptome-based genotyping (MACE) and genotyping by sequencing (GBS) pool replicates were significantly associated with higher error rates and lower correlations, but are still valuable to detect large allele frequency variations. CONCLUSIONS: The proposed strategy identified the allele frequency of populations with high accuracy at low cost. This is particularly relevant to evolutionary plant breeding of crops with very large genomes, such as barley. Whole-genome low coverage re-sequencing at 0.03× coverage per genotype accurately estimated the allele frequency when a loci-based haplotyping approach was applied. The implementation of annotated haplotypes capitalises on the biological background and statistical robustness. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13007-022-00852-8. |
format | Online Article Text |
id | pubmed-8935755 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8935755 2022-03-23 High-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing Schneider, Michael Shrestha, Asis Ballvora, Agim Léon, Jens Plant Methods Research BioMed Central 2022-03-21 /pmc/articles/PMC8935755/ /pubmed/35313910 http://dx.doi.org/10.1186/s13007-022-00852-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Schneider, Michael Shrestha, Asis Ballvora, Agim Léon, Jens High-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing |
title | High-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing |
title_sort | high-throughput estimation of allele frequencies using combined pooled-population sequencing and haplotype-based data processing |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8935755/ https://www.ncbi.nlm.nih.gov/pubmed/35313910 http://dx.doi.org/10.1186/s13007-022-00852-8 |
work_keys_str_mv | AT schneidermichael highthroughputestimationofallelefrequenciesusingcombinedpooledpopulationsequencingandhaplotypebaseddataprocessing AT shresthaasis highthroughputestimationofallelefrequenciesusingcombinedpooledpopulationsequencingandhaplotypebaseddataprocessing AT ballvoraagim highthroughputestimationofallelefrequenciesusingcombinedpooledpopulationsequencingandhaplotypebaseddataprocessing AT leonjens highthroughputestimationofallelefrequenciesusingcombinedpooledpopulationsequencingandhaplotypebaseddataprocessing |
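
The abstract describes aggregating low-coverage allele frequencies of closely located polymorphisms into a single haplotype allele frequency. A minimal sketch of that idea follows, under stated assumptions: every SNP in a block is already polarised to the same parental (donor) haplotype, SNPs within a block are treated as fully linked, and read counts are simply summed. The names `SnpCount`, `haplotype_frequencies`, and the block labels are hypothetical and are not taken from the article or its pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SnpCount:
    block: str        # hypothetical haplotype block label, e.g. a gene ID
    donor_reads: int  # pool reads supporting the donor-parent allele
    total_reads: int  # all pool reads covering this SNP


def haplotype_frequencies(
    snps: Iterable[SnpCount],
) -> Dict[str, Tuple[float, int]]:
    """Sum read counts per block; return {block: (donor frequency, depth)}."""
    sums: Dict[str, List[int]] = {}
    for snp in snps:
        acc = sums.setdefault(snp.block, [0, 0])
        acc[0] += snp.donor_reads
        acc[1] += snp.total_reads
    # Blocks without any covering read are skipped (frequency undefined).
    return {b: (d / t, t) for b, (d, t) in sums.items() if t > 0}


# Illustrative counts only, not data from the study.
snps = [
    SnpCount("geneA", 1, 4),
    SnpCount("geneA", 2, 5),
    SnpCount("geneA", 0, 3),
    SnpCount("geneB", 3, 3),
]
result = haplotype_frequencies(snps)
for block, (freq, depth) in result.items():
    print(block, freq, depth)
```

Summing counts rather than averaging per-SNP frequencies weights each SNP by its depth, so a block's effective depth is the sum of its SNP depths, which is the sense in which aggregation increases depth.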