Benchmarking the performance of Pool‐seq SNP callers using simulated and real sequencing data

Bibliographic Details
Main Authors:	Guirao‐Rico, Sara; González, Josefa
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2021
Subjects:	RESOURCE ARTICLES
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8251607/ https://www.ncbi.nlm.nih.gov/pubmed/33534960 http://dx.doi.org/10.1111/1755-0998.13343

author	Guirao‐Rico, Sara González, Josefa
collection	PubMed
description	Population genomics is a fast‐developing discipline with promising applications in a growing number of life sciences fields. Advances in sequencing technologies and bioinformatics tools allow population genomics to exploit genome‐wide information to identify the molecular variants underlying traits of interest and the evolutionary forces that modulate these variants through space and time. However, the cost of genomic analyses of multiple populations is still too high to address them through individual genome sequencing. Pooling individuals for sequencing can be a more effective strategy in Single Nucleotide Polymorphism (SNP) detection and allele frequency estimation because of a higher total coverage. However, compared to individual sequencing, SNP calling from pools has the additional difficulty of distinguishing rare variants from sequencing errors, which is often avoided by establishing a minimum threshold allele frequency for the analysis. Finding an optimal balance between minimizing information loss and reducing sequencing costs is essential to ensure the success of population genomics studies. Here, we have benchmarked the performance of SNP callers for Pool‐seq data, based on different approaches, under different conditions, and using computer simulations and real data. We found that SNP caller performance varied for allele frequencies up to 0.35. We also found that SNP callers based on Bayesian (SNAPE‐pooled) or maximum likelihood (MAPGD) approaches outperform the two heuristic callers tested (VarScan and PoolSNP), in terms of the balance between sensitivity and FDR in both simulated and sequencing data. Our results will help inform the selection of the most appropriate SNP caller not only for large‐scale population studies but also in cases where the Pool‐seq strategy is the only option, such as in metagenomic or polyploid studies.
format	Online Article Text
id	pubmed-8251607
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8251607 2021-07-06 Benchmarking the performance of Pool‐seq SNP callers using simulated and real sequencing data Guirao‐Rico, Sara; González, Josefa Mol Ecol Resour RESOURCE ARTICLES John Wiley and Sons Inc. 2021-03-05 2021-05 /pmc/articles/PMC8251607/ /pubmed/33534960 http://dx.doi.org/10.1111/1755-0998.13343 Text en © 2021 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title	Benchmarking the performance of Pool‐seq SNP callers using simulated and real sequencing data
topic	RESOURCE ARTICLES
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8251607/ https://www.ncbi.nlm.nih.gov/pubmed/33534960 http://dx.doi.org/10.1111/1755-0998.13343