
Suitability of Different Mapping Algorithms for Genome-Wide Polymorphism Scans with Pool-Seq Data

The cost-effectiveness of sequencing pools of individuals (Pool-Seq) provides the basis for the popularity and widespread use of this method for many research questions, ranging from unraveling the genetic basis of complex traits, to the clonal evolution of cancer cells. Because the accuracy of Pool...


Bibliographic Details
Main authors: Kofler, Robert; Langmüller, Anna Maria; Nouhaud, Pierre; Otte, Kathrin Anna; Schlötterer, Christian
Format: Online Article Text
Language: English
Published: Genetics Society of America 2016
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100849/
https://www.ncbi.nlm.nih.gov/pubmed/27613752
http://dx.doi.org/10.1534/g3.116.034488
author Kofler, Robert
Langmüller, Anna Maria
Nouhaud, Pierre
Otte, Kathrin Anna
Schlötterer, Christian
collection PubMed
description The cost-effectiveness of sequencing pools of individuals (Pool-Seq) provides the basis for the popularity and widespread use of this method for many research questions, ranging from unraveling the genetic basis of complex traits to the clonal evolution of cancer cells. Because the accuracy of Pool-Seq could be affected by many potential sources of error, several studies have determined, for example, the influence of sequencing technology, the library preparation protocol, and mapping parameters. Nevertheless, the impact of the mapping tools has not yet been evaluated. Using simulated and real Pool-Seq data, we demonstrate a substantial impact of the mapping tools, leading to characteristic false positives in genome-wide scans. The problem of false positives was particularly pronounced when data with different read lengths and insert sizes were compared. Out of 14 evaluated algorithms, novoalign, bwa mem, and clc4 are most suitable for mapping Pool-Seq data. Nevertheless, no single algorithm is sufficient for avoiding all false positives. We show that the intersection of the results of two mapping algorithms provides a simple, yet effective, strategy to eliminate false positives. We propose that the implementation of a consistent Pool-Seq bioinformatics pipeline, building on the recommendations of this study, can substantially increase the reliability of Pool-Seq results, in particular when libraries generated with different protocols are being compared.
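The intersection strategy mentioned in the description can be illustrated with a minimal sketch. It assumes that candidate sites from a genome-wide scan have already been extracted separately for each mapper as (chromosome, position) pairs. The function name, the site coordinates, and the choice of which two mappers to pair are hypothetical and are not taken from the article.

```python
from typing import Iterable, Set, Tuple

# A candidate site: (chromosome name, 1-based position). Hypothetical layout.
Site = Tuple[str, int]


def consistent_candidates(mapper_a: Iterable[Site],
                          mapper_b: Iterable[Site]) -> Set[Site]:
    """Keep only sites flagged as candidates with both mapping algorithms."""
    return set(mapper_a) & set(mapper_b)


# Made-up candidate sites, for illustration only.
hits_mapper_1 = [("2L", 1050), ("2L", 88210), ("3R", 4417)]
hits_mapper_2 = [("2L", 1050), ("3R", 4417), ("X", 90233)]

shared = consistent_candidates(hits_mapper_1, hits_mapper_2)
print(sorted(shared))  # [('2L', 1050), ('3R', 4417)]
```

Sites reported with only one of the two mappers are dropped as likely mapper-specific artifacts. Whether a plain set intersection matches the exact procedure used in the study is an assumption of this sketch.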
format Online
Article
Text
id pubmed-5100849
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5100849 2016-11-09 G3 (Bethesda) Investigations Genetics Society of America 2016-09-09 /pmc/articles/PMC5100849/ /pubmed/27613752 http://dx.doi.org/10.1534/g3.116.034488 Text en Copyright © 2016 Kofler et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Suitability of Different Mapping Algorithms for Genome-Wide Polymorphism Scans with Pool-Seq Data
topic Investigations