The presence and impact of reference bias on population genomic studies of prehistoric human populations
Haploid high quality reference genomes are an important resource in genomic research projects. A consequence is that DNA fragments carrying the reference allele will be more likely to map successfully, or receive higher quality scores. This reference bias can have effects on downstream population ge...
Main Authors: | Günther, Torsten; Nettelblad, Carl |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6685638/ https://www.ncbi.nlm.nih.gov/pubmed/31348818 http://dx.doi.org/10.1371/journal.pgen.1008302 |
_version_ | 1783442432990380032 |
---|---|
author | Günther, Torsten Nettelblad, Carl |
author_facet | Günther, Torsten Nettelblad, Carl |
author_sort | Günther, Torsten |
collection | PubMed |
description | Haploid high quality reference genomes are an important resource in genomic research projects. A consequence is that DNA fragments carrying the reference allele will be more likely to map successfully, or receive higher quality scores. This reference bias can have effects on downstream population genomic analysis when heterozygous sites are falsely considered homozygous for the reference allele. In palaeogenomic studies of human populations, mapping against the human reference genome is used to identify endogenous human sequences. Ancient DNA studies usually operate with low sequencing coverages and fragmentation of DNA molecules causes a large proportion of the sequenced fragments to be shorter than 50 bp, reducing the amount of accepted mismatches, and increasing the probability of multiple matching sites in the genome. These ancient DNA specific properties are potentially exacerbating the impact of reference bias on downstream analyses, especially since most studies of ancient human populations use pseudo-haploid data, i.e. they randomly sample only one sequencing read per site. We show that reference bias is pervasive in published ancient DNA sequence data of prehistoric humans with some differences between individual genomic regions. We illustrate that the strength of reference bias is negatively correlated with fragment length. Most genomic regions we investigated show little to no mapping bias but even a small proportion of sites with bias can impact analyses of those particular loci or slightly skew genome-wide estimates. Therefore, reference bias has the potential to cause minor but significant differences in the results of downstream analyses such as population allele sharing, heterozygosity estimates and estimates of archaic ancestry. These spurious results highlight how important it is to be aware of these technical artifacts and that we need strategies to mitigate the effect. Therefore, we suggest some post-mapping filtering strategies to resolve reference bias which help to reduce its impact substantially. |
format | Online Article Text |
id | pubmed-6685638 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-66856382019-08-15 The presence and impact of reference bias on population genomic studies of prehistoric human populations Günther, Torsten Nettelblad, Carl PLoS Genet Research Article Haploid high quality reference genomes are an important resource in genomic research projects. A consequence is that DNA fragments carrying the reference allele will be more likely to map successfully, or receive higher quality scores. This reference bias can have effects on downstream population genomic analysis when heterozygous sites are falsely considered homozygous for the reference allele. In palaeogenomic studies of human populations, mapping against the human reference genome is used to identify endogenous human sequences. Ancient DNA studies usually operate with low sequencing coverages and fragmentation of DNA molecules causes a large proportion of the sequenced fragments to be shorter than 50 bp, reducing the amount of accepted mismatches, and increasing the probability of multiple matching sites in the genome. These ancient DNA specific properties are potentially exacerbating the impact of reference bias on downstream analyses, especially since most studies of ancient human populations use pseudo-haploid data, i.e. they randomly sample only one sequencing read per site. We show that reference bias is pervasive in published ancient DNA sequence data of prehistoric humans with some differences between individual genomic regions. We illustrate that the strength of reference bias is negatively correlated with fragment length. Most genomic regions we investigated show little to no mapping bias but even a small proportion of sites with bias can impact analyses of those particular loci or slightly skew genome-wide estimates. Therefore, reference bias has the potential to cause minor but significant differences in the results of downstream analyses such as population allele sharing, heterozygosity estimates and estimates of archaic ancestry. These spurious results highlight how important it is to be aware of these technical artifacts and that we need strategies to mitigate the effect. Therefore, we suggest some post-mapping filtering strategies to resolve reference bias which help to reduce its impact substantially. Public Library of Science 2019-07-26 /pmc/articles/PMC6685638/ /pubmed/31348818 http://dx.doi.org/10.1371/journal.pgen.1008302 Text en © 2019 Günther, Nettelblad http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Günther, Torsten Nettelblad, Carl The presence and impact of reference bias on population genomic studies of prehistoric human populations |
title | The presence and impact of reference bias on population genomic studies of prehistoric human populations |
title_full | The presence and impact of reference bias on population genomic studies of prehistoric human populations |
title_fullStr | The presence and impact of reference bias on population genomic studies of prehistoric human populations |
title_full_unstemmed | The presence and impact of reference bias on population genomic studies of prehistoric human populations |
title_short | The presence and impact of reference bias on population genomic studies of prehistoric human populations |
title_sort | presence and impact of reference bias on population genomic studies of prehistoric human populations |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6685638/ https://www.ncbi.nlm.nih.gov/pubmed/31348818 http://dx.doi.org/10.1371/journal.pgen.1008302 |
work_keys_str_mv | AT gunthertorsten thepresenceandimpactofreferencebiasonpopulationgenomicstudiesofprehistorichumanpopulations AT nettelbladcarl thepresenceandimpactofreferencebiasonpopulationgenomicstudiesofprehistorichumanpopulations AT gunthertorsten presenceandimpactofreferencebiasonpopulationgenomicstudiesofprehistorichumanpopulations AT nettelbladcarl presenceandimpactofreferencebiasonpopulationgenomicstudiesofprehistorichumanpopulations |