Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing

BACKGROUND: Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and m...

Bibliographic Details
Main Authors: Ros-Freixedes, Roger; Battagin, Mara; Johnsson, Martin; Gorjanc, Gregor; Mileham, Alan J.; Rounsley, Steve D.; Hickey, John M.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6293637/
https://www.ncbi.nlm.nih.gov/pubmed/30545283
http://dx.doi.org/10.1186/s12711-018-0436-4
author Ros-Freixedes, Roger
Battagin, Mara
Johnsson, Martin
Gorjanc, Gregor
Mileham, Alan J.
Rounsley, Steve D.
Hickey, John M.
collection PubMed
description BACKGROUND: Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. RESULTS: We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. CONCLUSIONS: We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.
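The pruning effect reported in the abstract above can be illustrated with a small calculation. This is only a sketch under stated assumptions, not the authors' pipeline: it assumes read depth at a truly heterozygous site follows a Poisson distribution, that each read carries the alternative allele with probability 0.5, that there are no sequencing errors, and that a caller discards the alternative allele when fewer than `min_alt_reads` reads support it. The function name and the threshold value of 2 are hypothetical and chosen for illustration.

```python
import math


def prob_alt_retained(mean_coverage, min_alt_reads, alt_fraction=0.5):
    """Probability that the alternative allele survives a read-support threshold.

    Assumption (not from the source): depth is Poisson(mean_coverage), so the
    number of reads carrying the alternative allele at a heterozygous site is
    Poisson(mean_coverage * alt_fraction).
    """
    lam = mean_coverage * alt_fraction
    # Probability of observing fewer than min_alt_reads alternative reads.
    p_below = sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    for coverage in (2, 30):
        for threshold in (1, 2):  # hypothetical thresholds
            p = prob_alt_retained(coverage, threshold)
            print(f"coverage={coverage}x  min_alt_reads={threshold}  "
                  f"P(alt retained)={p:.4f}")
```

Under these assumptions, at 2× coverage the alternative allele is retained at roughly 63% of heterozygous sites when one supporting read is enough, but at only about 26% when two are required, whereas at 30× the same threshold removes almost nothing. The sites that lose their alternative allele look homozygous for the reference, which is the direction of bias the abstract describes. The figures from this sketch are not comparable to the 19.0 percentage point reduction reported by the authors.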
format Online
Article
Text
id pubmed-6293637
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6293637 2018-12-18 Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing Ros-Freixedes, Roger Battagin, Mara Johnsson, Martin Gorjanc, Gregor Mileham, Alan J. Rounsley, Steve D. Hickey, John M. Genet Sel Evol Research Article
BioMed Central 2018-12-13 /pmc/articles/PMC6293637/ /pubmed/30545283 http://dx.doi.org/10.1186/s12711-018-0436-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing
topic Research Article