Edge effects in calling variants from targeted amplicon sequencing

Bibliographic Details
Main Authors: Vijaya Satya, Ravi, DiCarlo, John
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302139/
https://www.ncbi.nlm.nih.gov/pubmed/25480444
http://dx.doi.org/10.1186/1471-2164-15-1073
author Vijaya Satya, Ravi
DiCarlo, John
collection PubMed
description BACKGROUND: Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls. RESULTS: We show that amplicon boundaries are variant calling blind spots where the variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to incorporate the primer bases in obtaining read alignments and post-processing of the alignments, thereby effectively moving these blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can provide better variant calling accuracy when primer bases are retained and sequenced. CONCLUSIONS: Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1073) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4302139
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4302139 2015-01-23 Edge effects in calling variants from targeted amplicon sequencing Vijaya Satya, Ravi DiCarlo, John BMC Genomics Methodology Article BioMed Central 2014-12-05 /pmc/articles/PMC4302139/ /pubmed/25480444 http://dx.doi.org/10.1186/1471-2164-15-1073 Text en © Vijaya Satya and DiCarlo; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Edge effects in calling variants from targeted amplicon sequencing
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302139/
https://www.ncbi.nlm.nih.gov/pubmed/25480444
http://dx.doi.org/10.1186/1471-2164-15-1073
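The description in this record recommends keeping primer bases through alignment and removing them only immediately before variant calling. The sketch below illustrates one way such a late trimming step could look: it soft-clips the leading bases of an already aligned forward-strand read that fall inside the primer region. It is not the authors' implementation. The function names (`parse_cigar`, `soft_clip_primer`), the 0-based coordinate convention, and the restriction to the CIGAR operations M, =, X, I, D and S are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical helper names; coordinates are assumed 0-based,
# and primer_end is the first reference position AFTER the primer.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '3M1I21M' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def soft_clip_primer(pos: int, cigar: str, primer_end: int) -> Tuple[int, str]:
    """Soft-clip leading read bases aligned before primer_end.

    Returns the new alignment start and the new CIGAR string.
    Only forward-strand reads that begin with the primer are handled.
    """
    ops = parse_cigar(cigar)
    clipped = 0  # number of read bases converted to soft clip
    ref = pos    # current reference position while walking the CIGAR
    i = 0
    while i < len(ops) and ref < primer_end:
        n, op = ops[i]
        if op in "M=X":
            # Aligned bases consume both read and reference.
            take = min(n, primer_end - ref)
            clipped += take
            ref += take
            if take < n:
                ops[i] = (n - take, op)  # keep the part inside the insert
                break
        elif op in "IS":
            # Insertions and existing soft clips consume read bases only.
            clipped += n
        elif op == "D":
            # A deletion consumes reference only; drop it entirely so the
            # trimmed alignment does not start with a deletion.
            ref += n
        else:
            raise ValueError(f"unsupported CIGAR operation: {op}")
        i += 1
    # Absorb an insertion sitting exactly at the primer boundary.
    while i < len(ops) and ops[i][1] == "I":
        clipped += ops[i][0]
        i += 1
    rest = "".join(f"{n}{op}" for n, op in ops[i:])
    prefix = f"{clipped}S" if clipped else ""
    return ref, prefix + rest


if __name__ == "__main__":
    # A 25-base read starting at reference position 100 with a 5-base primer.
    print(soft_clip_primer(100, "25M", 105))
```

Because the primer bases are soft-clipped rather than deleted from the read, the aligner has already used them to anchor the read, while a downstream variant caller that ignores soft-clipped bases would not count them as evidence. Reverse-strand reads would need the mirror-image operation on the trailing end of the CIGAR, which this sketch does not attempt.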