MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications

Bibliographic Details
Main authors: Hadigol, Mohammad, Khiabanian, Hossein
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5994075/
https://www.ncbi.nlm.nih.gov/pubmed/29884116
http://dx.doi.org/10.1186/s12859-018-2223-1
author Hadigol, Mohammad
Khiabanian, Hossein
collection PubMed
description BACKGROUND: Rapid progress in high-throughput sequencing (HTS) and the development of novel library preparation methods have improved the sensitivity of detecting mutations in heterogeneous samples, specifically in high-depth (>500×) clinical applications. However, HTS methods are bounded by their technical and theoretical limitations, and sequencing errors cannot be completely eliminated. Comprehensive quantification of the background noise can highlight both the efficiency and the limitations of any HTS methodology, and help differentiate true low-abundance mutations from artifacts. RESULTS: We introduce MERIT (Mutation Error Rate Inference Toolkit), designed for in-depth quantification of erroneous substitutions and small insertions and deletions (indels). MERIT incorporates an all-inclusive variant caller and considers genomic context, including the nucleotides immediately 5′ and 3′ of each position, thereby establishing error rates for 96 possible substitutions as well as four single-base and 16 double-base indels. We applied MERIT to ultra-deep sequencing data (1,300,000×) obtained from the amplification of multiple clinically relevant loci, and showed a significant relationship between error rates and genomic contexts. In addition to observing a significant difference between transversion and transition rates, we identified variations of more than 100-fold within each error type at high sequencing depths. For instance, T>G transversions in GTC trinucleotides occurred 133.5 ± 65.9 times more often than those in ATAs. Similarly, C>T transitions in GCGs were observed at a 73.8 ± 10.5 times higher rate than those in TCTs. We also devised an in silico approach to determine the optimal sequencing depth, at which errors occur at rates similar to those of expected true mutations. Our analyses showed that increasing sequencing depth may improve sensitivity for detecting some mutations, depending on their genomic context.
For example, the T>G error rate in GTCs did not change when sequencing depth exceeded 10,000×; in contrast, the T>G rate in TTAs consistently improved even above 500,000×. CONCLUSIONS: Our results demonstrate significant variation in nucleotide misincorporation rates, and suggest that genomic context should be considered for comprehensive profiling of specimen-specific and sequencing artifacts in high-depth assays. These data provide strong evidence against assigning a single allele frequency threshold to call mutations, as doing so can produce substantial numbers of false-positive as well as false-negative variants, with important clinical consequences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2223-1) contains supplementary material, which is available to authorized users.
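The 96 substitution classes mentioned in the abstract correspond to 6 substitution types × 16 combinations of 5′ and 3′ neighbors. The Python sketch below shows one way such context-specific error rates could be tabulated from per-position base counts. It is not MERIT's implementation: it assumes the common convention of collapsing strand-complementary changes onto a pyrimidine (C or T) reference base, and the input format (`counts`, a list of per-position base-count dicts) and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"
PYRIMIDINES = "CT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def substitution_classes() -> list[str]:
    """List the 96 trinucleotide substitution classes, e.g. 'G[T>G]C'.

    Assumes complementary changes are collapsed onto a pyrimidine (C or T)
    reference base: 6 substitution types x 16 (5', 3') neighbor pairs.
    """
    classes = []
    for ref in PYRIMIDINES:
        for alt in BASES:
            if alt == ref:
                continue
            for five, three in product(BASES, repeat=2):
                classes.append(f"{five}[{ref}>{alt}]{three}")
    return classes


def classify(context: str, alt: str) -> str:
    """Map a reference trinucleotide plus alternate base to its pyrimidine-centered class."""
    if context[1] not in PYRIMIDINES:
        # Purine-centered site: report it on the opposite strand instead.
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def context_error_rates(reference: str, counts: list[dict[str, int]]) -> dict[str, float]:
    """Estimate the error rate per class as (non-reference reads) / (total depth).

    counts[i] (hypothetical format) maps each base to the number of reads showing
    it at reference[i]. Every non-reference read is counted as an error, which only
    makes sense at loci assumed to carry no true variants. The first and last
    positions are skipped because they lack a full trinucleotide context.
    """
    errors: dict[str, int] = defaultdict(int)
    depth: dict[str, int] = defaultdict(int)
    for i in range(1, len(reference) - 1):
        context = reference[i - 1 : i + 2]
        if any(base not in BASES for base in context):
            continue  # skip contexts containing N or other ambiguity codes
        total = sum(counts[i].values())
        for alt in BASES:
            if alt == context[1]:
                continue
            cls = classify(context, alt)
            errors[cls] += counts[i].get(alt, 0)
            depth[cls] += total
    return {cls: errors[cls] / depth[cls] for cls in depth if depth[cls] > 0}


if __name__ == "__main__":
    print(len(substitution_classes()))  # 96
    rates = context_error_rates("GTC", [{}, {"T": 990, "G": 10}, {}])
    print(rates["G[T>G]C"])  # 0.01
```

With rates estimated this way over many loci, a fold difference of the kind reported in the abstract would simply be a ratio such as `rates["G[T>G]C"] / rates["A[T>G]A"]` (GTC versus ATA for T>G). Per-class rates like these are also what a context-specific calling threshold would be built on, in place of a single allele frequency cutoff.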
format Online
Article
Text
id pubmed-5994075
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5994075 2018-06-21 MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications Hadigol, Mohammad Khiabanian, Hossein BMC Bioinformatics Methodology Article BioMed Central 2018-06-08 /pmc/articles/PMC5994075/ /pubmed/29884116 http://dx.doi.org/10.1186/s12859-018-2223-1 Text en © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5994075/
https://www.ncbi.nlm.nih.gov/pubmed/29884116
http://dx.doi.org/10.1186/s12859-018-2223-1