MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications

BACKGROUND: Rapid progress in high-throughput sequencing (HTS) and the development of novel library preparation methods have improved the sensitivity of detecting mutations in heterogeneous samples, specifically in high-depth (> 500×) clinical applications. However, HTS methods are bounded by the...

Bibliographic Details
Main Authors:	Hadigol, Mohammad; Khiabanian, Hossein
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5994075/ https://www.ncbi.nlm.nih.gov/pubmed/29884116 http://dx.doi.org/10.1186/s12859-018-2223-1

_version_	1783330351390654464
author	Hadigol, Mohammad Khiabanian, Hossein
author_facet	Hadigol, Mohammad Khiabanian, Hossein
author_sort	Hadigol, Mohammad
collection	PubMed
description	BACKGROUND: Rapid progress in high-throughput sequencing (HTS) and the development of novel library preparation methods have improved the sensitivity of detecting mutations in heterogeneous samples, specifically in high-depth (>500×) clinical applications. However, HTS methods are bounded by their technical and theoretical limitations, and sequencing errors cannot be completely eliminated. Comprehensive quantification of the background noise can highlight both the efficiency and the limitations of any HTS methodology, and help differentiate true mutations at low abundance from artifacts. RESULTS: We introduce MERIT (Mutation Error Rate Inference Toolkit), designed for in-depth quantification of erroneous substitutions and small insertions and deletions. MERIT incorporates an all-inclusive variant caller and considers genomic context, including the nucleotides immediately at 5′ and 3′, thereby establishing error rates for 96 possible substitutions as well as four single-base and 16 double-base indels. We applied MERIT to ultra-deep sequencing data (1,300,000×) obtained from the amplification of multiple clinically relevant loci, and showed a significant relationship between error rates and genomic contexts. In addition to observing a significant difference between transversion and transition rates, we identified variations of more than 100-fold within each error type at high sequencing depths. For instance, T>G transversions in trinucleotide GTCs occurred 133.5 ± 65.9 times more often than those in ATAs. Similarly, C>T transitions in GCGs were observed at a 73.8 ± 10.5 times higher rate than those in TCTs. We also devised an in silico approach to determine the optimal sequencing depth, where errors occur at rates similar to those of expected true mutations. Our analyses showed that increasing sequencing depth might improve sensitivity for detecting some mutations based on their genomic context. For example, the T>G error rate in GTCs did not change when sequenced beyond 10,000×; in contrast, the T>G rate in TTAs consistently improved even above 500,000×. CONCLUSIONS: Our results demonstrate significant variation in nucleotide misincorporation rates, and suggest that genomic context should be considered for comprehensive profiling of specimen-specific and sequencing artifacts in high-depth assays. These data provide strong evidence against assigning a single allele frequency threshold to call mutations, for it can result in substantial false-positive as well as false-negative variants, with important clinical consequences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2223-1) contains supplementary material, which is available to authorized users.
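The class counts quoted in the abstract (96 substitutions, four single-base and 16 double-base indels) can be reproduced with a short enumeration. The sketch below is not MERIT's code. It assumes the common convention of collapsing complementary strands so that the reference base is always a pyrimidine (C or T), which gives 6 substitution types × 4 possible 5′ bases × 4 possible 3′ bases = 96. It also assumes that indel classes are labeled simply by the inserted or deleted sequence. The label format `G[T>G]C` and the function names are illustrative choices, not taken from the article.

```python
from itertools import product

BASES = "ACGT"

# Assumption: strands are collapsed so the reference base is a pyrimidine,
# leaving six substitution types (C>A, C>G, C>T, T>A, T>C, T>G).
PYRIMIDINE_SUBS = {"C": "AGT", "T": "ACG"}


def substitution_classes():
    """Return one label per (5' base, ref>alt, 3' base) combination."""
    return [
        f"{five}[{ref}>{alt}]{three}"
        for ref, alts in PYRIMIDINE_SUBS.items()
        for alt in alts
        for five, three in product(BASES, repeat=2)
    ]


def indel_classes(length):
    """Return every possible inserted/deleted sequence of the given length."""
    return ["".join(seq) for seq in product(BASES, repeat=length)]


subs = substitution_classes()
single_base_indels = indel_classes(1)
double_base_indels = indel_classes(2)

print(len(subs), len(single_base_indels), len(double_base_indels))
# prints: 96 4 16
```

Under this labeling, the T>G error in a GTC trinucleotide mentioned in the abstract is the class `G[T>G]C`, and the C>T error in GCG is `G[C>T]G`.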
format	Online Article Text
id	pubmed-5994075
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5994075 2018-06-21 MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications Hadigol, Mohammad Khiabanian, Hossein BMC Bioinformatics Methodology Article BioMed Central 2018-06-08 /pmc/articles/PMC5994075/ /pubmed/29884116 http://dx.doi.org/10.1186/s12859-018-2223-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Hadigol, Mohammad Khiabanian, Hossein MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications
title	MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications
title_full	MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications
title_fullStr	MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications
title_full_unstemmed	MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications
title_short	MERIT reveals the impact of genomic context on sequencing error rate in ultra-deep applications
title_sort	merit reveals the impact of genomic context on sequencing error rate in ultra-deep applications
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5994075/ https://www.ncbi.nlm.nih.gov/pubmed/29884116 http://dx.doi.org/10.1186/s12859-018-2223-1
work_keys_str_mv	AT hadigolmohammad meritrevealstheimpactofgenomiccontextonsequencingerrorrateinultradeepapplications AT khiabanianhossein meritrevealstheimpactofgenomiccontextonsequencingerrorrateinultradeepapplications
