Improved detection of artifactual viral minority variants in high-throughput sequencing data

High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina Hi...

Bibliographic Details
Main authors: Welkers, Matthijs R. A., Jonges, Marcel, Jeeninga, Rienk E., Koopmans, Marion P. G., de Jong, Menno D.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2015
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302989/
https://www.ncbi.nlm.nih.gov/pubmed/25657642
http://dx.doi.org/10.3389/fmicb.2014.00804
author Welkers, Matthijs R. A.
Jonges, Marcel
Jeeninga, Rienk E.
Koopmans, Marion P. G.
de Jong, Menno D.
collection PubMed
description High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generating infectious virus using reverse genetics, followed by duplicate reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequencing run. Results showed that after “best practice” quality control (QC), one minority variant with a frequency >0.5% was identified within the plasmid pool, while 84 and 139 were identified in the two RT-PCR amplified samples, indicating that RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and an uneven distribution of mismatches over the length of the reads. We demonstrate that by adding two QC steps, 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples, 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through SourceForge (https://sourceforge.net/projects/mva-ngs).
format Online
Article
Text
id pubmed-4302989
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4302989 2015-02-05 Improved detection of artifactual viral minority variants in high-throughput sequencing data. Welkers, Matthijs R. A.; Jonges, Marcel; Jeeninga, Rienk E.; Koopmans, Marion P. G.; de Jong, Menno D. Front Microbiol. Microbiology. Frontiers Media S.A. 2015-01-22 /pmc/articles/PMC4302989/ /pubmed/25657642 http://dx.doi.org/10.3389/fmicb.2014.00804 Text en
Copyright © 2015 Welkers, Jonges, Jeeninga, Koopmans and de Jong. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Improved detection of artifactual viral minority variants in high-throughput sequencing data
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302989/
https://www.ncbi.nlm.nih.gov/pubmed/25657642
http://dx.doi.org/10.3389/fmicb.2014.00804
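
The description in this record names two technical characteristics of artifactual minority variants: predominant presence in a single read orientation, and an uneven distribution of mismatches over the length of the reads. The sketch below illustrates how such checks might look in Python. It is not the authors' implementation (their source code is at the SourceForge link in the description). The `VariantSupport` structure, the use of an "edge fraction" as a proxy for uneven positional distribution, and all threshold values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VariantSupport:
    """Reads supporting one candidate minority variant (hypothetical structure)."""
    forward_reads: int               # supporting reads in forward orientation
    reverse_reads: int               # supporting reads in reverse orientation
    read_positions: Sequence[float]  # relative mismatch position per read, 0.0 to 1.0


def strand_fraction(v: VariantSupport) -> float:
    """Fraction of supporting reads that come from the dominant orientation."""
    total = v.forward_reads + v.reverse_reads
    if total == 0:
        return 0.0
    return max(v.forward_reads, v.reverse_reads) / total


def edge_fraction(v: VariantSupport, edge: float = 0.1) -> float:
    """Fraction of mismatches that fall within `edge` of either read end.

    A simple stand-in for "uneven distribution over the read length";
    the study may use a different measure.
    """
    if not v.read_positions:
        return 0.0
    near_edge = sum(1 for p in v.read_positions if p < edge or p > 1 - edge)
    return near_edge / len(v.read_positions)


def looks_artifactual(
    v: VariantSupport,
    max_strand_fraction: float = 0.9,  # assumed threshold, not from the study
    max_edge_fraction: float = 0.5,    # assumed threshold, not from the study
) -> bool:
    """Flag a variant if either check exceeds its (illustrative) threshold."""
    return (
        strand_fraction(v) > max_strand_fraction
        or edge_fraction(v) > max_edge_fraction
    )


# Example: a variant seen almost only on forward reads is flagged.
one_sided = VariantSupport(forward_reads=98, reverse_reads=2,
                           read_positions=[0.5] * 100)
print(looks_artifactual(one_sided))
```

Both thresholds would need to be calibrated against a control with known low biological variation, such as the plasmid pool described in the abstract, before being applied to clinical samples.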