
Inferring Short-Range Linkage Information from Sequencing Chromatograms

Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the var...


Bibliographic Details
Main Authors: Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3869653/
https://www.ncbi.nlm.nih.gov/pubmed/24376502
http://dx.doi.org/10.1371/journal.pone.0081687
author Beggel, Bastian
Neumann-Fraune, Maria
Kaiser, Rolf
Verheyen, Jens
Lengauer, Thomas
collection PubMed
description Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip.
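The linkage problem described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' model: it ignores the context-dependent peak heights that the method exploits and looks only at per-position base frequencies. The bases, frequencies, and function names are invented for illustration. It shows that two different two-variant mixtures and a four-variant mixture can produce identical frequencies at both ambiguous positions, so those frequencies alone cannot tell which bases occur together on the same genome.

```python
from itertools import product


def haplotypes(bases_pos1, bases_pos2):
    """List every two-position variant that the observed bases allow."""
    return [a + b for a, b in product(bases_pos1, bases_pos2)]


def marginals(mixture):
    """Sum variant frequencies into per-position base frequencies.

    mixture maps a two-letter variant (hypothetical data) to its
    relative frequency in the population.
    """
    pos1, pos2 = {}, {}
    for variant, freq in mixture.items():
        pos1[variant[0]] = pos1.get(variant[0], 0.0) + freq
        pos2[variant[1]] = pos2.get(variant[1], 0.0) + freq
    return pos1, pos2


# Three invented mixtures over the candidate variants AC, AT, GC, GT.
mix_two_a = {"AC": 0.5, "GT": 0.5}
mix_two_b = {"AT": 0.5, "GC": 0.5}
mix_four = {"AC": 0.25, "AT": 0.25, "GC": 0.25, "GT": 0.25}

if __name__ == "__main__":
    print(haplotypes("AG", "CT"))
    for name, mix in [("two (a)", mix_two_a),
                      ("two (b)", mix_two_b),
                      ("four", mix_four)]:
        print(name, marginals(mix))
```

All three mixtures yield the same base frequencies at each position, which is why additional signal, such as the sequence context effect the abstract refers to, is needed to separate them.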
format Online
Article
Text
id pubmed-3869653
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3869653 2013-12-27 Inferring Short-Range Linkage Information from Sequencing Chromatograms Beggel, Bastian Neumann-Fraune, Maria Kaiser, Rolf Verheyen, Jens Lengauer, Thomas PLoS One Research Article Public Library of Science 2013-12-20 /pmc/articles/PMC3869653/ /pubmed/24376502 http://dx.doi.org/10.1371/journal.pone.0081687 Text en © 2013 Beggel et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Inferring Short-Range Linkage Information from Sequencing Chromatograms
topic Research Article