
Separation and assembly of deep sequencing data into discrete sub-population genomes


Bibliographic Details
Main authors: Karagiannis, Konstantinos; Simonyan, Vahan; Chumakov, Konstantin; Mazumder, Raja
Format: Online Article Text
Language: English
Published: Oxford University Press, 2017
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737798/
https://www.ncbi.nlm.nih.gov/pubmed/28977510
http://dx.doi.org/10.1093/nar/gkx755
author Karagiannis, Konstantinos
Simonyan, Vahan
Chumakov, Konstantin
Mazumder, Raja
collection PubMed
description Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de novo assemblers, ignore the underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles discrete sequences of multiple genomes present in populations. Using in silico data, we were able to detect populations at frequencies as low as 0.1% with complete global genome reconstruction, and in a single sample we detected 16 resolved sequences with no mismatches. We also applied the algorithm to high-throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. The high sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, enabling not just basic research in personalized medicine but also accurate diagnostics and monitoring of drug therapies, which are critical in clinical and regulatory decision-making processes.
format Online
Article
Text
id pubmed-5737798
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5737798 2018-01-04
Nucleic Acids Res
Computational Biology
Oxford University Press 2017-11-02 2017-08-28
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com
title Separation and assembly of deep sequencing data into discrete sub-population genomes
topic Computational Biology