Separation and assembly of deep sequencing data into discrete sub-population genomes
Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de-novo assemblers, ignore the underlying diversity. Here, we introduce...
Main authors: | Konstantinos Karagiannis, Vahan Simonyan, Konstantin Chumakov, Raja Mazumder |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737798/ https://www.ncbi.nlm.nih.gov/pubmed/28977510 http://dx.doi.org/10.1093/nar/gkx755 |
_version_ | 1783287578960592896 |
---|---|
author | Karagiannis, Konstantinos Simonyan, Vahan Chumakov, Konstantin Mazumder, Raja |
author_facet | Karagiannis, Konstantinos Simonyan, Vahan Chumakov, Konstantin Mazumder, Raja |
author_sort | Karagiannis, Konstantinos |
collection | PubMed |
description | Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de-novo assemblers, ignore the underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles discrete sequences of multiple genomes present in populations. Using in silico data, we were able to detect populations at frequencies as low as 0.1% with complete global genome reconstruction, and in a single sample we detected 16 resolved sequences with no mismatches. We also applied the algorithm to high-throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. The high sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, enabling not just basic research in personalized medicine but also accurate diagnostics and monitoring of drug therapies, which are critical in clinical and regulatory decision-making processes. |
format | Online Article Text |
id | pubmed-5737798 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-57377982018-01-04 Separation and assembly of deep sequencing data into discrete sub-population genomes Karagiannis, Konstantinos Simonyan, Vahan Chumakov, Konstantin Mazumder, Raja Nucleic Acids Res Computational Biology Sequence heterogeneity is a common characteristic of RNA viruses that is often referred to as sub-populations or quasispecies. Traditional techniques used for assembly of short sequence reads produced by deep sequencing, such as de-novo assemblers, ignore the underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles discrete sequences of multiple genomes present in populations. Using in silico data we were able to detect populations at as low as 0.1% frequency with complete global genome reconstruction and in a single sample detected 16 resolved sequences with no mismatches. We also applied the algorithm to high throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. High sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, enabling not just basic research in personalized medicine but also accurate diagnostics and monitoring drug therapies, which are critical in clinical and regulatory decision-making process. Oxford University Press 2017-11-02 2017-08-28 /pmc/articles/PMC5737798/ /pubmed/28977510 http://dx.doi.org/10.1093/nar/gkx755 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. 
For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Computational Biology Karagiannis, Konstantinos Simonyan, Vahan Chumakov, Konstantin Mazumder, Raja Separation and assembly of deep sequencing data into discrete sub-population genomes |
title | Separation and assembly of deep sequencing data into discrete sub-population genomes |
title_sort | separation and assembly of deep sequencing data into discrete sub-population genomes |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737798/ https://www.ncbi.nlm.nih.gov/pubmed/28977510 http://dx.doi.org/10.1093/nar/gkx755 |