
Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants

BACKGROUND: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detection of viral transmissions by analysis of all genetically close pairs of sequences from viral datasets...

Bibliographic Details
Main authors: Tsyvina, Viachaslau, Campo, David S., Sims, Seth, Zelikovsky, Alex, Khudyakov, Yury, Skums, Pavel
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6196405/
https://www.ncbi.nlm.nih.gov/pubmed/30343669
http://dx.doi.org/10.1186/s12859-018-2333-9
author Tsyvina, Viachaslau
Campo, David S.
Sims, Seth
Zelikovsky, Alex
Khudyakov, Yury
Skums, Pavel
collection PubMed
description BACKGROUND: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detecting viral transmissions by analyzing all genetically close pairs of sequences from viral datasets sampled from infected individuals, and studying the evolution of viruses or immune repertoires by analyzing networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naive algorithms for extracting such sequence families are impractical given the massive size of modern NGS datasets. RESULTS: In this paper, we present a fast and scalable k-mer-based framework that performs such sequence similarity queries efficiently and specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance than other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj CONCLUSION: The proposed tool allows for efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analyzing the relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data.
format Online
Article
Text
id pubmed-6196405
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6196405 2018-10-30 Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants Tsyvina, Viachaslau Campo, David S. Sims, Seth Zelikovsky, Alex Khudyakov, Yury Skums, Pavel BMC Bioinformatics Methodology
BioMed Central 2018-10-22 /pmc/articles/PMC6196405/ /pubmed/30343669 http://dx.doi.org/10.1186/s12859-018-2333-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants
topic Methodology