High-resolution strain-level microbiome composition analysis from short reads

Bibliographic Details
Main authors: Liao, Herui; Ji, Yongxin; Sun, Yanni
Format: Online, Article, Text
Language: English
Published: BioMed Central 2023
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433603/
https://www.ncbi.nlm.nih.gov/pubmed/37587527
http://dx.doi.org/10.1186/s40168-023-01615-w
Collection: PubMed

Abstract

BACKGROUND: Bacterial strains of the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means of probing the microbial composition of host-associated or environmental samples. Although there is a plethora of composition analysis tools, they are not optimized for the challenges specific to strain-level analysis: highly similar strain genomes and the presence of multiple strains of one species in a sample. This work therefore aims to provide a higher-resolution and more accurate strain-level analysis tool for short reads.

RESULTS: We present a new strain-level composition analysis tool named StrainScan, which employs a novel tree-based k-mer indexing structure to balance strain identification accuracy against computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing datasets and benchmarked it against popular strain-level analysis tools, including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan achieves higher accuracy and resolution than these state-of-the-art tools in strain-level composition analysis, improving the F1 score by 20% when identifying multiple strains at the strain level.

CONCLUSIONS: By using a novel k-mer indexing structure, StrainScan provides strain-level analysis at higher resolution than existing tools, enabling it to return more informative strain composition profiles within one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input; its source code is freely available at https://github.com/liaoherui/StrainScan.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-023-01615-w.

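The abstract describes StrainScan's tree-based k-mer index only at a high level. The Python sketch that follows illustrates the general coarse-to-fine idea behind such hierarchical indexes (first decide which cluster of similar strains is present, then resolve strains inside it) and how an F1 score over a reported strain set is computed. It is a toy under stated assumptions, not StrainScan's actual data structure or scoring: the two-level cluster/strain layout, exact k-mer matching, the `min_hits` threshold, the tiny k, and all function names (`build_index`, `identify`, `f1_score`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical index types: cluster -> signature k-mers, cluster -> strain -> signature k-mers.
ClusterSig = Dict[str, Set[str]]
StrainSig = Dict[str, Dict[str, Set[str]]]


def kmer_list(seq: str, k: int) -> List[str]:
    """All overlapping length-k substrings of seq, in order (repeats kept)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(clusters: Dict[str, Dict[str, str]], k: int) -> Tuple[ClusterSig, StrainSig]:
    """Build a toy two-level index from {cluster: {strain: genome}}.

    Level 1: k-mers present in a cluster's strains but in no other cluster.
    Level 2: k-mers present in one strain but in no other strain of its cluster.
    """
    per_strain = {c: {s: set(kmer_list(g, k)) for s, g in strains.items()}
                  for c, strains in clusters.items()}
    per_cluster = {c: set().union(*sets.values()) for c, sets in per_strain.items()}

    cluster_sig: ClusterSig = {}
    for c, ks in per_cluster.items():
        others = set().union(*(v for d, v in per_cluster.items() if d != c))
        cluster_sig[c] = ks - others

    strain_sig: StrainSig = {}
    for c, sets in per_strain.items():
        strain_sig[c] = {}
        for s, ks in sets.items():
            others = set().union(*(v for t, v in sets.items() if t != s))
            strain_sig[c][s] = ks - others
    return cluster_sig, strain_sig


def identify(reads: List[str], cluster_sig: ClusterSig, strain_sig: StrainSig,
             k: int, min_hits: int) -> Set[str]:
    """Coarse-to-fine search: keep clusters with enough signature hits, then
    report strains inside those clusters with enough strain-specific hits."""
    read_kmers = [km for r in reads for km in kmer_list(r, k)]
    cluster_hits = Counter(c for km in read_kmers
                           for c, sig in cluster_sig.items() if km in sig)
    found: Set[str] = set()
    for c, n in cluster_hits.items():
        if n < min_hits:
            continue  # too little evidence for this cluster; skip its strains
        strain_hits = Counter(s for km in read_kmers
                              for s, sig in strain_sig[c].items() if km in sig)
        found |= {s for s, m in strain_hits.items() if m >= min_hits}
    return found


def f1_score(predicted: Set[str], truth: Set[str]) -> float:
    """F1 = harmonic mean of precision and recall over the reported strain set."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy data: cluster A holds two near-identical strains, cluster B holds one.
clusters = {
    "A": {"A1": "AAAACCCCGGGG", "A2": "AAAACCCCTTTT"},
    "B": {"B1": "GTGTGTGTCACA"},
}
reads = ["AAAACCCCGG", "CCCCGGGG", "GTGTGTCACA"]  # substrings of A1 and B1

cluster_sig, strain_sig = build_index(clusters, k=4)
found = identify(reads, cluster_sig, strain_sig, k=4, min_hits=3)
print(sorted(found))                   # ['A1', 'B1']
print(f1_score(found, {"A1", "B1"}))   # 1.0
print(f1_score(found, {"A1", "A2"}))   # 0.5
```

In this toy, strain-level signatures are consulted only for clusters that pass the first filter, which is one way a hierarchical index can limit work while still separating near-identical strains. Real tools use much longer k-mers and must cope with sequencing errors, uneven coverage, and k-mers shared across strains, none of which this sketch handles.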
Record ID: pubmed-10433603
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Microbiome (Methodology)
Publication date: 2023-08-17 (BioMed Central)
Identifiers: PMC10433603; PMID 37587527; doi:10.1186/s40168-023-01615-w
Copyright: © The Author(s) 2023
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.