High-resolution strain-level microbiome composition analysis from short reads
BACKGROUND: Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition...
Main authors: | Liao, Herui; Ji, Yongxin; Sun, Yanni |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433603/ https://www.ncbi.nlm.nih.gov/pubmed/37587527 http://dx.doi.org/10.1186/s40168-023-01615-w |
_version_ | 1785091685856837632 |
---|---|
author | Liao, Herui Ji, Yongxin Sun, Yanni |
author_facet | Liao, Herui Ji, Yongxin Sun, Yanni |
author_sort | Liao, Herui |
collection | PubMed |
description | BACKGROUND: Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition in host-associated or environmental samples. Although there are a plethora of composition analysis tools, they are not optimized to address the challenges in strain-level analysis: highly similar strain genomes and the presence of multiple strains under one species in a sample. Thus, this work aims to provide a high-resolution and more accurate strain-level analysis tool for short reads. RESULTS: In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mers indexing structure to strike a balance between the strain identification accuracy and the computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing data and benchmarked StrainScan with popular strain-level analysis tools including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1 score by 20% in identifying multiple strains at the strain level. CONCLUSIONS: By using a novel k-mer indexing structure, StrainScan is able to provide strain-level analysis with higher resolution than existing tools, enabling it to return more informative strain composition analysis in one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input and its source codes are freely available at https://github.com/liaoherui/StrainScan. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-023-01615-w. |
format | Online Article Text |
id | pubmed-10433603 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10433603 2023-08-18 High-resolution strain-level microbiome composition analysis from short reads Liao, Herui Ji, Yongxin Sun, Yanni Microbiome Methodology BACKGROUND: Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition in host-associated or environmental samples. Although there are a plethora of composition analysis tools, they are not optimized to address the challenges in strain-level analysis: highly similar strain genomes and the presence of multiple strains under one species in a sample. Thus, this work aims to provide a high-resolution and more accurate strain-level analysis tool for short reads. RESULTS: In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mers indexing structure to strike a balance between the strain identification accuracy and the computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing data and benchmarked StrainScan with popular strain-level analysis tools including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1 score by 20% in identifying multiple strains at the strain level. CONCLUSIONS: By using a novel k-mer indexing structure, StrainScan is able to provide strain-level analysis with higher resolution than existing tools, enabling it to return more informative strain composition analysis in one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input and its source codes are freely available at https://github.com/liaoherui/StrainScan. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-023-01615-w. BioMed Central 2023-08-17 /pmc/articles/PMC10433603/ /pubmed/37587527 http://dx.doi.org/10.1186/s40168-023-01615-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Liao, Herui Ji, Yongxin Sun, Yanni High-resolution strain-level microbiome composition analysis from short reads |
title | High-resolution strain-level microbiome composition analysis from short reads |
title_sort | high-resolution strain-level microbiome composition analysis from short reads |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433603/ https://www.ncbi.nlm.nih.gov/pubmed/37587527 http://dx.doi.org/10.1186/s40168-023-01615-w |
work_keys_str_mv | AT liaoherui highresolutionstrainlevelmicrobiomecompositionanalysisfromshortreads AT jiyongxin highresolutionstrainlevelmicrobiomecompositionanalysisfromshortreads AT sunyanni highresolutionstrainlevelmicrobiomecompositionanalysisfromshortreads |