Strain/species identification in metagenomes using genome-specific markers

Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a nove...

Bibliographic Details
Main Authors: Tu, Qichao, He, Zhili, Zhou, Jizhong
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005670/
https://www.ncbi.nlm.nih.gov/pubmed/24523352
http://dx.doi.org/10.1093/nar/gku138
_version_ 1782314138070941696
author Tu, Qichao
He, Zhili
Zhou, Jizhong
author_sort Tu, Qichao
collection PubMed
description Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which are then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 strain-specific and 11 736 360 species-specific 50-mer GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites; the results suggested that the identified GSMs were specific to their target genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and that 10% of the selected GSMs in a database should be detected for a confident positive call. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals, respectively, from the corresponding gastrointestinal tract metagenomes. Our results agreed with previous studies but provided strain-level information. The approach can be applied directly to raw metagenomes to identify microbial strains/species, without complex data pre-processing.
format Online
Article
Text
id pubmed-4005670
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4005670 2014-05-01 Strain/species identification in metagenomes using genome-specific markers Tu, Qichao He, Zhili Zhou, Jizhong Nucleic Acids Res Methods Online Oxford University Press 2014-04 2014-02-12 /pmc/articles/PMC4005670/ /pubmed/24523352 http://dx.doi.org/10.1093/nar/gku138 Text en © The Author(s) 2014.
Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Tu, Qichao
He, Zhili
Zhou, Jizhong
Strain/species identification in metagenomes using genome-specific markers
title Strain/species identification in metagenomes using genome-specific markers
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005670/
https://www.ncbi.nlm.nih.gov/pubmed/24523352
http://dx.doi.org/10.1093/nar/gku138
work_keys_str_mv AT tuqichao strainspeciesidentificationinmetagenomesusinggenomespecificmarkers
AT hezhili strainspeciesidentificationinmetagenomesusinggenomespecificmarkers
AT zhoujizhong strainspeciesidentificationinmetagenomesusinggenomespecificmarkers
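The description field above outlines a k-mer-based idea: collect k-mers that occur in exactly one reference genome, then call a genome present when enough of its markers are seen in a metagenome. The following is a minimal toy sketch of that idea, not the GSMer implementation. The function names, the toy sequences, the use of exact substring matching, and k = 5 are all assumptions made for illustration (the record reports 50-mers); reverse complements, sequencing errors and marker subsampling are ignored. Only the 10% detection threshold is taken from the description.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def find_specific_markers(genomes, k):
    """Map each genome name to the k-mers found in that genome only."""
    owners = defaultdict(set)  # k-mer -> names of genomes containing it
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    markers = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:  # k-mer is unique to a single genome
            markers[next(iter(names))].add(kmer)
    return markers


def detect(reads, markers, k, min_fraction=0.10):
    """Call a genome present if >= min_fraction of its markers occur in reads."""
    seen = set()
    for read in reads:
        seen.update(kmers(read, k))
    calls = {}
    for name, gsms in markers.items():
        fraction = len(gsms & seen) / len(gsms) if gsms else 0.0
        calls[name] = fraction >= min_fraction
    return calls


# Hypothetical toy data: two short "genomes" that share a common prefix.
K = 5  # toy value for readability; the record describes 50-mers
genomes = {
    "strain_A": "ACGTACGTTTGACC",
    "strain_B": "ACGTACGTGGCATA",
}
markers = find_specific_markers(genomes, K)

# Hypothetical reads: one from the strain_A-specific region, one from the shared prefix.
reads = ["CGTTTGAC", "TACGT"]
calls = detect(reads, markers, K)
print(calls)
```

In this toy run the shared prefix contributes no markers, so the read drawn from it does not count toward either strain; only k-mers unique to one genome influence the call.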