StrainSelect: A novel microbiome reference database that disambiguates all bacterial strains, genome assemblies and extant cultures worldwide
Motivation: Microbial metagenomic profiling software and databases are advancing rapidly for development of novel disease biomarkers and therapeutics yet three problems impede analyses: 1) the conflation of “genome assembly” and “strain” in reference databases; 2) difficulty connecting DNA biomarker...
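Main Authors: | DeSantis, Todd Z.; Cardona, Cesar; Narayan, Nicole R.; Viswanatham, Satish; Ravichandar, Divya; Wee, Brendan; Chow, Cheryl-Emiliane; Iwai, Shoko |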
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9939595/ https://www.ncbi.nlm.nih.gov/pubmed/36814618 http://dx.doi.org/10.1016/j.heliyon.2023.e13314 |
_version_ | 1784890889389211648 |
---|---|
author | DeSantis, Todd Z.; Cardona, Cesar; Narayan, Nicole R.; Viswanatham, Satish; Ravichandar, Divya; Wee, Brendan; Chow, Cheryl-Emiliane; Iwai, Shoko |
author_facet | DeSantis, Todd Z.; Cardona, Cesar; Narayan, Nicole R.; Viswanatham, Satish; Ravichandar, Divya; Wee, Brendan; Chow, Cheryl-Emiliane; Iwai, Shoko |
author_sort | DeSantis, Todd Z. |
collection | PubMed |
description | Motivation: Microbial metagenomic profiling software and databases are advancing rapidly for development of novel disease biomarkers and therapeutics yet three problems impede analyses: 1) the conflation of “genome assembly” and “strain” in reference databases; 2) difficulty connecting DNA biomarkers to a procurable strain for laboratory experimentation; and 3) absence of a comprehensive and unified strain-resolved reference database for integrating both shotgun metagenomics and 16S rRNA gene data. Results: We demarcated 681,087 strains, the largest collection of its kind, by filtering public data into a knowledge graph of vertices representing contiguous DNA sequences, genome assemblies, strain monikers and bio-resource center (BRC) catalog numbers then adding inter-vertex edges only for synonyms or direct derivatives. Surprisingly, for 10,043 important strains, we found replicate RefSeq genome assemblies obstructing interpretation of database searches. We organized each strain into eight taxonomic ranks with bootstrap confidence inversely correlated with genome assembly contamination. The StrainSelect database is suited for applications where a taxonomic, functional or procurement reference is needed for shotgun or amplicon metagenomics since 636,568 strains have at least one 16S rRNA gene, 245,005 have at least one annotated genome assembly, and 36,671 are procurable from at least one BRC. The database overcomes all three aforementioned problems since it disambiguates strains from assemblies, locates strains at BRCs, and unifies a taxonomic reference for both 16S rRNA and shotgun metagenomics. Availability: The StrainSelect database is available in igraph and tabular vertex-edge formats compatible with Neo4J. Dereplicated MinHash and fasta databases are distributed for sourmash and usearch pipelines at http://strainselect.secondgenome.com. Contact: todd.desantis@gmail.com. Supplementary information: Supplementary data are available online. |
format | Online Article Text |
id | pubmed-9939595 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9939595 2023-02-21 StrainSelect: A novel microbiome reference database that disambiguates all bacterial strains, genome assemblies and extant cultures worldwide DeSantis, Todd Z. Cardona, Cesar Narayan, Nicole R. Viswanatham, Satish Ravichandar, Divya Wee, Brendan Chow, Cheryl-Emiliane Iwai, Shoko Heliyon Research Article Motivation: Microbial metagenomic profiling software and databases are advancing rapidly for development of novel disease biomarkers and therapeutics yet three problems impede analyses: 1) the conflation of “genome assembly” and “strain” in reference databases; 2) difficulty connecting DNA biomarkers to a procurable strain for laboratory experimentation; and 3) absence of a comprehensive and unified strain-resolved reference database for integrating both shotgun metagenomics and 16S rRNA gene data. Results: We demarcated 681,087 strains, the largest collection of its kind, by filtering public data into a knowledge graph of vertices representing contiguous DNA sequences, genome assemblies, strain monikers and bio-resource center (BRC) catalog numbers then adding inter-vertex edges only for synonyms or direct derivatives. Surprisingly, for 10,043 important strains, we found replicate RefSeq genome assemblies obstructing interpretation of database searches. We organized each strain into eight taxonomic ranks with bootstrap confidence inversely correlated with genome assembly contamination. The StrainSelect database is suited for applications where a taxonomic, functional or procurement reference is needed for shotgun or amplicon metagenomics since 636,568 strains have at least one 16S rRNA gene, 245,005 have at least one annotated genome assembly, and 36,671 are procurable from at least one BRC. The database overcomes all three aforementioned problems since it disambiguates strains from assemblies, locates strains at BRCs, and unifies a taxonomic reference for both 16S rRNA and shotgun metagenomics. Availability: The StrainSelect database is available in igraph and tabular vertex-edge formats compatible with Neo4J. Dereplicated MinHash and fasta databases are distributed for sourmash and usearch pipelines at http://strainselect.secondgenome.com. Contact: todd.desantis@gmail.com. Supplementary information: Supplementary data are available online. Elsevier 2023-02-04 /pmc/articles/PMC9939595/ /pubmed/36814618 http://dx.doi.org/10.1016/j.heliyon.2023.e13314 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article DeSantis, Todd Z. Cardona, Cesar Narayan, Nicole R. Viswanatham, Satish Ravichandar, Divya Wee, Brendan Chow, Cheryl-Emiliane Iwai, Shoko StrainSelect: A novel microbiome reference database that disambiguates all bacterial strains, genome assemblies and extant cultures worldwide |
title | StrainSelect: A novel microbiome reference database that disambiguates all bacterial strains, genome assemblies and extant cultures worldwide |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9939595/ https://www.ncbi.nlm.nih.gov/pubmed/36814618 http://dx.doi.org/10.1016/j.heliyon.2023.e13314 |
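
Note on the description field: the abstract describes a knowledge graph whose vertices are contiguous DNA sequences, genome assemblies, strain monikers and BRC catalog numbers, with edges added only for synonyms or direct derivatives. The sketch below illustrates that idea in plain Python. It assumes that a strain corresponds to one connected component of such a graph, which the record does not state explicitly, and every identifier in it (`Example strain A`, `ASM_0001`, `BRC-0001`, `contig_1`) is invented for illustration. It is not the authors' pipeline and does not read the distributed igraph or Neo4J files.

```python
from collections import defaultdict

# Hypothetical vertices, written as (kind, identifier) pairs.
# An edge means "synonym of" or "direct derivative of" (assumed semantics).
EDGES = [
    (("strain", "Example strain A"), ("brc", "BRC-0001")),
    (("strain", "Example strain A"), ("assembly", "ASM_0001")),
    (("strain", "Example strain A"), ("assembly", "ASM_0002")),
    (("assembly", "ASM_0001"), ("contig", "contig_1")),
    (("strain", "Example strain B"), ("contig", "contig_2")),
]


def connected_components(edges):
    """Group vertices that are reachable from one another (undirected)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited vertex.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            vertex = stack.pop()
            component.add(vertex)
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def summarize(component):
    """Return {kind: [identifiers]} for one component, sorted for stability."""
    by_kind = defaultdict(list)
    for kind, identifier in sorted(component):
        by_kind[kind].append(identifier)
    return dict(by_kind)


groups = [summarize(c) for c in connected_components(EDGES)]

for group in groups:
    replicate = len(group.get("assembly", [])) > 1  # more than one assembly
    procurable = "brc" in group                     # linked to a BRC catalog number
    print(group.get("strain"), "replicate assemblies:", replicate,
          "procurable:", procurable)
```

Under these assumptions, a component holding more than one assembly vertex models the "replicate genome assemblies" case the abstract mentions, and a component holding a BRC vertex models a procurable strain.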