Trait biases in microbial reference genomes
Common culturing techniques and priorities bias our discovery towards specific traits that may not be representative of microbial diversity in nature. So far, these biases have not been systematically examined. To address this gap, here we use 116,884 publicly available metagenome-assembled genomes...
Main authors: | Albright, Sage; Louca, Stilianos |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9911409/ https://www.ncbi.nlm.nih.gov/pubmed/36759614 http://dx.doi.org/10.1038/s41597-023-01994-7 |
_version_ | 1784884982073786368 |
---|---|
author | Albright, Sage; Louca, Stilianos |
author_facet | Albright, Sage; Louca, Stilianos |
author_sort | Albright, Sage |
collection | PubMed |
description | Common culturing techniques and priorities bias our discovery towards specific traits that may not be representative of microbial diversity in nature. So far, these biases have not been systematically examined. To address this gap, here we use 116,884 publicly available metagenome-assembled genomes (MAGs, completeness ≥80%) from 203 surveys worldwide as a culture-independent sample of bacterial and archaeal diversity, and compare these MAGs to the popular RefSeq genome database, which heavily relies on cultures. We compare the distribution of 12,454 KEGG gene orthologs (used as trait proxies) in the MAGs and RefSeq genomes, while controlling for environment type (ocean, soil, lake, bioreactor, human, and other animals). Using statistical modeling, we then determine the conditional probabilities that a species is represented in RefSeq depending on its genetic repertoire. We find that the majority of examined genes are significantly biased either for or against in RefSeq. Our systematic estimates of gene prevalences across bacteria and archaea in nature and of gene-specific biases in reference genomes constitute a resource for addressing these issues in the future. |
format | Online Article Text |
id | pubmed-9911409 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9911409 2023-02-11 Trait biases in microbial reference genomes Albright, Sage; Louca, Stilianos Sci Data Analysis Nature Publishing Group UK 2023-02-09 /pmc/articles/PMC9911409/ /pubmed/36759614 http://dx.doi.org/10.1038/s41597-023-01994-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Analysis Albright, Sage Louca, Stilianos Trait biases in microbial reference genomes |
title | Trait biases in microbial reference genomes |
title_full | Trait biases in microbial reference genomes |
title_fullStr | Trait biases in microbial reference genomes |
title_full_unstemmed | Trait biases in microbial reference genomes |
title_short | Trait biases in microbial reference genomes |
title_sort | trait biases in microbial reference genomes |
topic | Analysis |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9911409/ https://www.ncbi.nlm.nih.gov/pubmed/36759614 http://dx.doi.org/10.1038/s41597-023-01994-7 |
work_keys_str_mv | AT albrightsage traitbiasesinmicrobialreferencegenomes AT loucastilianos traitbiasesinmicrobialreferencegenomes |