Identification of representative species-specific genes for abundance measurements

MOTIVATION: Metagenomic binning facilitates the reconstruction of genomes and identification of Metagenomic Species Pan-genomes or Metagenomic Assembled Genomes. We propose a method for identifying a set of de novo representative genes, termed signature genes, which can be used to measure the relati...

Bibliographic Details
Main Authors: Zachariasen, Trine, Petersen, Anders Østergaard, Brejnrod, Asker, Vestergaard, Gisle Alberg, Eklund, Aron, Nielsen, Henrik Bjørn
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10199311/
https://www.ncbi.nlm.nih.gov/pubmed/37213867
http://dx.doi.org/10.1093/bioadv/vbad060
_version_ 1785044905836412928
author Zachariasen, Trine
Petersen, Anders Østergaard
Brejnrod, Asker
Vestergaard, Gisle Alberg
Eklund, Aron
Nielsen, Henrik Bjørn
author_facet Zachariasen, Trine
Petersen, Anders Østergaard
Brejnrod, Asker
Vestergaard, Gisle Alberg
Eklund, Aron
Nielsen, Henrik Bjørn
author_sort Zachariasen, Trine
collection PubMed
description MOTIVATION: Metagenomic binning facilitates the reconstruction of genomes and identification of Metagenomic Species Pan-genomes or Metagenomic Assembled Genomes. We propose a method for identifying a set of de novo representative genes, termed signature genes, which can be used to measure the relative abundance and used as markers of each metagenomic species with high accuracy. RESULTS: An initial set of the 100 genes that correlate with the median gene abundance profile of the entity is selected. A variant of the coupon collector’s problem was utilized to evaluate the probability of identifying a certain number of unique genes in a sample. This allows us to reject the abundance measurements of strains exhibiting a significantly skewed gene representation. A rank-based negative binomial model is employed to assess the performance of different gene sets across a large set of samples, facilitating identification of an optimal signature gene set for the entity. When the method was benchmarked on a synthetic gene catalog, our optimized signature gene sets estimated relative abundance significantly closer to the true relative abundance than the starting gene sets extracted from the metagenomic species. The method was able to replicate results from a study with real data and identify around three times as many metagenomic entities. AVAILABILITY AND IMPLEMENTATION: The code used for the analysis is available on GitHub: https://github.com/trinezac/SG_optimization. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
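The abstract does not spell out which variant of the coupon collector’s problem the authors use. As a rough illustration only, the sketch below computes the classical occupancy distribution: the probability of seeing exactly d distinct signature genes after a given number of reads, assuming every read lands on one of the genes uniformly at random (equal gene lengths, no mapping bias). The function names `distinct_gene_distribution` and `prob_at_most` are hypothetical and are not taken from the SG_optimization repository.

```python
def distinct_gene_distribution(n_genes: int, n_reads: int) -> list[float]:
    """Return P(exactly d distinct genes observed) for d = 0..n_genes.

    Assumption (not from the paper): each read hits one of n_genes genes
    uniformly at random and independently of the other reads.
    """
    probs = [0.0] * (n_genes + 1)
    probs[0] = 1.0  # before any read, zero genes have been observed
    for _ in range(n_reads):
        nxt = [0.0] * (n_genes + 1)
        for d, p in enumerate(probs):
            if p == 0.0:
                continue
            # The read hits a gene that was already observed.
            nxt[d] += p * d / n_genes
            # The read hits a gene that had not been observed yet.
            if d < n_genes:
                nxt[d + 1] += p * (n_genes - d) / n_genes
        probs = nxt
    return probs


def prob_at_most(n_genes: int, n_reads: int, observed: int) -> float:
    """P(number of distinct genes <= observed) under the uniform model."""
    return sum(distinct_gene_distribution(n_genes, n_reads)[: observed + 1])


if __name__ == "__main__":
    # Hypothetical numbers: 100 signature genes, 300 mapped reads, 60 genes seen.
    print(prob_at_most(100, 300, 60))
```

Under these assumptions, a very small value of `prob_at_most` for a sample would indicate that far fewer distinct genes were detected than the read count predicts, which is the kind of skewed gene representation the abstract describes as grounds for rejecting an abundance measurement.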
format Online
Article
Text
id pubmed-10199311
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-101993112023-05-21 Identification of representative species-specific genes for abundance measurements Zachariasen, Trine Petersen, Anders Østergaard Brejnrod, Asker Vestergaard, Gisle Alberg Eklund, Aron Nielsen, Henrik Bjørn Bioinform Adv Original Paper MOTIVATION: Metagenomic binning facilitates the reconstruction of genomes and identification of Metagenomic Species Pan-genomes or Metagenomic Assembled Genomes. We propose a method for identifying a set of de novo representative genes, termed signature genes, which can be used to measure the relative abundance and used as markers of each metagenomic species with high accuracy. RESULTS: An initial set of the 100 genes that correlate with the median gene abundance profile of the entity is selected. A variant of the coupon collector’s problem was utilized to evaluate the probability of identifying a certain number of unique genes in a sample. This allows us to reject the abundance measurements of strains exhibiting a significantly skewed gene representation. A rank-based negative binomial model is employed to assess the performance of different gene sets across a large set of samples, facilitating identification of an optimal signature gene set for the entity. When the method was benchmarked on a synthetic gene catalog, our optimized signature gene sets estimated relative abundance significantly closer to the true relative abundance than the starting gene sets extracted from the metagenomic species. The method was able to replicate results from a study with real data and identify around three times as many metagenomic entities. AVAILABILITY AND IMPLEMENTATION: The code used for the analysis is available on GitHub: https://github.com/trinezac/SG_optimization. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. Oxford University Press 2023-05-08 /pmc/articles/PMC10199311/ /pubmed/37213867 http://dx.doi.org/10.1093/bioadv/vbad060 Text en © The Author(s) 2023.
Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Paper
Zachariasen, Trine
Petersen, Anders Østergaard
Brejnrod, Asker
Vestergaard, Gisle Alberg
Eklund, Aron
Nielsen, Henrik Bjørn
Identification of representative species-specific genes for abundance measurements
title Identification of representative species-specific genes for abundance measurements
title_full Identification of representative species-specific genes for abundance measurements
title_fullStr Identification of representative species-specific genes for abundance measurements
title_full_unstemmed Identification of representative species-specific genes for abundance measurements
title_short Identification of representative species-specific genes for abundance measurements
title_sort identification of representative species-specific genes for abundance measurements
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10199311/
https://www.ncbi.nlm.nih.gov/pubmed/37213867
http://dx.doi.org/10.1093/bioadv/vbad060
work_keys_str_mv AT zachariasentrine identificationofrepresentativespeciesspecificgenesforabundancemeasurements
AT petersenandersøstergaard identificationofrepresentativespeciesspecificgenesforabundancemeasurements
AT brejnrodasker identificationofrepresentativespeciesspecificgenesforabundancemeasurements
AT vestergaardgislealberg identificationofrepresentativespeciesspecificgenesforabundancemeasurements
AT eklundaron identificationofrepresentativespeciesspecificgenesforabundancemeasurements
AT nielsenhenrikbjørn identificationofrepresentativespeciesspecificgenesforabundancemeasurements