
Improvement of eukaryotic protein predictions from soil metagenomes



Bibliographic Details
Main authors: Belliardo, Carole; Koutsovoulos, Georgios D.; Rancurel, Corinne; Clément, Mathilde; Lipuma, Justine; Bailly-Bechet, Marc; Danchin, Etienne G. J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9203802/
https://www.ncbi.nlm.nih.gov/pubmed/35710557
http://dx.doi.org/10.1038/s41597-022-01420-4
author Belliardo, Carole
Koutsovoulos, Georgios D.
Rancurel, Corinne
Clément, Mathilde
Lipuma, Justine
Bailly-Bechet, Marc
Danchin, Etienne G. J.
collection PubMed
description Over the last decades, metagenomics has highlighted the diversity of microorganisms in environmental and host-associated samples. Most public metagenomics repositories use annotation pipelines tailored for prokaryotes, regardless of the taxonomic origin of contigs. Consequently, eukaryotic contigs, whose gene features are intrinsically different, are not optimally annotated. Using a bioinformatics pipeline, we filtered 7.9 billion contigs from 6,872 soil metagenomes in the JGI’s IMG/M database to identify eukaryotic contigs. We re-annotated genes using eukaryote-tailored methods, yielding 8 million eukaryotic proteins and over 300,000 orphan proteins lacking homology in public databases. Comparing our gene predictions with the original JGI predictions on the same contigs, we confirmed that our pipeline improves the completeness and contiguity of eukaryotic proteins in soil metagenomes. The improved quality of eukaryotic proteins, combined with a more comprehensive assignment method, yielded more reliable taxonomic annotation. This dataset of eukaryotic soil proteins, with improved completeness, quality and taxonomic annotation reliability, is of interest to any scientist studying the composition, biological functions and gene flux in soil communities involving eukaryotes.
format Online
Article
Text
id pubmed-9203802
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9203802 2022-06-18 Improvement of eukaryotic protein predictions from soil metagenomes. Sci Data, Data Descriptor.
Nature Publishing Group UK 2022-06-16 /pmc/articles/PMC9203802/ /pubmed/35710557 http://dx.doi.org/10.1038/s41597-022-01420-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Improvement of eukaryotic protein predictions from soil metagenomes
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9203802/
https://www.ncbi.nlm.nih.gov/pubmed/35710557
http://dx.doi.org/10.1038/s41597-022-01420-4