Improvement of eukaryotic protein predictions from soil metagenomes
During the last decades, metagenomics has highlighted the diversity of microorganisms from environmental or host-associated samples. Most metagenomics public repositories use annotation pipelines tailored for prokaryotes regardless of the taxonomic origin of contigs. Consequently, eukaryotic contigs...
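Main authors: | Belliardo, Carole; Koutsovoulos, Georgios D.; Rancurel, Corinne; Clément, Mathilde; Lipuma, Justine; Bailly-Bechet, Marc; Danchin, Etienne G. J. |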
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9203802/ https://www.ncbi.nlm.nih.gov/pubmed/35710557 http://dx.doi.org/10.1038/s41597-022-01420-4 |
_version_ | 1784728780485427200 |
---|---|
author | Belliardo, Carole Koutsovoulos, Georgios D. Rancurel, Corinne Clément, Mathilde Lipuma, Justine Bailly-Bechet, Marc Danchin, Etienne G. J. |
author_facet | Belliardo, Carole Koutsovoulos, Georgios D. Rancurel, Corinne Clément, Mathilde Lipuma, Justine Bailly-Bechet, Marc Danchin, Etienne G. J. |
author_sort | Belliardo, Carole |
collection | PubMed |
description | During the last decades, metagenomics has highlighted the diversity of microorganisms from environmental or host-associated samples. Most public metagenomics repositories use annotation pipelines tailored for prokaryotes regardless of the taxonomic origin of contigs. Consequently, eukaryotic contigs, with intrinsically different gene features, are not optimally annotated. Using a bioinformatics pipeline, we have filtered 7.9 billion contigs from 6,872 soil metagenomes in the JGI’s IMG/M database to identify eukaryotic contigs. We have re-annotated genes using eukaryote-tailored methods, yielding 8 million eukaryotic proteins and over 300,000 orphan proteins lacking homology in public databases. Comparing the gene predictions we made with initial JGI ones on the same contigs, we confirmed our pipeline improves eukaryotic protein completeness and contiguity in soil metagenomes. The improved quality of eukaryotic proteins combined with a more comprehensive assignment method yielded more reliable taxonomic annotation. This dataset of eukaryotic soil proteins with improved completeness, quality and taxonomic annotation reliability is of interest for any scientist aiming to study the composition, biological functions and gene flux in soil communities involving eukaryotes. |
format | Online Article Text |
id | pubmed-9203802 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-92038022022-06-18 Improvement of eukaryotic protein predictions from soil metagenomes Belliardo, Carole Koutsovoulos, Georgios D. Rancurel, Corinne Clément, Mathilde Lipuma, Justine Bailly-Bechet, Marc Danchin, Etienne G. J. Sci Data Data Descriptor During the last decades, metagenomics has highlighted the diversity of microorganisms from environmental or host-associated samples. Most metagenomics public repositories use annotation pipelines tailored for prokaryotes regardless of the taxonomic origin of contigs. Consequently, eukaryotic contigs with intrinsically different gene features, are not optimally annotated. Using a bioinformatics pipeline, we have filtered 7.9 billion contigs from 6,872 soil metagenomes in the JGI’s IMG/M database to identify eukaryotic contigs. We have re-annotated genes using eukaryote-tailored methods, yielding 8 million eukaryotic proteins and over 300,000 orphan proteins lacking homology in public databases. Comparing the gene predictions we made with initial JGI ones on the same contigs, we confirmed our pipeline improves eukaryotic proteins completeness and contiguity in soil metagenomes. The improved quality of eukaryotic proteins combined with a more comprehensive assignment method yielded more reliable taxonomic annotation. This dataset of eukaryotic soil proteins with improved completeness, quality and taxonomic annotation reliability is of interest for any scientist aiming at studying the composition, biological functions and gene flux in soil communities involving eukaryotes. 
Nature Publishing Group UK 2022-06-16 /pmc/articles/PMC9203802/ /pubmed/35710557 http://dx.doi.org/10.1038/s41597-022-01420-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Data Descriptor Belliardo, Carole Koutsovoulos, Georgios D. Rancurel, Corinne Clément, Mathilde Lipuma, Justine Bailly-Bechet, Marc Danchin, Etienne G. J. Improvement of eukaryotic protein predictions from soil metagenomes |
title | Improvement of eukaryotic protein predictions from soil metagenomes |
title_full | Improvement of eukaryotic protein predictions from soil metagenomes |
title_fullStr | Improvement of eukaryotic protein predictions from soil metagenomes |
title_full_unstemmed | Improvement of eukaryotic protein predictions from soil metagenomes |
title_short | Improvement of eukaryotic protein predictions from soil metagenomes |
title_sort | improvement of eukaryotic protein predictions from soil metagenomes |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9203802/ https://www.ncbi.nlm.nih.gov/pubmed/35710557 http://dx.doi.org/10.1038/s41597-022-01420-4 |
work_keys_str_mv | AT belliardocarole improvementofeukaryoticproteinpredictionsfromsoilmetagenomes AT koutsovoulosgeorgiosd improvementofeukaryoticproteinpredictionsfromsoilmetagenomes AT rancurelcorinne improvementofeukaryoticproteinpredictionsfromsoilmetagenomes AT clementmathilde improvementofeukaryoticproteinpredictionsfromsoilmetagenomes AT lipumajustine improvementofeukaryoticproteinpredictionsfromsoilmetagenomes AT baillybechetmarc improvementofeukaryoticproteinpredictionsfromsoilmetagenomes AT danchinetiennegj improvementofeukaryoticproteinpredictionsfromsoilmetagenomes |