MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics

BACKGROUND: Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes, allowing us to investigate by direct sequencing a tremendous diversity of organisms without the need for prior cultivation. Unicellular eukaryotes play...

Bibliographic Details
Main authors: Levy Karin, Eli; Mirdita, Milot; Söding, Johannes
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126354/
https://www.ncbi.nlm.nih.gov/pubmed/32245390
http://dx.doi.org/10.1186/s40168-020-00808-x
author Levy Karin, Eli
Mirdita, Milot
Söding, Johannes
collection PubMed
description BACKGROUND: Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes, allowing us to investigate by direct sequencing a tremendous diversity of organisms without the need for prior cultivation. Unicellular eukaryotes play essential roles in most microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts, and parasites of plants and animals. Investigating their roles is therefore of great interest to ecology, biotechnology, human health, and evolution. However, the generally lower sequencing coverage, their more complex gene and genome architectures, and a lack of eukaryote-specific experimental and computational procedures have kept them on the sidelines of metagenomics. RESULTS: MetaEuk is a toolkit for high-throughput, reference-based discovery and annotation of protein-coding genes in eukaryotic metagenomic contigs. It performs fast searches with 6-frame-translated fragments covering all possible exons and optimally combines matches into multi-exon proteins. We used a benchmark of seven diverse, annotated genomes to show that MetaEuk is highly sensitive even under conditions of low sequence similarity to the reference database. To demonstrate MetaEuk’s power to discover novel eukaryotic proteins in large-scale metagenomic data, we assembled contigs from 912 samples of the Tara Oceans project. MetaEuk predicted >12,000,000 protein-coding genes in 8 days on ten 16-core servers. Most of the discovered proteins are highly diverged from known proteins and originate from very sparsely sampled eukaryotic supergroups. CONCLUSION: The open-source (GPLv3) MetaEuk software (https://github.com/soedinglab/metaeuk) enables large-scale eukaryotic metagenomics through reference-based, sensitive taxonomic and functional annotation.
format Online
Article
Text
id pubmed-7126354
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7126354 2020-04-10 MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics Levy Karin, Eli Mirdita, Milot Söding, Johannes Microbiome Research BioMed Central 2020-04-03 /pmc/articles/PMC7126354/ /pubmed/32245390 http://dx.doi.org/10.1186/s40168-020-00808-x Text en
© The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics
topic Research