Tiara: deep learning-based classification system for eukaryotic sequences
MOTIVATION: With a large number of metagenomic datasets becoming available, eukaryotic metagenomics emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step toward a better understanding of eukaryotic diversity. RESULTS: We developed Tia...
Main Authors: | Karlicki, Michał; Antonowicz, Stanisław; Karnkowska, Anna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8722755/ https://www.ncbi.nlm.nih.gov/pubmed/34570171 http://dx.doi.org/10.1093/bioinformatics/btab672 |
_version_ | 1784625581694910464 |
---|---|
author | Karlicki, Michał; Antonowicz, Stanisław; Karnkowska, Anna |
author_facet | Karlicki, Michał; Antonowicz, Stanisław; Karnkowska, Anna |
author_sort | Karlicki, Michał |
collection | PubMed |
description | MOTIVATION: With a large number of metagenomic datasets becoming available, eukaryotic metagenomics emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step toward a better understanding of eukaryotic diversity. RESULTS: We developed Tiara, a deep-learning-based approach for the identification of eukaryotic sequences in the metagenomic datasets. Its two-step classification process enables the classification of nuclear and organellar eukaryotic fractions and subsequently divides organellar sequences into plastidial and mitochondrial. Using the test dataset, we have shown that Tiara performed similarly to EukRep for prokaryotes classification and outperformed it for eukaryotes classification with lower calculation time. In the tests on the real data, Tiara performed better than EukRep in analyzing the small dataset representing eukaryotic cell microbiome and large dataset from the pelagic zone of oceans. Tiara is also the only available tool correctly classifying organellar sequences, which was confirmed by the recovery of nearly complete plastid and mitochondrial genomes from the test data and real metagenomic data. AVAILABILITY AND IMPLEMENTATION: Tiara is implemented in python 3.8, available at https://github.com/ibe-uw/tiara and tested on Unix-based systems. It is released under an open-source MIT license and documentation is available at https://ibe-uw.github.io/tiara. Version 1.0.1 of Tiara has been used for all benchmarks. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8722755 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-87227552022-01-05 Tiara: deep learning-based classification system for eukaryotic sequences Karlicki, Michał Antonowicz, Stanisław Karnkowska, Anna Bioinformatics Original Paper MOTIVATION: With a large number of metagenomic datasets becoming available, eukaryotic metagenomics emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step toward a better understanding of eukaryotic diversity. RESULTS: We developed Tiara, a deep-learning-based approach for the identification of eukaryotic sequences in the metagenomic datasets. Its two-step classification process enables the classification of nuclear and organellar eukaryotic fractions and subsequently divides organellar sequences into plastidial and mitochondrial. Using the test dataset, we have shown that Tiara performed similarly to EukRep for prokaryotes classification and outperformed it for eukaryotes classification with lower calculation time. In the tests on the real data, Tiara performed better than EukRep in analyzing the small dataset representing eukaryotic cell microbiome and large dataset from the pelagic zone of oceans. Tiara is also the only available tool correctly classifying organellar sequences, which was confirmed by the recovery of nearly complete plastid and mitochondrial genomes from the test data and real metagenomic data. AVAILABILITY AND IMPLEMENTATION: Tiara is implemented in python 3.8, available at https://github.com/ibe-uw/tiara and tested on Unix-based systems. It is released under an open-source MIT license and documentation is available at https://ibe-uw.github.io/tiara. Version 1.0.1 of Tiara has been used for all benchmarks. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-09-27 /pmc/articles/PMC8722755/ /pubmed/34570171 http://dx.doi.org/10.1093/bioinformatics/btab672 Text en © The Author(s) 2021. Published by Oxford University Press. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Paper Karlicki, Michał Antonowicz, Stanisław Karnkowska, Anna Tiara: deep learning-based classification system for eukaryotic sequences |
title | Tiara: deep learning-based classification system for eukaryotic sequences |
title_full | Tiara: deep learning-based classification system for eukaryotic sequences |
title_fullStr | Tiara: deep learning-based classification system for eukaryotic sequences |
title_full_unstemmed | Tiara: deep learning-based classification system for eukaryotic sequences |
title_short | Tiara: deep learning-based classification system for eukaryotic sequences |
title_sort | tiara: deep learning-based classification system for eukaryotic sequences |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8722755/ https://www.ncbi.nlm.nih.gov/pubmed/34570171 http://dx.doi.org/10.1093/bioinformatics/btab672 |
work_keys_str_mv | AT karlickimichał tiaradeeplearningbasedclassificationsystemforeukaryoticsequences AT antonowiczstanisław tiaradeeplearningbasedclassificationsystemforeukaryoticsequences AT karnkowskaanna tiaradeeplearningbasedclassificationsystemforeukaryoticsequences |