ODNA: identification of organellar DNA by machine learning

MOTIVATION: Identifying organellar DNA, such as mitochondrial or plastid sequences, within a whole-genome assembly remains challenging and requires biological background knowledge. To address this, we developed ODNA, which is based on genome annotation and machine learning. RESULTS: ODNA is a soft...

Bibliographic Details
Main Authors: Martin, Roman; Nguyen, Minh Kien; Lowack, Nick; Heider, Dominik
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10229373/
https://www.ncbi.nlm.nih.gov/pubmed/37195463
http://dx.doi.org/10.1093/bioinformatics/btad326
author Martin, Roman
Nguyen, Minh Kien
Lowack, Nick
Heider, Dominik
collection PubMed
description MOTIVATION: Identifying organellar DNA, such as mitochondrial or plastid sequences, within a whole-genome assembly remains challenging and requires biological background knowledge. To address this, we developed ODNA, which is based on genome annotation and machine learning. RESULTS: ODNA is software that classifies organellar DNA sequences within a genome assembly by machine learning, based on a predefined genome annotation workflow. We trained our model with 829 769 DNA sequences from 405 genome assemblies and achieved high predictive performance (e.g. a Matthews correlation coefficient of 0.61 for mitochondria and 0.73 for chloroplasts) on independent validation data, thus significantly outperforming existing approaches. AVAILABILITY AND IMPLEMENTATION: Our software ODNA is freely accessible as a web service at https://odna.mathematik.uni-marburg.de and can also be run in a Docker container. The source code is available at https://gitlab.com/mosga/odna and the processed data at Zenodo (DOI: 10.5281/zenodo.7506483).
format Online
Article
Text
id pubmed-10229373
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10229373 2023-06-01 ODNA: identification of organellar DNA by machine learning Martin, Roman Nguyen, Minh Kien Lowack, Nick Heider, Dominik Bioinformatics Applications Note Oxford University Press 2023-05-17 /pmc/articles/PMC10229373/ /pubmed/37195463 http://dx.doi.org/10.1093/bioinformatics/btad326 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title ODNA: identification of organellar DNA by machine learning
topic Applications Note