Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning

MOTIVATION: Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely un...

Bibliographic Details
Main authors: Stiehler, Felix, Steinborn, Marvin, Scholz, Stephan, Dey, Daniela, Weber, Andreas P M, Denton, Alisandra K
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016489/
https://www.ncbi.nlm.nih.gov/pubmed/33325516
http://dx.doi.org/10.1093/bioinformatics/btaa1044
author Stiehler, Felix
Steinborn, Marvin
Scholz, Stephan
Dey, Daniela
Weber, Andreas P M
Denton, Alisandra K
author_sort Stiehler, Felix
collection PubMed
description MOTIVATION: Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely unchanged for about two decades, limiting learning capabilities. Here, we set out to improve the cross-species annotation of genes from DNA sequence alone with the help of deep learning. The goal is to eliminate the dependency on a closely related gene model while also improving the predictive quality in general with a fundamentally new architecture. RESULTS: We present Helixer, a framework for the development and usage of a cross-species deep learning model that improves significantly on performance and generalizability when compared to more traditional methods. We evaluate our approach by building a single vertebrate model for the base-wise annotation of 186 animal genomes and a separate land plant model for 51 plant genomes. Our predictions are shown to be much less sensitive to the length of the genome than those of a current state-of-the-art tool. We also present two novel post-processing techniques that each worked to further strengthen our annotations and show in-depth results of an RNA-Seq-based comparison of our predictions. Our method does not yet produce comprehensive gene models but rather outputs base-pair-wise probabilities. AVAILABILITY AND IMPLEMENTATION: The source code of this work is available at https://github.com/weberlab-hhu/Helixer under the GNU General Public License v3.0. The trained models are available at https://doi.org/10.5281/zenodo.3974409 SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8016489
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-80164892021-04-07 Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning Stiehler, Felix Steinborn, Marvin Scholz, Stephan Dey, Daniela Weber, Andreas P M Denton, Alisandra K Bioinformatics Original Papers MOTIVATION: Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely unchanged for about two decades, limiting learning capabilities. Here, we set out to improve the cross-species annotation of genes from DNA sequence alone with the help of deep learning. The goal is to eliminate the dependency on a closely related gene model while also improving the predictive quality in general with a fundamentally new architecture. RESULTS: We present Helixer, a framework for the development and usage of a cross-species deep learning model that improves significantly on performance and generalizability when compared to more traditional methods. We evaluate our approach by building a single vertebrate model for the base-wise annotation of 186 animal genomes and a separate land plant model for 51 plant genomes. Our predictions are shown to be much less sensitive to the length of the genome than those of a current state-of-the-art tool. We also present two novel post-processing techniques that each worked to further strengthen our annotations and show in-depth results of an RNA-Seq based comparison of our predictions. Our method does not yet produce comprehensive gene models but rather outputs base pair wise probabilities. AVAILABILITY AND IMPLEMENTATION: The source code of this work is available at https://github.com/weberlab-hhu/Helixer under the GNU General Public License v3.0. 
The trained models are available at https://doi.org/10.5281/zenodo.3974409 SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-12-16 /pmc/articles/PMC8016489/ /pubmed/33325516 http://dx.doi.org/10.1093/bioinformatics/btaa1044 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016489/
https://www.ncbi.nlm.nih.gov/pubmed/33325516
http://dx.doi.org/10.1093/bioinformatics/btaa1044