Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning
MOTIVATION: Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely un...
Main authors: | Stiehler, Felix; Steinborn, Marvin; Scholz, Stephan; Dey, Daniela; Weber, Andreas P M; Denton, Alisandra K |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016489/ https://www.ncbi.nlm.nih.gov/pubmed/33325516 http://dx.doi.org/10.1093/bioinformatics/btaa1044 |
_version_ | 1783673870384889856 |
---|---|
author | Stiehler, Felix Steinborn, Marvin Scholz, Stephan Dey, Daniela Weber, Andreas P M Denton, Alisandra K |
author_sort | Stiehler, Felix |
collection | PubMed |
description | MOTIVATION: Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely unchanged for about two decades, limiting learning capabilities. Here, we set out to improve the cross-species annotation of genes from DNA sequence alone with the help of deep learning. The goal is to eliminate the dependency on a closely related gene model while also improving the predictive quality in general with a fundamentally new architecture. RESULTS: We present Helixer, a framework for the development and usage of a cross-species deep learning model that improves significantly on performance and generalizability when compared to more traditional methods. We evaluate our approach by building a single vertebrate model for the base-wise annotation of 186 animal genomes and a separate land plant model for 51 plant genomes. Our predictions are shown to be much less sensitive to the length of the genome than those of a current state-of-the-art tool. We also present two novel post-processing techniques that each worked to further strengthen our annotations and show in-depth results of an RNA-Seq-based comparison of our predictions. Our method does not yet produce comprehensive gene models but rather outputs base-pair-wise probabilities. AVAILABILITY AND IMPLEMENTATION: The source code of this work is available at https://github.com/weberlab-hhu/Helixer under the GNU General Public License v3.0. The trained models are available at https://doi.org/10.5281/zenodo.3974409. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8016489 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8016489 2021-04-07 Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning Stiehler, Felix Steinborn, Marvin Scholz, Stephan Dey, Daniela Weber, Andreas P M Denton, Alisandra K Bioinformatics Original Papers Oxford University Press 2020-12-16 /pmc/articles/PMC8016489/ /pubmed/33325516 http://dx.doi.org/10.1093/bioinformatics/btaa1044 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016489/ https://www.ncbi.nlm.nih.gov/pubmed/33325516 http://dx.doi.org/10.1093/bioinformatics/btaa1044 |