
A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential

The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limite...


Bibliographic Details
Main authors: Hill, Steven T, Kuintzle, Rachael, Teegarden, Amy, Merrill, Erich, Danaee, Padideh, Hendrix, David A
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144860/
https://www.ncbi.nlm.nih.gov/pubmed/29986088
http://dx.doi.org/10.1093/nar/gky567
author Hill, Steven T
Kuintzle, Rachael
Teegarden, Amy
Merrill, Erich
Danaee, Padideh
Hendrix, David A
author_sort Hill, Steven T
collection PubMed
description The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limited by current scientific knowledge, deep learning methods can independently discover complex biological rules in the data de novo. We trained a gated recurrent neural network (RNN) on human messenger RNA (mRNA) and long noncoding RNA (lncRNA) sequences. Our model, mRNA RNN (mRNN), surpasses state-of-the-art methods at predicting protein-coding potential despite being trained with less data and with no prior concept of what features define mRNAs. To understand what mRNN learned, we probed the network and uncovered several context-sensitive codons highly predictive of coding potential. Our results suggest that gated RNNs can learn complex and long-range patterns in full-length human transcripts, making them ideal for performing a wide range of difficult classification tasks and, most importantly, for harvesting new biological insights from the rising flood of sequencing data.
format Online
Article
Text
id pubmed-6144860
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6144860 2018-09-25 Nucleic Acids Res Computational Biology Oxford University Press 2018-09-19 2018-07-09 /pmc/articles/PMC6144860/ /pubmed/29986088 http://dx.doi.org/10.1093/nar/gky567 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential
title_sort deep recurrent neural network discovers complex biological rules to decipher rna protein-coding potential
topic Computational Biology