A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential
The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limite...
Main Authors: | Hill, Steven T; Kuintzle, Rachael; Teegarden, Amy; Merrill, Erich; Danaee, Padideh; Hendrix, David A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Computational Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144860/ https://www.ncbi.nlm.nih.gov/pubmed/29986088 http://dx.doi.org/10.1093/nar/gky567 |
_version_ | 1783356158262640640 |
---|---|
author | Hill, Steven T; Kuintzle, Rachael; Teegarden, Amy; Merrill, Erich; Danaee, Padideh; Hendrix, David A |
collection | PubMed |
description | The current deluge of newly identified RNA transcripts presents a singular opportunity for improved assessment of coding potential, a cornerstone of genome annotation, and for machine-driven discovery of biological knowledge. While traditional, feature-based methods for RNA classification are limited by current scientific knowledge, deep learning methods can independently discover complex biological rules in the data de novo. We trained a gated recurrent neural network (RNN) on human messenger RNA (mRNA) and long noncoding RNA (lncRNA) sequences. Our model, mRNA RNN (mRNN), surpasses state-of-the-art methods at predicting protein-coding potential despite being trained with less data and with no prior concept of what features define mRNAs. To understand what mRNN learned, we probed the network and uncovered several context-sensitive codons highly predictive of coding potential. Our results suggest that gated RNNs can learn complex and long-range patterns in full-length human transcripts, making them ideal for performing a wide range of difficult classification tasks and, most importantly, for harvesting new biological insights from the rising flood of sequencing data. |
format | Online Article Text |
id | pubmed-6144860 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6144860 2018-09-25 A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential Hill, Steven T; Kuintzle, Rachael; Teegarden, Amy; Merrill, Erich; Danaee, Padideh; Hendrix, David A Nucleic Acids Res Computational Biology Oxford University Press 2018-09-19 2018-07-09 /pmc/articles/PMC6144860/ /pubmed/29986088 http://dx.doi.org/10.1093/nar/gky567 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | A deep recurrent neural network discovers complex biological rules to decipher RNA protein-coding potential |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144860/ https://www.ncbi.nlm.nih.gov/pubmed/29986088 http://dx.doi.org/10.1093/nar/gky567 |