PARROT is a flexible recurrent neural network framework for analysis of large protein datasets
Main authors: | Griffith, Daniel; Holehouse, Alex S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | eLife Sciences Publications, Ltd, 2021 |
Subjects: | Computational and Systems Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8448528/ https://www.ncbi.nlm.nih.gov/pubmed/34533455 http://dx.doi.org/10.7554/eLife.70576 |
author | Griffith, Daniel; Holehouse, Alex S |
---|---|
collection | PubMed |
description | The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems. |
id | pubmed-8448528 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | eLife (eLife Sciences Publications, Ltd), published 2021-09-17 |
rights | © 2021, Griffith and Holehouse. This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited. |
topic | Computational and Systems Biology |
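The description states that PARROT handles both classification and regression using only raw protein sequences as input to an internal recurrent neural network. The following pure-Python sketch illustrates that general idea only. It one-hot encodes a sequence over the 20 canonical amino acids and runs it through a tiny, untrained Elman RNN that can emit either one output per sequence or one output per residue. It does not use PARROT's own code or API. The encoding order, `TinyRNN`, and every other name here are hypothetical, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical illustration; not PARROT's implementation.
# The 20 canonical amino acids, in an assumed fixed order for one-hot encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Turn a raw protein sequence into a list of 20-dimensional one-hot vectors."""
    encoded = []
    for aa in sequence.upper():
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        vec = [0.0] * len(AMINO_ACIDS)
        vec[AA_INDEX[aa]] = 1.0
        encoded.append(vec)
    return encoded


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyRNN:
    """Hypothetical minimal Elman RNN: one hidden layer with tanh activation."""

    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.w_xh = random_matrix(hidden_size, input_size, rng)   # input -> hidden
        self.w_hh = random_matrix(hidden_size, hidden_size, rng)  # hidden -> hidden
        self.w_hy = random_matrix(output_size, hidden_size, rng)  # hidden -> output

    def hidden_states(self, inputs):
        """Step through the residues left to right, returning every hidden state."""
        h = [0.0] * self.hidden_size
        states = []
        for x in inputs:
            pre = [a + b for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
            h = [math.tanh(v) for v in pre]
            states.append(h)
        return states

    def predict_sequence(self, inputs):
        """Sequence-level output (e.g. one regression value) from the final hidden state."""
        return matvec(self.w_hy, self.hidden_states(inputs)[-1])

    def predict_residues(self, inputs):
        """Residue-level outputs (e.g. per-site scores) from every hidden state."""
        return [matvec(self.w_hy, h) for h in self.hidden_states(inputs)]


# Usage: a short made-up peptide, 8 hidden units, one output value.
model = TinyRNN(input_size=20, hidden_size=8, output_size=1)
x = one_hot_encode("MSTPKA")
print(len(x), len(x[0]))                # 6 20
print(len(model.predict_sequence(x)))   # 1
print(len(model.predict_residues(x)))   # 6
```

Reading out only the final hidden state suits sequence-level tasks such as scoring a peptide's activation strength. Reading out every hidden state suits residue-level tasks such as flagging candidate phosphorylation sites. A practical model would also need training, and a bidirectional recurrent layer would let each residue's prediction draw on context from both sides.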