
PARROT is a flexible recurrent neural network framework for analysis of large protein datasets

Bibliographic Details
Main authors: Griffith, Daniel; Holehouse, Alex S
Format: Online, Article, Text
Language: English
Published: eLife Sciences Publications, Ltd, 2021
Subjects: Computational and Systems Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8448528/
https://www.ncbi.nlm.nih.gov/pubmed/34533455
http://dx.doi.org/10.7554/eLife.70576
collection PubMed
description The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
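The description states that PARROT uses a recurrent neural network and needs only raw protein sequences as input for classification or regression. The sketch below illustrates that general idea under stated assumptions: each sequence is one-hot encoded over the 20 canonical amino acids, and a minimal, untrained Elman RNN written in plain Python reduces it to a single score that can be read as a regression value or, through a sigmoid, as a class probability. All names here (`one_hot`, `TinyRNN`, `AMINO_ACIDS`) are hypothetical; this is not PARROT's API, encoding scheme, or network architecture.

```python
# Hypothetical sketch, NOT PARROT's API or architecture.
# Assumption: one-hot encoding over 20 canonical residues, fed to an
# untrained Elman RNN whose final hidden state drives a single output score.
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues (assumption)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> list[list[float]]:
    """Encode each residue as a 20-dimensional one-hot vector."""
    vectors = []
    for aa in sequence.upper():
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        vec = [0.0] * len(AMINO_ACIDS)
        vec[AA_INDEX[aa]] = 1.0
        vectors.append(vec)
    return vectors


class TinyRNN:
    """Untrained Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)  # fixed seed so outputs are reproducible
        self.hidden_size = hidden_size
        self.w_x = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
                    for _ in range(hidden_size)]
        self.w_h = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                    for _ in range(hidden_size)]
        self.b = [0.0] * hidden_size
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def final_hidden(self, inputs: list[list[float]]) -> list[float]:
        """Run the recurrence over every residue and return the last state."""
        h = [0.0] * self.hidden_size
        for x in inputs:
            # The comprehension reads the previous h before h is reassigned.
            h = [math.tanh(sum(w * xi for w, xi in zip(self.w_x[j], x))
                           + sum(w * hj for w, hj in zip(self.w_h[j], h))
                           + self.b[j])
                 for j in range(self.hidden_size)]
        return h

    def regress(self, sequence: str) -> float:
        """Linear readout of the final hidden state (regression score)."""
        h = self.final_hidden(one_hot(sequence))
        return sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out

    def classify(self, sequence: str) -> float:
        """Sigmoid of the same score, read as a positive-class probability."""
        return 1.0 / (1.0 + math.exp(-self.regress(sequence)))


if __name__ == "__main__":
    model = TinyRNN(input_size=len(AMINO_ACIDS), hidden_size=8)
    print(model.regress("MKTAYIAKQR"), model.classify("MKTAYIAKQR"))
```

The weights are random, so the printed numbers carry no biological meaning; a usable predictor would learn them from labeled sequences by gradient descent, typically in a deep learning library rather than plain Python.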
id pubmed-8448528
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal eLife
published 2021-09-17
rights © 2021, Griffith and Holehouse. This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
topic Computational and Systems Biology