Tuning intrinsic disorder predictors for virus proteins

Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-ge...

Bibliographic Details
Main authors: Almog, Gal; Olabode, Abayomi S; Poon, Art F Y
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7882063/
https://www.ncbi.nlm.nih.gov/pubmed/33614158
http://dx.doi.org/10.1093/ve/veaa106
author Almog, Gal
Olabode, Abayomi S
Poon, Art F Y
collection PubMed
description Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes, and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins and compare our findings with data from non-viral proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36 per cent gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to severe acute respiratory syndrome coronavirus 2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. We show that disorder prediction methods perform differently for viral and non-viral proteins, and that an ensemble approach can yield more robust and accurate predictions.
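The abstract reports accuracy as the Matthews correlation coefficient (MCC) over per-residue ordered/disordered labels. As a minimal sketch of that metric only, and not the authors' evaluation code, the block below computes MCC from two label sequences and the relative gain between two scores. The function names, the 0/1 label encoding (1 = disordered), the zero-denominator convention, and the example sequences are assumptions made for illustration; the record does not say how the mean 36 per cent gain was aggregated.

```python
from math import sqrt


def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN for per-residue labels (assumed: 1 = disordered, 0 = ordered)."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(truth, pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(truth, pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when one class is absent
    return (tp * tn - fp * fn) / denom


def relative_gain(baseline, improved):
    """Percentage gain of `improved` over a non-zero `baseline` score."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical 8-residue example, not data from the study.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = mcc(truth, pred)
```

MCC ranges from -1 (every residue misclassified) through 0 (no better than chance) to 1 (perfect agreement), and it stays informative when disordered residues are a small minority of a protein, which plain accuracy does not.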
format Online
Article
Text
id pubmed-7882063
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7882063 2021-02-18 Tuning intrinsic disorder predictors for virus proteins Almog, Gal Olabode, Abayomi S Poon, Art F Y Virus Evol Research Article
Oxford University Press 2021-01-25 /pmc/articles/PMC7882063/ /pubmed/33614158 http://dx.doi.org/10.1093/ve/veaa106 Text en © The Author(s) 2021. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Tuning intrinsic disorder predictors for virus proteins
topic Research Article