qNABpredict: Quick, accurate, and taxonomy‐aware sequence‐based prediction of content of nucleic acid binding amino acids

Protein sequence‐based predictors of nucleic acid (NA)‐binding include methods that predict NA‐binding proteins and NA‐binding residues. The residue‐level tools provide more detail but incur a high computational cost because they must predict every amino acid in the input sequence and rely on multiple...


Bibliographic Details
Main authors: Wu, Zhonghua, Basu, Sushmita, Wu, Xuantai, Kurgan, Lukasz
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2023
Subjects: Methods and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9798252/
https://www.ncbi.nlm.nih.gov/pubmed/36519304
http://dx.doi.org/10.1002/pro.4544
collection PubMed
description Protein sequence‐based predictors of nucleic acid (NA)‐binding include methods that predict NA‐binding proteins and NA‐binding residues. The residue‐level tools provide more detail but incur a high computational cost because they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts the content (fraction) of NA‐binding residues, offering more information than protein‐level prediction and a much shorter runtime than the residue‐level tools. Our first‐of‐its‐kind content predictor, qNABpredict, relies on a well‐parametrized support vector regression model and a small, rationally designed, fast‐to‐compute feature set that represents relevant characteristics extracted from the input sequence. We provide two versions of qNABpredict: a taxonomy‐agnostic model that can be used for proteins of unknown taxonomic origin, and more accurate taxonomy‐aware models that are tailored to specific taxonomic kingdoms (archaea, bacteria, eukaryota, and viruses). Empirical tests on a low‐similarity test dataset show that qNABpredict is 100 times faster than the residue‐level predictors and generates statistically more accurate content predictions than the content extracted from their results. We also show that qNABpredict's content predictions can be used to improve results generated by the residue‐level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful for predicting details of protein–NA interactions for large protein families and proteomes.
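The "content" referred to in the description is the fraction of residues in a sequence that bind nucleic acids. As a minimal sketch of how such a value could be extracted from the output of a residue-level predictor, the Python snippet below computes it from a list of binary per-residue labels. The function name `binding_content` and the example labels are hypothetical and are not taken from the qNABpredict source code; qNABpredict itself predicts this value directly from sequence-derived features.

```python
def binding_content(residue_labels):
    """Fraction of residues labelled NA-binding (1) among all residues."""
    if not residue_labels:
        raise ValueError("sequence must contain at least one residue")
    return sum(residue_labels) / len(residue_labels)


# Hypothetical binary output of a residue-level predictor for a
# 10-residue sequence (1 = predicted NA-binding, 0 = non-binding).
example_labels = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
example_content = binding_content(example_labels)  # 3 of 10 residues
print(f"NA-binding content: {example_content:.2f}")
```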
id pubmed-9798252
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9798252 2023-01-05 Protein Sci Methods and Applications John Wiley & Sons, Inc. 2023-01-01 Text en
© 2022 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
topic Methods and Applications