
NAGbinder: An approach for identifying N‐acetylglucosamine interacting residues of a protein from its primary sequence

N‐acetylglucosamine (NAG) belongs to the eight essential saccharides that are required to maintain the optimal health and precise functioning of systems ranging from bacteria to human. In the present study, we have developed a method, NAGbinder, which predicts the NAG‐interacting residues in a prote...


Bibliographic Details
Main authors: Patiyal, Sumeet, Agrawal, Piyush, Kumar, Vinod, Dhall, Anjali, Kumar, Rajesh, Mishra, Gaurav, Raghava, Gajendra P.S.
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933864/
https://www.ncbi.nlm.nih.gov/pubmed/31654438
http://dx.doi.org/10.1002/pro.3761
author Patiyal, Sumeet
Agrawal, Piyush
Kumar, Vinod
Dhall, Anjali
Kumar, Rajesh
Mishra, Gaurav
Raghava, Gajendra P.S.
collection PubMed
description N‐acetylglucosamine (NAG) is one of the eight essential saccharides required to maintain optimal health and precise functioning of systems ranging from bacteria to humans. In the present study, we developed a method, NAGbinder, which predicts the NAG‐interacting residues in a protein from its primary sequence. We extracted 231 nonredundant NAG‐interacting protein chains from the Protein Data Bank, such that no two sequences share more than 40% sequence identity. All prediction models were trained, validated, and evaluated on these 231 protein chains. First, prediction models were developed on balanced data consisting of 1,335 NAG‐interacting and noninteracting residues, using various window sizes. The Random Forest model that used binary profiles as input features, with a window size of 9, performed best among all models. It achieved the highest Matthews Correlation Coefficient (MCC), 0.31 and 0.25, and Area Under the Receiver Operating Characteristic Curve (AUROC), 0.73 and 0.70, on the training and validation data sets, respectively. We also developed prediction models on a realistic data set (1,335 NAG‐interacting and 47,198 noninteracting residues) using the same features, where the model achieved an MCC of 0.26 and 0.27 and an AUROC of 0.70 and 0.71 on the training and validation data sets, respectively. The success of our method can be appraised by the fact that, if a sequence of 1,000 amino acids is analyzed with our approach, 10 residues will be predicted as NAG‐interacting, out of which five are correct. The best models were incorporated in the standalone version and in the webserver available at https://webs.iiitd.edu.in/raghava/nagbinder/
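The abstract mentions binary profiles over a sequence window of size 9 and evaluation by MCC. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The padding symbol `X`, the 21-letter alphabet, and the function names `binary_profile_windows` and `mcc` are assumptions made for illustration only.

```python
import math

# Sketch only: one-hot ("binary profile") encoding of sequence windows.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions beyond the termini
ALPHABET = AMINO_ACIDS + PAD


def binary_profile_windows(sequence, window=9):
    """Return one feature vector per residue, centred on that residue."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    padded = PAD * half + sequence.upper() + PAD * half
    features = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window]:
            one_hot = [0] * len(ALPHABET)
            idx = ALPHABET.find(residue)
            # Unknown letters are mapped onto the padding symbol (assumption).
            one_hot[idx if idx >= 0 else len(ALPHABET) - 1] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    feats = binary_profile_windows("MKTAYIAKQR")
    print(len(feats), len(feats[0]))
```

Each residue yields a vector of length window × 21 that could be passed to any classifier, for example a random forest. The abstract's closing example (10 residues predicted in a 1,000-residue sequence, five of them correct) corresponds to a precision of 5/10 = 50%.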
format Online
Article
Text
id pubmed-6933864
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-6933864 2019-12-30 Protein Sci Tools for Protein Science John Wiley & Sons, Inc. 2019-11-07 2020-01 /pmc/articles/PMC6933864/ /pubmed/31654438 http://dx.doi.org/10.1002/pro.3761 Text en © 2019 The Authors. Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title NAGbinder: An approach for identifying N‐acetylglucosamine interacting residues of a protein from its primary sequence
topic Tools for Protein Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933864/
https://www.ncbi.nlm.nih.gov/pubmed/31654438
http://dx.doi.org/10.1002/pro.3761