A model to predict the function of hypothetical proteins through a nine-point classification scoring schema

BACKGROUND: Hypothetical proteins [HP] are those that are predicted to be expressed in an organism, but no evidence of their existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge in understanding their diverse functions. Techniques to decipher seq...

Bibliographic Details
Main authors: Ijaq, Johny, Malik, Girik, Kumar, Anuj, Das, Partha Sarathi, Meena, Narendra, Bethi, Neeraja, Sundararajan, Vijayaraghava Seshadri, Suravajhala, Prashanth
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325861/
https://www.ncbi.nlm.nih.gov/pubmed/30621574
http://dx.doi.org/10.1186/s12859-018-2554-y
author Ijaq, Johny
Malik, Girik
Kumar, Anuj
Das, Partha Sarathi
Meena, Narendra
Bethi, Neeraja
Sundararajan, Vijayaraghava Seshadri
Suravajhala, Prashanth
collection PubMed
description BACKGROUND: Hypothetical proteins [HP] are those that are predicted to be expressed in an organism, but for which no evidence of existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge of understanding their diverse functions. Techniques to decipher the sequence-structure-function relationship, especially in terms of functional modelling of the HPs, have been developed by researchers, but using the features as classifiers for HPs has not been attempted. With the rise in the number of annotation strategies, next-generation sequencing methods have provided further understanding of the functions of HPs. RESULTS: In our previous work, we developed a six-point classification scoring schema with annotation pertaining to protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, known databases and visualizers which were used to validate protein interactions. In this study, we introduced three more classifiers to our annotation system, viz. pseudogenes linked to HPs, homology modelling and non-coding RNAs associated with HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics, with accuracy improving for Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00), and SMO_npolyk (59.46 to 96.67). CONCLUSION: With the introduction of three new classification features, the nine-point classification scoring schema functionally annotates the HPs with improved accuracy. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2554-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6325861
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6325861 2019-01-11 BMC Bioinformatics Methodology Article BioMed Central 2019-01-08 /pmc/articles/PMC6325861/ /pubmed/30621574 http://dx.doi.org/10.1186/s12859-018-2554-y Text en
© The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A model to predict the function of hypothetical proteins through a nine-point classification scoring schema
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325861/
https://www.ncbi.nlm.nih.gov/pubmed/30621574
http://dx.doi.org/10.1186/s12859-018-2554-y