A model to predict the function of hypothetical proteins through a nine-point classification scoring schema
BACKGROUND: Hypothetical proteins [HP] are those that are predicted to be expressed in an organism, but no evidence of their existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge in understanding their diverse functions. Techniques to decipher seq...
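Main authors: | Ijaq, Johny; Malik, Girik; Kumar, Anuj; Das, Partha Sarathi; Meena, Narendra; Bethi, Neeraja; Sundararajan, Vijayaraghava Seshadri; Suravajhala, Prashanth |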
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325861/ https://www.ncbi.nlm.nih.gov/pubmed/30621574 http://dx.doi.org/10.1186/s12859-018-2554-y |
_version_ | 1783386205995401216 |
---|---|
author | Ijaq, Johny; Malik, Girik; Kumar, Anuj; Das, Partha Sarathi; Meena, Narendra; Bethi, Neeraja; Sundararajan, Vijayaraghava Seshadri; Suravajhala, Prashanth |
author_sort | Ijaq, Johny |
collection | PubMed |
description | BACKGROUND: Hypothetical proteins [HP] are those that are predicted to be expressed in an organism, but no evidence of their existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge of understanding their diverse functions. Researchers have developed techniques to decipher sequence-structure-function relationships, especially for functional modelling of HPs, but using these features as classifiers for HPs has not been attempted. With the rise in the number of annotation strategies, next-generation sequencing methods have furthered our understanding of the functions of HPs. RESULTS: In our previous work, we developed a six-point classification scoring schema with annotation pertaining to protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, known databases and visualizers, which were used to validate protein interactions. In this study, we introduced three more classifiers to our annotation system, viz. pseudogenes linked to HPs, homology modelling and non-coding RNAs associated with HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics, with accuracy improving for Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00), and SMO_npolyk (59.46 to 96.67). CONCLUSION: With the introduction of three new classification features, the nine-point classification scoring schema functionally annotates HPs with improved accuracy. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2554-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6325861 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6325861 2019-01-11 A model to predict the function of hypothetical proteins through a nine-point classification scoring schema. BMC Bioinformatics. Methodology Article. BioMed Central 2019-01-08 /pmc/articles/PMC6325861/ /pubmed/30621574 http://dx.doi.org/10.1186/s12859-018-2554-y Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A model to predict the function of hypothetical proteins through a nine-point classification scoring schema |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325861/ https://www.ncbi.nlm.nih.gov/pubmed/30621574 http://dx.doi.org/10.1186/s12859-018-2554-y |