Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing project...

Bibliographic Details
Main Authors:	Iqbal, Muhammad Javed; Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4089199/ https://www.ncbi.nlm.nih.gov/pubmed/25045727 http://dx.doi.org/10.1155/2014/173869

_version_	1782325083903098880
author	Iqbal, Muhammad Javed; Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
author_facet	Iqbal, Muhammad Javed; Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
author_sort	Iqbal, Muhammad Javed
collection	PubMed
description	Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms have been proposed. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of the large number of newly discovered proteins. The existing classification results are unsatisfactory because of the very large number of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique is proposed to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth.
format	Online Article Text
id	pubmed-4089199
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-4089199 2014-07-20 ScientificWorldJournal Research Article Hindawi Publishing Corporation 2014 2014-06-19 /pmc/articles/PMC4089199/ /pubmed/25045727 http://dx.doi.org/10.1155/2014/173869 Text en Copyright © 2014 Muhammad Javed Iqbal et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4089199/ https://www.ncbi.nlm.nih.gov/pubmed/25045727 http://dx.doi.org/10.1155/2014/173869

Similar Items