
Effective Feature Selection for Classification of Promoter Sequences

Bibliographic Details
Main Authors: K., Kouser; P. G., Lavanya; Rangarajan, Lalitha; K., Acharya Kshitish
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5158321/
https://www.ncbi.nlm.nih.gov/pubmed/27978541
http://dx.doi.org/10.1371/journal.pone.0167165
collection PubMed
description Exploring novel computational methods for making sense of biological data has been not only a necessity but also productive. Part of this trend is the search for more efficient in silico methods and tools for analyzing promoters, the parts of DNA sequences involved in regulating the expression of genes into other functional molecules. Promoter regions vary greatly in function depending on their nucleotide sequence and on the arrangement of short protein-binding regions called motifs. In fact, the regulatory nature of promoters seems to be driven largely by the selective presence and/or arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, since such classification can open a new route to the functional analysis of promoters and to the discovery of functionally crucial motifs. We use Position Specific Motif Matrix (PSMM) features to explore whether promoter sequences can be classified accurately with some popular classification techniques. Classification results on the complete feature set are poor, perhaps because of the very large number of features. We propose two ways of reducing the number of features, and our test results show improved classification after the reduction. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some popular feature transformation methods, such as PCA and SVD. They are also as accurate as the MRMR feature selection method but much faster. Such methods could be useful for categorizing new promoters and exploring the regulatory mechanisms of gene expression in complex eukaryotic species.
id pubmed-5158321
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS One (Research Article), Public Library of Science, 2016-12-15
rights © 2016 Kouser et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article
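The abstract benchmarks the proposed feature selection methods against MRMR (minimum Redundancy Maximum Relevance), reporting them to be as accurate but much faster. For readers unfamiliar with that baseline, the following is a minimal Python sketch of the standard greedy mRMR criterion in its difference form (relevance minus mean redundancy). It assumes discrete feature values and mutual information estimated from empirical counts. It is not the authors' proposed method, their PSMM feature extraction, or their MRMR configuration, and the function names `mutual_information` and `mrmr` and the toy data are hypothetical. Each greedy step scores every remaining candidate against every feature already selected, which shows where MRMR's cost comes from on large feature sets.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information I(X;Y), in nats, of two equal-length discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), rewritten with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy.

    features: list of feature columns, each a list with one discrete value per sample.
    labels:   class label per sample.
    Returns up to k feature indices in the order they were selected.
    """
    # Relevance of each feature to the class labels is computed once.
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        scores = []
        for j in remaining:
            # Redundancy: mean MI between candidate j and every feature picked so far.
            # These pairwise MI evaluations are what make mRMR expensive on large feature sets.
            if selected:
                redundancy = sum(
                    mutual_information(features[j], features[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            scores.append(relevance[j] - redundancy)
        # max() returns the first maximum, so ties go to the lowest remaining index.
        best = max(range(len(remaining)), key=scores.__getitem__)
        selected.append(remaining.pop(best))
    return selected


# Hypothetical toy data: 8 samples, 4 classes.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
f0 = [0, 0, 0, 0, 1, 1, 1, 1]  # high bit of the label
f1 = list(f0)                  # exact copy of f0: relevant but fully redundant
f2 = [0, 0, 1, 1, 0, 0, 1, 1]  # low bit of the label, independent of f0
print(mrmr([f0, f1, f2], labels, 2))  # → [0, 2]
```

All three toy features are equally relevant to the labels, so feature 0 is picked first on the tie. Feature 1 then scores about zero because its redundancy with feature 0 cancels its relevance, and the independent feature 2 is selected instead.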