Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach

BACKGROUND: Eukaryotic promoter prediction using computational analysis techniques is one of the most difficult jobs in computational genomics that is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organism...

Bibliographic Details
Main Authors:	Anwar, Firoz; Baker, Syed Murtuza; Jabid, Taskeed; Mehedi Hasan, Md; Shoyaib, Mohammad; Khan, Haseena; Walshe, Ray
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2575220/ https://www.ncbi.nlm.nih.gov/pubmed/18834544 http://dx.doi.org/10.1186/1471-2105-9-414

_version_	1782160309803286528
author	Anwar, Firoz; Baker, Syed Murtuza; Jabid, Taskeed; Mehedi Hasan, Md; Shoyaib, Mohammad; Khan, Haseena; Walshe, Ray
author_facet	Anwar, Firoz; Baker, Syed Murtuza; Jabid, Taskeed; Mehedi Hasan, Md; Shoyaib, Mohammad; Khan, Haseena; Walshe, Ray
author_sort	Anwar, Firoz
collection	PubMed
description	BACKGROUND: Eukaryotic promoter prediction using computational analysis techniques is one of the most difficult tasks in computational genomics and is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organisms in recent years has created a need for better tools and techniques for the prediction and analysis of promoters in eukaryotic sequences. Many promoter prediction methods and tools have been developed to date, but they have yet to provide acceptable predictive performance. One obvious way to improve on current methods is to devise a better system for selecting appropriate features of promoters that distinguish them from non-promoters. Second, improved performance can be achieved by enhancing the predictive ability of the machine learning algorithms used. RESULTS: In this paper, a novel approach is presented in which 128 4-mer motifs, in conjunction with a non-linear machine-learning algorithm utilising a Support Vector Machine (SVM), are used to distinguish between promoter and non-promoter DNA sequences. When applied to plant, Drosophila, human, mouse and rat sequences, the classification model showed 7-fold cross-validation accuracies of 83.81%, 94.82%, 91.25%, 90.77% and 82.35%, respectively. The high sensitivity and specificity values of 0.86 and 0.90 for plant; 0.96 and 0.92 for Drosophila; 0.88 and 0.92 for human; 0.78 and 0.84 for mouse; and 0.82 and 0.80 for rat demonstrate that this technique is less prone to false positive results and exhibits better performance than many other tools. Moreover, this model successfully identifies the location of the promoter using a TATA weight matrix. CONCLUSION: The high sensitivity and specificity indicate that 4-mer frequencies, in conjunction with supervised machine-learning methods, can be beneficial in the identification of RNA pol II promoters compared with other methods. This approach can be extended to identify promoters in sequences for other eukaryotic genomes.
format	Text
id	pubmed-2575220
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2575220 2008-10-29 Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach Anwar, Firoz; Baker, Syed Murtuza; Jabid, Taskeed; Mehedi Hasan, Md; Shoyaib, Mohammad; Khan, Haseena; Walshe, Ray BMC Bioinformatics Research Article BioMed Central 2008-10-04 /pmc/articles/PMC2575220/ /pubmed/18834544 http://dx.doi.org/10.1186/1471-2105-9-414 Text en Copyright © 2008 Anwar et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2575220/ https://www.ncbi.nlm.nih.gov/pubmed/18834544 http://dx.doi.org/10.1186/1471-2105-9-414
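
The abstract in this record describes classifying sequences from 4-mer frequencies and reports sensitivity and specificity. The following is a minimal illustrative sketch of those two pieces only, not the authors' implementation. It assumes all 256 possible 4-mers are counted (the record does not say how the paper's 128 motifs were chosen), that windows containing non-ACGT characters are skipped, and that labels are 1 for promoter and 0 for non-promoter. The function names `kmer_frequencies` and `sensitivity_specificity` are hypothetical, and the SVM training step is left out.

```python
from itertools import product

K = 4
# All 4^4 = 256 possible 4-mers in a fixed order (assumption: the paper's
# selection of 128 motifs is not described in this record).
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_frequencies(seq):
    """Return the relative frequency of every 4-mer in seq as a 256-long list."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    # Slide a window of length K over the sequence, one base at a time.
    for i in range(len(seq) - K + 1):
        window = seq[i:i + K]
        if window in counts:  # skip windows with N or other ambiguity codes
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels (1 = promoter)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

A feature vector built this way could be passed to any SVM library together with the promoter and non-promoter labels; the choice of kernel and parameters would have to come from the full article.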
