
A Markov chain-based feature extraction method for classification and identification of cancerous DNA sequences

Bibliographic Details
Main authors: Khodaei, Amin, Feizi-Derakhshi, Mohammad-Reza, Mozaffari-Tazehkand, Behzad
Format: Online Article Text
Language: English
Published: Tabriz University of Medical Sciences (TUOMS Publishing Group) 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8022238/
https://www.ncbi.nlm.nih.gov/pubmed/33842279
http://dx.doi.org/10.34172/bi.2021.16
collection PubMed
description Introduction: In recent decades, the rising incidence of cancer has become a major concern for most societies. Because cancer has genetic origins, studying its internal (genetic) structure is necessary for understanding the disease. Methods: In this research, cancer data are analyzed based on DNA sequences. The transition probabilities between the nucleotides of adjacent pairs in DNA sequences have the Markov property. This property motivates reducing the feature dimension of DNA sequences to overcome the high computational overhead of gene analysis. Accordingly, each sequence is mapped to features based on its Markov transition probabilities; this mapping decreases the feature dimension while conserving the basic properties needed to discriminate cancerous from non-cancerous genes. Results: A non-linear support vector machine (SVM) classifier with RBF and polynomial kernel functions was able to discriminate the selected cancerous samples from non-cancerous ones. Experimental results based on 10-fold cross-validation and accuracy metrics verified that the proposed method has low computational overhead and high accuracy. Conclusion: The proposed algorithm was successfully tested on case studies from related research. In general, the combination of the proposed Markov-based feature reduction and a non-linear SVM classifier can be considered one of the best methods for discriminating cancerous from non-cancerous genes.
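The abstract does not spell out the exact mapping. As a minimal sketch, assuming the features are first-order transition probabilities P(next nucleotide | current nucleotide) estimated from counts of adjacent pairs, a sequence of any length reduces to a fixed vector of 16 values (a 4×4 transition matrix flattened row by row). The function name `markov_transition_features` and the choice to skip ambiguous bases such as `N` are hypothetical illustrations, not the authors' published implementation.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def markov_transition_features(seq: str) -> list[float]:
    """Hypothetical sketch: map a DNA sequence to 16 first-order transition probabilities.

    Entry (a, b) estimates P(next = b | current = a) from counts of adjacent
    nucleotide pairs. Rows with no observed transitions are left as zeros.
    """
    seq = seq.upper()
    # Count every ordered pair (a, b) of adjacent nucleotides.
    counts = {(a, b): 0 for a, b in product(NUCLEOTIDES, repeat=2)}
    for a, b in zip(seq, seq[1:]):
        # Assumption: pairs involving ambiguous bases (e.g. N) are ignored.
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            counts[(a, b)] += 1

    # Normalize each row of the 4x4 count matrix into probabilities.
    features = []
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        for b in NUCLEOTIDES:
            features.append(counts[(a, b)] / row_total if row_total else 0.0)
    return features
```

For example, in `"ACGTAC"` both occurrences of A are followed by C, so the A row of the result is `[0.0, 1.0, 0.0, 0.0]`. Because every sequence yields the same 16 features regardless of its length, vectors like these could be passed directly to a non-linear SVM with an RBF or polynomial kernel and evaluated with 10-fold cross-validation.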
id pubmed-8022238
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
© 2021 The Author(s). This work is published by BioImpacts as an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/). Non-commercial uses of the work are permitted, provided the original work is properly cited.
topic Original Research