Classification of Lung Cancer Tumors Based on Structural and Physicochemical Properties of Proteins by Bioinformatics Models

Bibliographic Details
Main authors: Hosseinzadeh, Faezeh, Ebrahimi, Mansour, Goliaei, Bahram, Shamabadi, Narges
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400626/
https://www.ncbi.nlm.nih.gov/pubmed/22829872
http://dx.doi.org/10.1371/journal.pone.0040017
_version_ 1782238514254970880
author Hosseinzadeh, Faezeh
Ebrahimi, Mansour
Goliaei, Bahram
Shamabadi, Narges
author_facet Hosseinzadeh, Faezeh
Ebrahimi, Mansour
Goliaei, Bahram
Shamabadi, Narges
author_sort Hosseinzadeh, Faezeh
collection PubMed
description Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, for protein classification, and for prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting and supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes in the classification of the SCLC, NSCLC and COMMON classes of lung tumors. The same results were observed with most tree induction algorithms: descriptors of hydrophobicity distribution were high in protein sequences COMMON to both groups, and the distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, the composition of polar dipeptides was higher in SCLC proteins than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to nearly classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm (evaluated with leave-one-out and 10-fold cross validation) showed more than 86% accuracy in clustering and predicting the three different lung cancer tumor classes. Here, for the first time, the application of data mining tools to effectively classify three classes of lung cancer tumors, with regard to the importance of dipeptide composition, autocorrelation and distribution descriptors, is reported.
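The description names dipeptide composition as one of the most informative sequence-derived attributes. As a rough illustration only, the sketch below computes it under the common definition (the fraction of each of the 400 possible adjacent residue pairs in a sequence). The function name, the toy sequence, and the handling of non-standard residues are assumptions made for this example; the record does not state which software or exact formula the authors used.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Return the fraction of each of the 400 possible dipeptides.

    Hypothetical helper for illustration; pairs containing a
    non-standard residue are skipped (an assumption of this sketch).
    """
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Empty or too-short sequence: all fractions are zero.
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


if __name__ == "__main__":
    toy_sequence = "MKVLAAGIVK"  # made-up sequence, not from the study
    composition = dipeptide_composition(toy_sequence)
    print(len(composition))   # number of dipeptide features
    print(composition["AA"])  # fraction of the AA dipeptide
```

Each protein then contributes a fixed-length vector of 400 values, which is one way such descriptors can be fed to attribute weighting or tree induction algorithms regardless of sequence length.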
format Online
Article
Text
id pubmed-3400626
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3400626 2012-07-24 Classification of Lung Cancer Tumors Based on Structural and Physicochemical Properties of Proteins by Bioinformatics Models Hosseinzadeh, Faezeh Ebrahimi, Mansour Goliaei, Bahram Shamabadi, Narges PLoS One Research Article Public Library of Science 2012-07-19 /pmc/articles/PMC3400626/ /pubmed/22829872 http://dx.doi.org/10.1371/journal.pone.0040017 Text en Hosseinzadeh et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Hosseinzadeh, Faezeh
Ebrahimi, Mansour
Goliaei, Bahram
Shamabadi, Narges
Classification of Lung Cancer Tumors Based on Structural and Physicochemical Properties of Proteins by Bioinformatics Models
title Classification of Lung Cancer Tumors Based on Structural and Physicochemical Properties of Proteins by Bioinformatics Models
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400626/
https://www.ncbi.nlm.nih.gov/pubmed/22829872
http://dx.doi.org/10.1371/journal.pone.0040017