Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models

BACKGROUND: Determination of protein subcellular localization plays an important role in understanding protein function. Knowledge of the subcellular localization is also essential for genome annotation and drug discovery. Supervised machine learning methods for predicting the localization of a prot...

Bibliographic Details
Main authors: Caragea, Cornelia; Caragea, Doina; Silvescu, Adrian; Honavar, Vasant
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2966293/
https://www.ncbi.nlm.nih.gov/pubmed/21034431
http://dx.doi.org/10.1186/1471-2105-11-S8-S6
author Caragea, Cornelia
Caragea, Doina
Silvescu, Adrian
Honavar, Vasant
collection PubMed
description BACKGROUND: Determination of protein subcellular localization plays an important role in understanding protein function. Knowledge of the subcellular localization is also essential for genome annotation and drug discovery. Supervised machine learning methods for predicting the localization of a protein in a cell rely on the availability of large amounts of labeled data. However, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in developing semi-supervised methods for predicting protein subcellular localization from large amounts of unlabeled data together with small amounts of labeled data. RESULTS: In this paper, we present an Abstraction Augmented Markov Model (AAMM) based approach to the semi-supervised protein subcellular localization prediction problem. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs), which do not take advantage of unlabeled data; (ii) an expectation maximization (EM) based approach; and (iii) a co-training based approach to semi-supervised training of MMs, both of which make use of unlabeled data. CONCLUSIONS: The results of our experiments on three protein subcellular localization data sets show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; (ii) are more accurate than both the MMs and the EM based semi-supervised MMs; and (iii) are comparable in performance to, and in some cases outperform, the co-training based semi-supervised MMs.
format Text
id pubmed-2966293
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2966293 2010-10-30 Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models Caragea, Cornelia Caragea, Doina Silvescu, Adrian Honavar, Vasant BMC Bioinformatics Research
BioMed Central 2010-10-26 /pmc/articles/PMC2966293/ /pubmed/21034431 http://dx.doi.org/10.1186/1471-2105-11-S8-S6 Text en Copyright ©2010 Caragea et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models
topic Research