Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling

Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting...

Bibliographic Details
Main Authors: Song, Tao; Gu, Hong
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3923751/
https://www.ncbi.nlm.nih.gov/pubmed/24551063
http://dx.doi.org/10.1371/journal.pone.0087670
_version_ 1782303649783873536
author Song, Tao
Gu, Hong
author_facet Song, Tao
Gu, Hong
author_sort Song, Tao
collection PubMed
description Conserved motifs in biological sequences are closely related to their structure and functions. Discriminative motif discovery methods have recently attracted growing attention. However, little attention has been devoted to the data imbalance problem, which is one of the main factors degrading the performance of discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the data preprocessing stage, and a random under-sampling method is introduced at the Hidden Markov Model (HMM) training stage to address the imbalance between the positive and negative datasets. In the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and they recover the most known targeting motifs from Minimotif Miner and InterPro. We also use the discovered motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
format Online
Article
Text
id pubmed-3923751
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3923751 2014-02-18 Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling Song, Tao Gu, Hong PLoS One Research Article Public Library of Science 2014-02-13 /pmc/articles/PMC3923751/ /pubmed/24551063 http://dx.doi.org/10.1371/journal.pone.0087670 Text en © 2014 Song, Gu http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Song, Tao
Gu, Hong
Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling
title Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling
topic Research Article
work_keys_str_mv AT songtao discriminativemotifdiscoveryviasimulatedevolutionandrandomundersampling
AT guhong discriminativemotifdiscoveryviasimulatedevolutionandrandomundersampling
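
The description above mentions random under-sampling to handle the imbalance between the positive and negative datasets during HMM training. The record does not give the authors' exact procedure, so the following is only a minimal sketch of the general technique, assuming sequences are plain strings and that the larger set is reduced to the size of the smaller one. The function name `random_under_sample`, the `seed` parameter, and the 1:1 target ratio are illustrative assumptions, not details taken from the article.

```python
import random
from typing import List, Sequence, Tuple


def random_under_sample(
    positives: Sequence[str],
    negatives: Sequence[str],
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Balance two sequence sets by randomly shrinking the larger one.

    Assumption: a 1:1 ratio is the target; the article may use another ratio
    or may re-draw the sample at each training round.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    target = min(len(positives), len(negatives))

    def shrink(items: Sequence[str]) -> List[str]:
        items = list(items)
        if len(items) <= target:
            return items  # the smaller set is kept unchanged
        return rng.sample(items, target)  # draw without replacement

    return shrink(positives), shrink(negatives)


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    pos = ["MKTAY", "MLSRA"]
    neg = ["AAAAA", "CCCCC", "GGGGG", "TTTTT", "ACGTA"]
    balanced_pos, balanced_neg = random_under_sample(pos, neg, seed=42)
    print(len(balanced_pos), len(balanced_neg))
```

Drawing without replacement keeps every retained negative sequence distinct. Repeating the draw with different seeds is one common way to reduce the information lost by discarding majority-class examples, though whether the article does this is not stated in this record.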