
BayesMotif: de novo protein sorting motif discovery from impure datasets

BACKGROUND: Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid sub-sequences usua...


Bibliographic Details
Main Authors: Hu, Jianjun; Zhang, Fan
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009540/
https://www.ncbi.nlm.nih.gov/pubmed/20122242
http://dx.doi.org/10.1186/1471-2105-11-S1-S66
author Hu, Jianjun
Zhang, Fan
collection PubMed
description BACKGROUND: Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid sub-sequences usually located at the N-termini or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. METHODS: We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure was developed to iteratively remove sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences. RESULTS: Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also showed that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. CONCLUSION: We proposed BayesMotif, a novel Bayesian classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short, highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features, such as hydrophobicity or charge of the motifs, which may help to overcome the limitations of the position weight matrix (PWM) motif model.
format Text
id pubmed-3009540
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3009540 2010-12-23 BayesMotif: de novo protein sorting motif discovery from impure datasets Hu, Jianjun Zhang, Fan BMC Bioinformatics Research BioMed Central 2010-01-18 /pmc/articles/PMC3009540/ /pubmed/20122242 http://dx.doi.org/10.1186/1471-2105-11-S1-S66 Text en Copyright ©2010 Hu and Zhang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title BayesMotif: de novo protein sorting motif discovery from impure datasets
topic Research