BayesMotif: de novo protein sorting motif discovery from impure datasets
BACKGROUND: Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences usua...
Main authors: | Hu, Jianjun, Zhang, Fan |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009540/ https://www.ncbi.nlm.nih.gov/pubmed/20122242 http://dx.doi.org/10.1186/1471-2105-11-S1-S66 |
_version_ | 1782194702673510400 |
---|---|
author | Hu, Jianjun Zhang, Fan |
author_facet | Hu, Jianjun Zhang, Fan |
author_sort | Hu, Jianjun |
collection | PubMed |
description | BACKGROUND: Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences usually located at the N-termini or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. METHODS: We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure was developed to iteratively remove sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences. RESULTS: Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also showed that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. CONCLUSION: We proposed BayesMotif, a novel Bayesian classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short, highly conserved anchors.
Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs, which may help to overcome the limitations of the PWM (position weight matrix) motif model. |
format | Text |
id | pubmed-3009540 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-30095402010-12-23 BayesMotif: de novo protein sorting motif discovery from impure datasets Hu, Jianjun Zhang, Fan BMC Bioinformatics Research BioMed Central 2010-01-18 /pmc/articles/PMC3009540/ /pubmed/20122242 http://dx.doi.org/10.1186/1471-2105-11-S1-S66 Text en Copyright ©2010 Hu and Zhang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Hu, Jianjun Zhang, Fan BayesMotif: de novo protein sorting motif discovery from impure datasets |
title | BayesMotif: de novo protein sorting motif discovery from impure datasets |
title_full | BayesMotif: de novo protein sorting motif discovery from impure datasets |
title_fullStr | BayesMotif: de novo protein sorting motif discovery from impure datasets |
title_full_unstemmed | BayesMotif: de novo protein sorting motif discovery from impure datasets |
title_short | BayesMotif: de novo protein sorting motif discovery from impure datasets |
title_sort | bayesmotif: de novo protein sorting motif discovery from impure datasets |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009540/ https://www.ncbi.nlm.nih.gov/pubmed/20122242 http://dx.doi.org/10.1186/1471-2105-11-S1-S66 |
work_keys_str_mv | AT hujianjun bayesmotifdenovoproteinsortingmotifdiscoveryfromimpuredatasets AT zhangfan bayesmotifdenovoproteinsortingmotifdiscoveryfromimpuredatasets |