A machine learning strategy to identify candidate binding sites in human protein-coding sequence
BACKGROUND: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It w...
Main authors: | Down, Thomas; Leong, Bernard; Hubbard, Tim JP |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1592515/ https://www.ncbi.nlm.nih.gov/pubmed/17002805 http://dx.doi.org/10.1186/1471-2105-7-419 |
_version_ | 1782130412012699648 |
---|---|
author | Down, Thomas Leong, Bernard Hubbard, Tim JP |
author_sort | Down, Thomas |
collection | PubMed |
description | BACKGROUND: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences, however identifying them computationally is hard since exon sequences are also constrained by their functional role in coding for proteins. RESULTS: This strategy identified a collection of motifs including several previously reported splice enhancer elements. Although only trained on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence. CONCLUSION: We have trained a computational model able to detect signals in coding exons which seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for both previously unrecognized proteins which influence RNA splicing as well as other regulatory elements. |
format | Text |
id | pubmed-1592515 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1592515 2006-10-12 A machine learning strategy to identify candidate binding sites in human protein-coding sequence Down, Thomas Leong, Bernard Hubbard, Tim JP BMC Bioinformatics Research Article BACKGROUND: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences, however identifying them computationally is hard since exon sequences are also constrained by their functional role in coding for proteins. RESULTS: This strategy identified a collection of motifs including several previously reported splice enhancer elements. Although only trained on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence. CONCLUSION: We have trained a computational model able to detect signals in coding exons which seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for both previously unrecognized proteins which influence RNA splicing as well as other regulatory elements. BioMed Central 2006-09-26 /pmc/articles/PMC1592515/ /pubmed/17002805 http://dx.doi.org/10.1186/1471-2105-7-419 Text en Copyright © 2006 Down et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A machine learning strategy to identify candidate binding sites in human protein-coding sequence |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1592515/ https://www.ncbi.nlm.nih.gov/pubmed/17002805 http://dx.doi.org/10.1186/1471-2105-7-419 |