
A machine learning strategy to identify candidate binding sites in human protein-coding sequence

BACKGROUND: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It w...


Bibliographic Details
Main Authors: Down, Thomas; Leong, Bernard; Hubbard, Tim JP
Format: Text
Language: English
Published: BioMed Central 2006
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1592515/
https://www.ncbi.nlm.nih.gov/pubmed/17002805
http://dx.doi.org/10.1186/1471-2105-7-419
author Down, Thomas
Leong, Bernard
Hubbard, Tim JP
collection PubMed
description BACKGROUND: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is hard, since exon sequences are also constrained by their functional role in coding for proteins. RESULTS: This strategy identified a collection of motifs, including several previously reported splice enhancer elements. Although trained only on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence. CONCLUSION: We have trained a computational model able to detect signals in coding exons that seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for previously unrecognized proteins that influence RNA splicing, as well as other regulatory elements.
format Text
id pubmed-1592515
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1592515 2006-10-12; BMC Bioinformatics; Research Article; BioMed Central 2006-09-26; /pmc/articles/PMC1592515/ /pubmed/17002805 http://dx.doi.org/10.1186/1471-2105-7-419; Text; en. Copyright © 2006 Down et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A machine learning strategy to identify candidate binding sites in human protein-coding sequence
topic Research Article