Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data
BACKGROUND: PAR-CLIP is a recently developed Next Generation Sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions an...
Main Authors: | Comoglio, Federico, Sievers, Cem, Paro, Renato |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339748/ https://www.ncbi.nlm.nih.gov/pubmed/25638391 http://dx.doi.org/10.1186/s12859-015-0470-y |
_version_ | 1782358912831324160 |
---|---|
author | Comoglio, Federico Sievers, Cem Paro, Renato |
author_facet | Comoglio, Federico Sievers, Cem Paro, Renato |
author_sort | Comoglio, Federico |
collection | PubMed |
description | BACKGROUND: PAR-CLIP is a recently developed Next Generation Sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions and can therefore guide the identification of binding sites. However, additional sources of transitions, such as cell type-specific SNPs and sequencing errors, challenge the inference of binding sites and suitable statistical approaches are crucial to control false discovery rates. In addition, a highly resolved delineation of binding sites followed by an extensive downstream analysis is necessary for a comprehensive characterization of the protein binding preferences and the subsequent design of validation experiments. RESULTS: We present a statistical and computational framework for PAR-CLIP data analysis. We developed a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. Our method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction. We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data we compute conservative experimentally-based false discovery rates of our method and demonstrate the high precision of our strategy. CONCLUSIONS: Our method is implemented in the R package wavClusteR 2.0. The package is distributed under the GPL-2 license and is available from BioConductor at http://www.bioconductor.org/packages/devel/bioc/html/wavClusteR.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0470-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4339748 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43397482015-02-26 Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data Comoglio, Federico Sievers, Cem Paro, Renato BMC Bioinformatics Software BACKGROUND: PAR-CLIP is a recently developed Next Generation Sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions and can therefore guide the identification of binding sites. However, additional sources of transitions, such as cell type-specific SNPs and sequencing errors, challenge the inference of binding sites and suitable statistical approaches are crucial to control false discovery rates. In addition, a highly resolved delineation of binding sites followed by an extensive downstream analysis is necessary for a comprehensive characterization of the protein binding preferences and the subsequent design of validation experiments. RESULTS: We present a statistical and computational framework for PAR-CLIP data analysis. We developed a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. Our method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction. We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data we compute conservative experimentally-based false discovery rates of our method and demonstrate the high precision of our strategy. CONCLUSIONS: Our method is implemented in the R package wavClusteR 2.0. The package is distributed under the GPL-2 license and is available from BioConductor at http://www.bioconductor.org/packages/devel/bioc/html/wavClusteR.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0470-y) contains supplementary material, which is available to authorized users. BioMed Central 2015-02-01 /pmc/articles/PMC4339748/ /pubmed/25638391 http://dx.doi.org/10.1186/s12859-015-0470-y Text en © Comoglio et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Comoglio, Federico Sievers, Cem Paro, Renato Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data |
title | Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data |
title_full | Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data |
title_fullStr | Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data |
title_full_unstemmed | Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data |
title_short | Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data |
title_sort | sensitive and highly resolved identification of rna-protein interaction sites in par-clip data |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339748/ https://www.ncbi.nlm.nih.gov/pubmed/25638391 http://dx.doi.org/10.1186/s12859-015-0470-y |
work_keys_str_mv | AT comogliofederico sensitiveandhighlyresolvedidentificationofrnaproteininteractionsitesinparclipdata AT sieverscem sensitiveandhighlyresolvedidentificationofrnaproteininteractionsitesinparclipdata AT parorenato sensitiveandhighlyresolvedidentificationofrnaproteininteractionsitesinparclipdata |