
Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data


Bibliographic Details
Main Authors: Comoglio, Federico; Sievers, Cem; Paro, Renato
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339748/
https://www.ncbi.nlm.nih.gov/pubmed/25638391
http://dx.doi.org/10.1186/s12859-015-0470-y
author Comoglio, Federico
Sievers, Cem
Paro, Renato
collection PubMed
description BACKGROUND: PAR-CLIP is a recently developed Next Generation Sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions and can therefore guide the identification of binding sites. However, additional sources of transitions, such as cell type-specific SNPs and sequencing errors, challenge the inference of binding sites, and suitable statistical approaches are crucial to control false discovery rates. In addition, a highly resolved delineation of binding sites followed by an extensive downstream analysis is necessary for a comprehensive characterization of the protein binding preferences and the subsequent design of validation experiments. RESULTS: We present a statistical and computational framework for PAR-CLIP data analysis. We developed a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. Our method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction. We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data we compute conservative experimentally-based false discovery rates of our method and demonstrate the high precision of our strategy. CONCLUSIONS: Our method is implemented in the R package wavClusteR 2.0. The package is distributed under the GPL-2 license and is available from BioConductor at http://www.bioconductor.org/packages/devel/bioc/html/wavClusteR.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0470-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4339748
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4339748 2015-02-26 Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data Comoglio, Federico Sievers, Cem Paro, Renato BMC Bioinformatics Software BACKGROUND: PAR-CLIP is a recently developed Next Generation Sequencing-based method enabling transcriptome-wide identification of interaction sites between RNA and RNA-binding proteins. The PAR-CLIP procedure induces specific base transitions that originate from sites of RNA-protein interactions and can therefore guide the identification of binding sites. However, additional sources of transitions, such as cell type-specific SNPs and sequencing errors, challenge the inference of binding sites, and suitable statistical approaches are crucial to control false discovery rates. In addition, a highly resolved delineation of binding sites followed by an extensive downstream analysis is necessary for a comprehensive characterization of the protein binding preferences and the subsequent design of validation experiments. RESULTS: We present a statistical and computational framework for PAR-CLIP data analysis. We developed a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. Our method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction. We use published PAR-CLIP data to demonstrate the advantages of our approach, which compares favorably with alternative algorithms. Lastly, by integrating RNA-Seq data we compute conservative experimentally-based false discovery rates of our method and demonstrate the high precision of our strategy. CONCLUSIONS: Our method is implemented in the R package wavClusteR 2.0. The package is distributed under the GPL-2 license and is available from BioConductor at http://www.bioconductor.org/packages/devel/bioc/html/wavClusteR.html.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0470-y) contains supplementary material, which is available to authorized users. BioMed Central 2015-02-01 /pmc/articles/PMC4339748/ /pubmed/25638391 http://dx.doi.org/10.1186/s12859-015-0470-y Text en © Comoglio et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data
topic Software