
APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins

RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in...

Bibliographic Details
Main authors: Sharan, Malvika; Förstner, Konrad U.; Eulalio, Ana; Vogel, Jörg
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499795/
https://www.ncbi.nlm.nih.gov/pubmed/28334975
http://dx.doi.org/10.1093/nar/gkx137
author Sharan, Malvika
Förstner, Konrad U.
Eulalio, Ana
Vogel, Jörg
collection PubMed
description RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the large-scale identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in eukaryotic species such as human and yeast. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer due to the technical challenges associated with the existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT showed better performance on various datasets compared to other existing tools for the sequence-based prediction of RBPs, achieving an average sensitivity and specificity of 0.90 and 0.91, respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot.
format Online
Article
Text
id pubmed-5499795
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5499795 2017-07-12 Nucleic Acids Res Methods Online Oxford University Press 2017-06-20 2017-03-02 /pmc/articles/PMC5499795/ /pubmed/28334975 http://dx.doi.org/10.1093/nar/gkx137 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499795/
https://www.ncbi.nlm.nih.gov/pubmed/28334975
http://dx.doi.org/10.1093/nar/gkx137