HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes

BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and t...

Bibliographic Details
Main Authors: Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine; Gehant, Sebastien; Cuche, Beatrice A; Auchincloss, Andrea H; Coudert, Elisabeth; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Xenarios, Ioannis; Redaschi, Nicole; Bridge, Alan
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007698/
https://www.ncbi.nlm.nih.gov/pubmed/32034905
http://dx.doi.org/10.1093/gigascience/giaa003
author Bolleman, Jerven
de Castro, Edouard
Baratin, Delphine
Gehant, Sebastien
Cuche, Beatrice A
Auchincloss, Andrea H
Coudert, Elisabeth
Hulo, Chantal
Masson, Patrick
Pedruzzi, Ivo
Rivoire, Catherine
Xenarios, Ioannis
Redaschi, Nicole
Bridge, Alan
collection PubMed
description BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
format Online
Article
Text
id pubmed-7007698
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7007698 2020-02-12 Gigascience Research Oxford University Press 2020-02-08 /pmc/articles/PMC7007698/ /pubmed/32034905 http://dx.doi.org/10.1093/gigascience/giaa003 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes
topic Research