HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes
BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and t...
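Main authors: | Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine; Gehant, Sebastien; Cuche, Beatrice A; Auchincloss, Andrea H; Coudert, Elisabeth; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Xenarios, Ioannis; Redaschi, Nicole; Bridge, Alan |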
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007698/ https://www.ncbi.nlm.nih.gov/pubmed/32034905 http://dx.doi.org/10.1093/gigascience/giaa003 |
_version_ | 1783495359394217984 |
---|---|
author | Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine; Gehant, Sebastien; Cuche, Beatrice A; Auchincloss, Andrea H; Coudert, Elisabeth; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Xenarios, Ioannis; Redaschi, Nicole; Bridge, Alan |
author_sort | Bolleman, Jerven |
collection | PubMed |
description | BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org. |
format | Online Article Text |
id | pubmed-7007698 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7007698 2020-02-12 HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes Bolleman, Jerven de Castro, Edouard Baratin, Delphine Gehant, Sebastien Cuche, Beatrice A Auchincloss, Andrea H Coudert, Elisabeth Hulo, Chantal Masson, Patrick Pedruzzi, Ivo Rivoire, Catherine Xenarios, Ioannis Redaschi, Nicole Bridge, Alan Gigascience Research BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org. Oxford University Press 2020-02-08 /pmc/articles/PMC7007698/ /pubmed/32034905 http://dx.doi.org/10.1093/gigascience/giaa003 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007698/ https://www.ncbi.nlm.nih.gov/pubmed/32034905 http://dx.doi.org/10.1093/gigascience/giaa003 |