
Improving the Sequence Ontology terminology for genomic variant annotation

BACKGROUND: The Genome Variant Format (GVF) uses the Sequence Ontology (SO) to enable detailed annotation of sequence variation. The annotation includes SO terms for the type of sequence alteration, the genomic features that are changed and the effect of the alteration. The SO maintains and updates...


Bibliographic Details
Main Authors: Cunningham, Fiona; Moore, Barry; Ruiz-Schultz, Nicole; Ritchie, Graham RS; Eilbeck, Karen
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520272/
https://www.ncbi.nlm.nih.gov/pubmed/26229585
http://dx.doi.org/10.1186/s13326-015-0030-4
author Cunningham, Fiona
Moore, Barry
Ruiz-Schultz, Nicole
Ritchie, Graham RS
Eilbeck, Karen
collection PubMed
description BACKGROUND: The Genome Variant Format (GVF) uses the Sequence Ontology (SO) to enable detailed annotation of sequence variation. The annotation includes SO terms for the type of sequence alteration, the genomic features that are changed and the effect of the alteration. The SO maintains and updates the specification and provides the underlying ontological structure. METHODS: A requirements analysis was undertaken to gather terms missing in the SO release at the time, but needed to adequately describe the effects of sequence alteration on a set of variant genomic annotations. We have extended and remodeled the SO to include and define all terms that describe the effect of variation upon reference genomic features in the Ensembl variation databases. RESULTS: The new terminology was used to annotate the human reference genome with a set of variants from both COSMIC and dbSNP. A GVF file containing 170,853 sequence alterations was generated using the SO terminology to annotate the kinds of alteration, the effect of the alteration and the reference feature changed. There are four kinds of alteration and 24 kinds of effect seen in this dataset. (Ensembl Variation annotates 34 different SO consequence terms: http://www.ensembl.org/info/docs/variation/predicted_data.html). CONCLUSIONS: We explain the updates to the Sequence Ontology to describe the effect of variation on existing reference features. We have provided a set of annotations using this terminology and the well-defined GVF specification. We have also provided a provisional exploration of this large annotation dataset.
format Online
Article
Text
id pubmed-4520272
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4520272 2015-07-31 Improving the Sequence Ontology terminology for genomic variant annotation Cunningham, Fiona Moore, Barry Ruiz-Schultz, Nicole Ritchie, Graham RS Eilbeck, Karen J Biomed Semantics Short Report BACKGROUND: The Genome Variant Format (GVF) uses the Sequence Ontology (SO) to enable detailed annotation of sequence variation. The annotation includes SO terms for the type of sequence alteration, the genomic features that are changed and the effect of the alteration. The SO maintains and updates the specification and provides the underlying ontological structure. METHODS: A requirements analysis was undertaken to gather terms missing in the SO release at the time, but needed to adequately describe the effects of sequence alteration on a set of variant genomic annotations. We have extended and remodeled the SO to include and define all terms that describe the effect of variation upon reference genomic features in the Ensembl variation databases. RESULTS: The new terminology was used to annotate the human reference genome with a set of variants from both COSMIC and dbSNP. A GVF file containing 170,853 sequence alterations was generated using the SO terminology to annotate the kinds of alteration, the effect of the alteration and the reference feature changed. There are four kinds of alteration and 24 kinds of effect seen in this dataset. (Ensembl Variation annotates 34 different SO consequence terms: http://www.ensembl.org/info/docs/variation/predicted_data.html). CONCLUSIONS: We explain the updates to the Sequence Ontology to describe the effect of variation on existing reference features. We have provided a set of annotations using this terminology and the well-defined GVF specification. We have also provided a provisional exploration of this large annotation dataset. BioMed Central 2015-07-31 /pmc/articles/PMC4520272/ /pubmed/26229585 http://dx.doi.org/10.1186/s13326-015-0030-4 Text en © Cunningham et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Improving the Sequence Ontology terminology for genomic variant annotation
topic Short Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4520272/
https://www.ncbi.nlm.nih.gov/pubmed/26229585
http://dx.doi.org/10.1186/s13326-015-0030-4