Annotating the biomedical literature for the human variome

This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the...

Bibliographic Details
Main authors: Verspoor, Karin; Jimeno Yepes, Antonio; Cavedon, Lawrence; McIntosh, Tara; Herten-Crabb, Asha; Thomas, Zoë; Plazzer, John-Paul
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3676157/
https://www.ncbi.nlm.nih.gov/pubmed/23584833
http://dx.doi.org/10.1093/database/bat019
author Verspoor, Karin
Jimeno Yepes, Antonio
Cavedon, Lawrence
McIntosh, Tara
Herten-Crabb, Asha
Thomas, Zoë
Plazzer, John-Paul
collection PubMed
description This article introduces the Variome Annotation Schema, a schema that aims to capture the core concepts and relations relevant to cataloguing and interpreting human genetic variation and its relationship to disease, as described in the published literature. The schema was inspired by the needs of the database curators of the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database, but is intended to have application to genetic variation information in a range of diseases. The schema has been applied to a small corpus of full text journal publications on the subject of inherited colorectal cancer. We show that the inter-annotator agreement on annotation of this corpus ranges from 0.78 to 0.95 F-score across different entity types when exact matching is measured, and improves to a minimum F-score of 0.87 when boundary matching is relaxed. Relations show more variability in agreement, but several are reliable, with the highest, cohort-has-size, reaching 0.90 F-score. We also explore the relevance of the schema to the InSiGHT database curation process. The schema and the corpus represent an important new resource for the development of text mining solutions that address relationships among patient cohorts, disease and genetic variation, and therefore, we also discuss the role text mining might play in the curation of information related to the human variome. The corpus is available at http://opennicta.com/home/health/variome.
format Online
Article
Text
id pubmed-3676157
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3676157 2013-06-07
Annotating the biomedical literature for the human variome
Verspoor, Karin; Jimeno Yepes, Antonio; Cavedon, Lawrence; McIntosh, Tara; Herten-Crabb, Asha; Thomas, Zoë; Plazzer, John-Paul
Database (Oxford) Original Article
Oxford University Press 2013-04-12
/pmc/articles/PMC3676157/ /pubmed/23584833 http://dx.doi.org/10.1093/database/bat019
Text en
© The Author(s) 2013. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Annotating the biomedical literature for the human variome
topic Original Article