Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier

Abstract. Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called “modifiers”. With efforts underway to convert narrative character descriptions to com...

Bibliographic Details
Main Authors: Endara, Lorena, Thessen, Anne E, Cole, Heather A, Walls, Ramona, Gkoutos, Georgios, Cao, Yujie, Chong, Steven S., Cui, Hong
Format: Online Article Text
Language: English
Published: Pensoft Publishers 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6281706/
https://www.ncbi.nlm.nih.gov/pubmed/30532623
http://dx.doi.org/10.3897/BDJ.6.e29232
_version_ 1783378868356251648
author Endara, Lorena
Thessen, Anne E
Cole, Heather A
Walls, Ramona
Gkoutos, Georgios
Cao, Yujie
Chong, Steven S.
Cui, Hong
author_facet Endara, Lorena
Thessen, Anne E
Cole, Heather A
Walls, Ramona
Gkoutos, Georgios
Cao, Yujie
Chong, Steven S.
Cui, Hong
author_sort Endara, Lorena
collection PubMed
description Abstract. Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called “modifiers”. With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that have already been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage (e.g., sparsely, entirely) modifiers are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier (for example, usually is more frequent than rarely) so that data annotated with ontology terms can be classified accordingly. Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., the five-bin design) and an open ordered list. The five-bin design puts the modifier terms into a set of 5 fixed bins linked by interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms of existing classes. The open list design starts with 5 bins but supports extension of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as new classes anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After manual screening, 130 modifier words were selected as candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies. Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using “broader synonym” or “not recommended” annotation properties. These annotations make the user aware of the semantic ambiguity associated with the terms and of whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreement, suggesting that dividing the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. The two ontologies, developed using the Protégé ontology editor, are available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies. Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate the alignment of taxon concepts. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.
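The difference between the two list designs described in the Method section can be illustrated with a minimal Python sketch. It is not the authors' OWL implementation: the class names, method names, and the five frequency labels used here (apart from "rarely" and "usually", which appear in the abstract) are hypothetical, chosen only to show how a closed five-bin list differs from an open ordered list.

```python
from dataclasses import dataclass, field

# Hypothetical five-bin frequency scale (assumption: only "rarely" and
# "usually" come from the abstract; the other labels are illustrative).
FIVE_BINS = ["never", "rarely", "sometimes", "usually", "always"]


@dataclass
class ClosedOrderedList:
    """Five-bin design: the bins are fixed; new terms join only as synonyms."""
    bins: tuple
    synonyms: dict = field(default_factory=dict)  # synonym -> bin label

    def add_synonym(self, term, bin_label):
        if bin_label not in self.bins:
            raise ValueError(f"unknown bin: {bin_label}")
        self.synonyms[term] = bin_label

    def level(self, term):
        # Resolve a synonym to its bin, then return the bin's position.
        return self.bins.index(self.synonyms.get(term, term))

    def levels_more_than(self, a, b):
        # Interval-style relation: number of bins by which a exceeds b.
        return self.level(a) - self.level(b)


class OpenOrderedList:
    """Open design: new classes may be inserted anywhere; only order matters."""

    def __init__(self, terms):
        self.terms = list(terms)

    def insert_after(self, new_term, existing):
        self.terms.insert(self.terms.index(existing) + 1, new_term)

    def more_than(self, a, b):
        # Ordinal relation: a ranks higher than b, with no fixed distance.
        return self.terms.index(a) > self.terms.index(b)


closed = ClosedOrderedList(tuple(FIVE_BINS))
closed.add_synonym("seldom", "rarely")             # hypothetical synonym
print(closed.levels_more_than("usually", "seldom"))  # prints 2

open_list = OpenOrderedList(FIVE_BINS)
open_list.insert_after("often", "sometimes")       # hypothetical new class
print(open_list.more_than("usually", "often"))     # prints True
```

In the closed list, distances between terms stay meaningful because the number of bins never changes, which mirrors the interval properties such as one_level_more_frequently_than. In the open list, inserting a class shifts positions, so only the ordering (more/less_frequently_than) can be relied on.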
format Online
Article
Text
id pubmed-6281706
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Pensoft Publishers
record_format MEDLINE/PubMed
spelling pubmed-6281706 2018-12-07 Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier Endara, Lorena Thessen, Anne E Cole, Heather A Walls, Ramona Gkoutos, Georgios Cao, Yujie Chong, Steven S. Cui, Hong Biodivers Data J Research Article Pensoft Publishers 2018-11-28 /pmc/articles/PMC6281706/ /pubmed/30532623 http://dx.doi.org/10.3897/BDJ.6.e29232 Text en Lorena Endara, Anne Thessen, Heather Cole, Ramona Walls, Georgios Gkoutos, Yujie Cao, Steven Chong, Hong Cui http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Endara, Lorena
Thessen, Anne E
Cole, Heather A
Walls, Ramona
Gkoutos, Georgios
Cao, Yujie
Chong, Steven S.
Cui, Hong
Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier
title Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier
title_full Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier
title_fullStr Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier
title_full_unstemmed Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier
title_short Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier
title_sort modifier ontologies for frequency, certainty, degree, and coverage phenotype modifier
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6281706/
https://www.ncbi.nlm.nih.gov/pubmed/30532623
http://dx.doi.org/10.3897/BDJ.6.e29232