Automated generation of gene summaries at the Alliance of Genome Resources

Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed...

Bibliographic Details
Main authors: Kishore, Ranjana; Arnaboldi, Valerio; Van Slyke, Ceri E; Chan, Juancarlos; Nash, Robert S; Urbano, Jose M; Dolan, Mary E; Engel, Stacia R; Shimoyama, Mary; Sternberg, Paul W; Genome Resources, the Alliance of
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304461/
https://www.ncbi.nlm.nih.gov/pubmed/32559296
http://dx.doi.org/10.1093/database/baaa037
author Kishore, Ranjana
Arnaboldi, Valerio
Van Slyke, Ceri E
Chan, Juancarlos
Nash, Robert S
Urbano, Jose M
Dolan, Mary E
Engel, Stacia R
Shimoyama, Mary
Sternberg, Paul W
Genome Resources, the Alliance of
author_sort Kishore, Ranjana
collection PubMed
description Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.
format Online
Article
Text
id pubmed-7304461
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7304461 2020-06-24 Database (Oxford) Original Article Oxford University Press 2020-06-19 /pmc/articles/PMC7304461/ /pubmed/32559296 http://dx.doi.org/10.1093/database/baaa037 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automated generation of gene summaries at the Alliance of Genome Resources
topic Original Article
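
The abstract in this record describes two steps: grouping a gene's ontology terms under common ancestors, with information content as the criterion for inclusion, and then filling a sentence template with the resulting terms. The Python sketch below illustrates that idea only. It is not the authors' implementation: the toy ontology, the annotation counts, the greedy set-cover selection, the template wording and all function names are assumptions made for this example.

```python
import math

# Hypothetical toy ontology (child -> parents); not real Gene Ontology data.
PARENTS = {
    "biological process": [],
    "transport": ["biological process"],
    "ion transport": ["transport"],
    "calcium ion transport": ["ion transport"],
    "potassium ion transport": ["ion transport"],
    "signaling": ["biological process"],
    "Wnt signaling pathway": ["signaling"],
}

# Hypothetical counts of genes annotated to each term or to any of its descendants.
COUNTS = {
    "biological process": 1000,
    "transport": 200,
    "ion transport": 80,
    "calcium ion transport": 20,
    "potassium ion transport": 15,
    "signaling": 300,
    "Wnt signaling pathway": 40,
}
TOTAL = COUNTS["biological process"]

# Hypothetical sentence template for one data category.
TEMPLATE = "{gene} is involved in {terms}."


def information_content(term):
    """IC = -log(p): rarer (more specific) terms score higher; the root scores 0."""
    return -math.log(COUNTS[term] / TOTAL)


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def group_terms(terms, min_ic=1.0, max_groups=3):
    """Greedily pick informative ancestors that cover as many input terms as possible."""
    covers = {}  # candidate term -> set of input terms it subsumes
    for term in terms:
        for candidate in ancestors(term):
            if information_content(candidate) >= min_ic:
                covers.setdefault(candidate, set()).add(term)
    uncovered = set(terms)
    chosen = []
    while uncovered and covers and len(chosen) < max_groups:
        # Prefer the widest coverage; break ties in favour of the more specific term.
        best = max(
            covers,
            key=lambda c: (len(covers[c] & uncovered), information_content(c)),
        )
        if not covers[best] & uncovered:
            break
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


def join_terms(items):
    """Join a list as 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def summarize(gene, terms, **options):
    """Fill the sentence template with the grouped terms."""
    return TEMPLATE.format(gene=gene, terms=join_terms(group_terms(terms, **options)))


annotations = ["calcium ion transport", "potassium ion transport", "Wnt signaling pathway"]
print(summarize("geneX", annotations))
```

In this sketch `max_groups` caps the length of the sentence and `min_ic` keeps uninformative ancestors, such as the root term, out of the summary. Both thresholds are illustrative values, not parameters taken from the paper.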