Automated generation of gene summaries at the Alliance of Genome Resources
Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed...
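Main Authors: | Kishore, Ranjana; Arnaboldi, Valerio; Van Slyke, Ceri E; Chan, Juancarlos; Nash, Robert S; Urbano, Jose M; Dolan, Mary E; Engel, Stacia R; Shimoyama, Mary; Sternberg, Paul W; the Alliance of Genome Resources |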
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Original Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304461/ https://www.ncbi.nlm.nih.gov/pubmed/32559296 http://dx.doi.org/10.1093/database/baaa037 |
_version_ | 1783548268910739456 |
---|---|
author | Kishore, Ranjana Arnaboldi, Valerio Van Slyke, Ceri E Chan, Juancarlos Nash, Robert S Urbano, Jose M Dolan, Mary E Engel, Stacia R Shimoyama, Mary Sternberg, Paul W Genome Resources, the Alliance of |
author_sort | Kishore, Ranjana |
collection | PubMed |
description | Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages. |
format | Online Article Text |
id | pubmed-7304461 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7304461 2020-06-24 Automated generation of gene summaries at the Alliance of Genome Resources Kishore, Ranjana Arnaboldi, Valerio Van Slyke, Ceri E Chan, Juancarlos Nash, Robert S Urbano, Jose M Dolan, Mary E Engel, Stacia R Shimoyama, Mary Sternberg, Paul W Genome Resources, the Alliance of Database (Oxford) Original Article Oxford University Press 2020-06-19 /pmc/articles/PMC7304461/ /pubmed/32559296 http://dx.doi.org/10.1093/database/baaa037 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Automated generation of gene summaries at the Alliance of Genome Resources |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304461/ https://www.ncbi.nlm.nih.gov/pubmed/32559296 http://dx.doi.org/10.1093/database/baaa037 |
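
The description field in this record mentions two steps: grouping annotated ontology terms under common ancestors, with information content as an inclusion criterion, and filling a sentence template. The following is a minimal sketch of that idea, not the authors' implementation. The toy ontology in `PARENTS`, the descendant-count formula for information content, the greedy selection rule, the template wording and the gene name `abc-1` are all assumptions made for illustration.

```python
import math

# Hypothetical toy ontology (an assumption): each term maps to its direct parents.
PARENTS = {
    "root": [],
    "signaling": ["root"],
    "metabolism": ["root"],
    "Wnt signaling": ["signaling"],
    "Notch signaling": ["signaling"],
    "lipid metabolism": ["metabolism"],
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Assumed structure-based IC: fewer descendants means a more specific term."""
    n_desc = sum(1 for t in PARENTS if term in ancestors(t))  # counts the term itself
    return -math.log(n_desc / len(PARENTS))


def group_terms(terms, max_terms):
    """Greedily pick terms or ancestors that cover the annotations in few slots."""
    uncovered = set(terms)
    candidates = sorted(set().union(*(ancestors(t) for t in terms)))
    # Drop uninformative candidates such as the root (IC of zero).
    candidates = [c for c in candidates if information_content(c) > 0]
    chosen = []
    while uncovered and candidates and len(chosen) < max_terms:
        def score(candidate):
            covered = sum(1 for t in uncovered if candidate in ancestors(t))
            # Prefer wider coverage; break ties with higher information content.
            return (covered, information_content(candidate))

        best = max(candidates, key=score)
        if score(best)[0] == 0:
            break
        chosen.append(best)
        uncovered = {t for t in uncovered if best not in ancestors(t)}
    return chosen


def summarize(gene, terms, max_terms=2):
    """Fill an assumed sentence template with the grouped terms."""
    grouped = group_terms(terms, max_terms)
    if len(grouped) > 1:
        listing = ", ".join(grouped[:-1]) + " and " + grouped[-1]
    else:
        listing = "".join(grouped)
    return f"{gene} is involved in {listing}."


annotations = ["Wnt signaling", "Notch signaling", "lipid metabolism"]
print(summarize("abc-1", annotations))
```

With a limit of two terms, the two signaling annotations collapse into their shared parent while the single metabolism annotation keeps its specific term, which is the readability trade-off the description refers to.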