
GeneRIF indexing: sentence selection based on machine learning



Bibliographic Details
Main authors:	Jimeno-Yepes, Antonio J, Sticco, J Caitlin, Mork, James G, Aronson, Alan R
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3687823/ https://www.ncbi.nlm.nih.gov/pubmed/23725347 http://dx.doi.org/10.1186/1471-2105-14-171

collection	PubMed
description	BACKGROUND: A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function. RESULTS: We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. CONCLUSIONS: The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.
id	pubmed-3687823
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3687823 2013-06-21 GeneRIF indexing: sentence selection based on machine learning Jimeno-Yepes, Antonio J Sticco, J Caitlin Mork, James G Aronson, Alan R BMC Bioinformatics Research Article BioMed Central 2013-05-31 /pmc/articles/PMC3687823/ /pubmed/23725347 http://dx.doi.org/10.1186/1471-2105-14-171 Text en Copyright © 2013 Jimeno-Yepes et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
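The abstract describes choosing sentences for GeneRIF indexing with Naïve Bayes over features such as the sentence's location, its discourse label and the functional terminology it contains. Below is a minimal Python sketch of that general idea, not the authors' implementation. The feature encodings, the FUNCTION_TERMS list, the helper names (sentence_features, CategoricalNaiveBayes), the add-one smoothing and the toy training sentences are all hypothetical. The discourse label is passed in directly rather than predicted by a previously trained model as in the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical functional-terminology list; not the list used in the paper.
FUNCTION_TERMS = {"regulates", "induces", "inhibits", "activates", "expression", "binding"}


def sentence_features(index, n_sentences, discourse_label, text):
    """Map one abstract sentence to discrete features (illustrative encoding)."""
    if index == 0:
        position = "first"
    elif index == n_sentences - 1:
        position = "last"
    else:
        position = "middle"
    # Lowercase tokens with surrounding punctuation stripped.
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return {
        "position": position,
        "discourse": discourse_label,  # assumed given, e.g. from a separate tagger
        "function_term": bool(words & FUNCTION_TERMS),
    }


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.total = len(y)
        # counts[label][feature][value] -> number of training sentences
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # observed values for each feature
        for feats, label in zip(X, y):
            for name, value in feats.items():
                self.counts[label][name][value] += 1
                self.values[name].add(value)
        return self

    def log_posterior(self, feats, label):
        # log P(label) + sum of log P(feature value | label), up to a constant
        score = math.log(self.class_counts[label] / self.total)
        for name, value in feats.items():
            n_values = len(self.values[name] | {value})
            count = self.counts[label][name][value]
            score += math.log((count + 1) / (self.class_counts[label] + n_values))
        return score

    def predict(self, feats):
        return max(self.class_counts, key=lambda label: self.log_posterior(feats, label))


# Toy training data: True means "select this sentence for a GeneRIF".
train = [
    (sentence_features(0, 5, "BACKGROUND", "Gene X is widely studied."), False),
    (sentence_features(2, 5, "METHODS", "We sequenced 40 samples."), False),
    (sentence_features(4, 5, "CONCLUSIONS", "Gene X regulates apoptosis in liver cells."), True),
    (sentence_features(3, 5, "RESULTS", "Knockdown of Y inhibits tumor growth."), True),
    (sentence_features(1, 4, "BACKGROUND", "Little is known about Z."), False),
    (sentence_features(3, 4, "CONCLUSIONS", "Z induces expression of IL-6."), True),
]
X, y = zip(*train)
model = CategoricalNaiveBayes().fit(X, y)

candidate = sentence_features(5, 6, "CONCLUSIONS", "P53 activates transcription of gene W.")
print(model.predict(candidate))  # prints True
```

Working in log space avoids underflow when many features are multiplied, and the smoothing keeps a feature value never seen with a class from zeroing out that class. A real system would instead train on annotated MEDLINE sentences and compare its selections against human annotators.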

Similar items