Discovering semantic features in the literature: a foundation for building functional associations

BACKGROUND: Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to anal...

Bibliographic Details
Main Authors:	Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1386711/ https://www.ncbi.nlm.nih.gov/pubmed/16438716 http://dx.doi.org/10.1186/1471-2105-7-41

author	Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto
collection	PubMed
description	BACKGROUND: Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. RESULTS: We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. CONCLUSION: The presented method can create literature profiles from documents relevant to sets of genes. The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data.
format	Text
id	pubmed-1386711
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1386711 2006-04-21 Discovering semantic features in the literature: a foundation for building functional associations Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto BMC Bioinformatics Methodology Article BioMed Central 2006-01-26 /pmc/articles/PMC1386711/ /pubmed/16438716 http://dx.doi.org/10.1186/1471-2105-7-41 Text en Copyright © 2006 Chagoyen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Discovering semantic features in the literature: a foundation for building functional associations
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1386711/ https://www.ncbi.nlm.nih.gov/pubmed/16438716 http://dx.doi.org/10.1186/1471-2105-7-41
