Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions

Bibliographic Details
Main authors:	Yip, Kevin Y.; Gerstein, Mark
Format:	Text
Language:	English
Published:	Oxford University Press 2009
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2639005/ https://www.ncbi.nlm.nih.gov/pubmed/19015141 http://dx.doi.org/10.1093/bioinformatics/btn602

author	Yip, Kevin Y.; Gerstein, Mark
collection	PubMed
description	Motivation: An important problem in systems biology is reconstructing complete networks of interactions between biological objects by extrapolating from a few known interactions as examples. While there are many computational techniques proposed for this network reconstruction task, their accuracy is consistently limited by the small number of high-confidence examples, and the uneven distribution of these examples across the potential interaction space, with some objects having many known interactions and others few. Results: To address this issue, we propose two computational methods based on the concept of training set expansion. They work particularly effectively in conjunction with kernel approaches, which are a popular class of approaches for fusing together many disparate types of features. Both our methods are based on semi-supervised learning and involve augmenting the limited number of gold-standard training instances with carefully chosen and highly confident auxiliary examples. The first method, prediction propagation, propagates highly confident predictions of one local model to another as the auxiliary examples, thus learning from information-rich regions of the training network to help predict the information-poor regions. The second method, kernel initialization, takes the most similar and most dissimilar objects of each object in a global kernel as the auxiliary examples. Using several sets of experimentally verified protein–protein interactions from yeast, we show that training set expansion gives a measurable performance gain over a number of representative, state-of-the-art network reconstruction methods, and it can correctly identify some interactions that are ranked low by other methods due to the lack of training examples of the involved proteins. Contact: mark.gerstein@yale.edu Availability: The datasets and additional materials can be found at http://networks.gersteinlab.org/tse.
format	Text
id	pubmed-2639005
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Oxford University Press
record_format	MEDLINE/PubMed
journal	Bioinformatics
dates	2009-01-15 (issue date); 2008-11-17 (published online)
rights	© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2639005/ https://www.ncbi.nlm.nih.gov/pubmed/19015141 http://dx.doi.org/10.1093/bioinformatics/btn602
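
The abstract describes the two expansion methods only in outline. The following minimal pure-Python sketch makes the idea concrete; it is not the authors' implementation. Everything beyond the abstract is an assumption made for illustration: the data layout (a global kernel as a list of lists, one training set per object mapping a partner index to +1 or -1), the hypothetical function names `kernel_initialization` and `prediction_propagation`, the choice to label the `k` most similar objects positive and the `k` most dissimilar negative, the confidence `threshold`, and the rule that only a local model with more gold-standard examples propagates to one with fewer. Local-model scores are passed in as plain numbers; training the local models themselves is left out.

```python
# Illustrative sketch only, not the authors' code. The data layout, labels,
# k, threshold and the "richer model propagates to poorer model" rule are
# assumptions made for this example.
from typing import Dict, List

# Hypothetical layout: partner object index -> +1 (interacts) / -1 (does not).
TrainingSet = Dict[int, int]


def kernel_initialization(kernel: List[List[float]], obj: int, k: int) -> TrainingSet:
    """Auxiliary examples for the local model of `obj`, taken from a global kernel.

    Assumption: the k most similar objects become auxiliary positives and the
    k most dissimilar objects become auxiliary negatives.
    """
    others = [j for j in range(len(kernel)) if j != obj]
    if k < 1 or 2 * k > len(others):
        raise ValueError("need 1 <= k <= (number of other objects) / 2")
    # Sort the other objects from most to least similar to `obj`.
    ranked = sorted(others, key=lambda j: kernel[obj][j], reverse=True)
    aux: TrainingSet = {j: 1 for j in ranked[:k]}
    aux.update({j: -1 for j in ranked[-k:]})
    return aux


def prediction_propagation(
    scores: Dict[int, Dict[int, float]],
    training: Dict[int, TrainingSet],
    threshold: float,
) -> Dict[int, TrainingSet]:
    """Turn confident predictions of one local model into auxiliary examples of another.

    scores[a][b] is a (hypothetical) decision value from the local model of
    object a for the pair (a, b). Interactions are symmetric, so a confident
    call on (a, b) can serve as a training example (partner a) for object b.
    Assumption: only a model with more gold-standard examples than the target
    propagates, and labels already present are never overwritten.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Copy the gold-standard sets so the input is left unchanged.
    expanded = {obj: dict(ts) for obj, ts in training.items()}
    for a, row in scores.items():
        for b, s in row.items():
            if abs(s) < threshold:
                continue  # not confident enough to propagate
            if len(training.get(a, {})) <= len(training.get(b, {})):
                continue  # only information-rich -> information-poor
            expanded.setdefault(b, {}).setdefault(a, 1 if s > 0 else -1)
    return expanded
```

In this toy version, an object with few known interactions gains auxiliary examples either from the kernel or from a better-covered object's confident predictions, while any gold-standard label already present is kept, in line with the abstract's description of augmenting, rather than replacing, the gold-standard instances.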
