Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction
BACKGROUND: Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence their application to model organisms is often restricted to well characterize...
Main Authors: | Mesiti, Marco; Re, Matteo; Valentini, Giorgio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4006453/ https://www.ncbi.nlm.nih.gov/pubmed/24843788 http://dx.doi.org/10.1186/2047-217X-3-5 |
_version_ | 1782314218143350784 |
---|---|
author | Mesiti, Marco Re, Matteo Valentini, Giorgio |
author_facet | Mesiti, Marco Re, Matteo Valentini, Giorgio |
author_sort | Mesiti, Marco |
collection | PubMed |
description | BACKGROUND: Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence, their application to model organisms is often restricted to well-characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem is to construct large networks spanning multiple species, but this in turn poses challenging computational problems, owing to the scalability limitations of existing algorithms and the main memory requirements that such networks induce. Distributed computation or the use of large computers could in principle address these issues, but they raise further algorithmic problems and require resources that simple off-the-shelf computers cannot provide. RESULTS: We propose a novel framework for scalable network-based learning of multi-species protein functions based on both a local implementation of existing algorithms and the adoption of innovative technologies: we solve the AFP problem “locally” by designing “vertex-centric” implementations of network-based algorithms, but we do not give up thinking “globally”, exploiting the overall topology of the network. This is made possible by the adoption of secondary memory-based technologies that allow the efficient use of the large memory available on disks, thus overcoming the main memory limitations of modern off-the-shelf computers. This approach has been applied to the analysis of a large multi-species network including more than 300 species of bacteria and to a network with more than 200,000 proteins belonging to 13 eukaryotic species. To our knowledge this is the first work where secondary memory-based network analysis has been applied to multi-species function prediction using biological networks with hundreds of thousands of proteins. CONCLUSIONS: The combination of these algorithmic and technological approaches makes the analysis of large multi-species networks feasible on ordinary computers with limited speed and primary memory, and in the longer term could enable the analysis of huge networks (e.g. the whole proteomes available in SwissProt) using well-equipped stand-alone machines. |
format | Online Article Text |
id | pubmed-4006453 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-40064532014-05-19 Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction Mesiti, Marco Re, Matteo Valentini, Giorgio Gigascience Research BACKGROUND: Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence, their application to model organisms is often restricted to well-characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem is to construct large networks spanning multiple species, but this in turn poses challenging computational problems, owing to the scalability limitations of existing algorithms and the main memory requirements that such networks induce. Distributed computation or the use of large computers could in principle address these issues, but they raise further algorithmic problems and require resources that simple off-the-shelf computers cannot provide. RESULTS: We propose a novel framework for scalable network-based learning of multi-species protein functions based on both a local implementation of existing algorithms and the adoption of innovative technologies: we solve the AFP problem “locally” by designing “vertex-centric” implementations of network-based algorithms, but we do not give up thinking “globally”, exploiting the overall topology of the network. This is made possible by the adoption of secondary memory-based technologies that allow the efficient use of the large memory available on disks, thus overcoming the main memory limitations of modern off-the-shelf computers. This approach has been applied to the analysis of a large multi-species network including more than 300 species of bacteria and to a network with more than 200,000 proteins belonging to 13 eukaryotic species. To our knowledge this is the first work where secondary memory-based network analysis has been applied to multi-species function prediction using biological networks with hundreds of thousands of proteins. CONCLUSIONS: The combination of these algorithmic and technological approaches makes the analysis of large multi-species networks feasible on ordinary computers with limited speed and primary memory, and in the longer term could enable the analysis of huge networks (e.g. the whole proteomes available in SwissProt) using well-equipped stand-alone machines. BioMed Central 2014-04-23 /pmc/articles/PMC4006453/ /pubmed/24843788 http://dx.doi.org/10.1186/2047-217X-3-5 Text en Copyright © 2014 Mesiti et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Mesiti, Marco Re, Matteo Valentini, Giorgio Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction |
title | Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4006453/ https://www.ncbi.nlm.nih.gov/pubmed/24843788 http://dx.doi.org/10.1186/2047-217X-3-5 |