Cargando…

Fair evaluation of global network aligners

BACKGROUND: Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two...

Descripción completa

Detalles Bibliográficos
Autores principales: Crawford, Joseph, Sun, Yihan, Milenković, Tijana
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460690/
https://www.ncbi.nlm.nih.gov/pubmed/26060505
http://dx.doi.org/10.1186/s13015-015-0050-8
_version_ 1782375415790174208
author Crawford, Joseph
Sun, Yihan
Milenković, Tijana
author_facet Crawford, Joseph
Sun, Yihan
Milenković, Tijana
author_sort Crawford, Joseph
collection PubMed
description BACKGROUND: Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods’ NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more-less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better. RESULTS: Our findings are as follows. MI-GRAAL’s NCF is superior to GHOST’s NCF, while the performance of the methods’ ASs is data-dependent. Thus, for data on which GHOST’s AS is superior to MI-GRAAL’s AS, the combination of MI-GRAAL’s NCF and GHOST’s AS represents a new superior method. Also, which amount of sequence information is used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity. CONCLUSION: Taken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-015-0050-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4460690
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-44606902015-06-10 Fair evaluation of global network aligners Crawford, Joseph Sun, Yihan Milenković, Tijana Algorithms Mol Biol Research BACKGROUND: Analogous to genomic sequence alignment, biological network alignment identifies conserved regions between networks of different species. Then, function can be transferred from well- to poorly-annotated species between aligned network regions. Network alignment typically encompasses two algorithmic components: node cost function (NCF), which measures similarities between nodes in different networks, and alignment strategy (AS), which uses these similarities to rapidly identify high-scoring alignments. Different methods use both different NCFs and different ASs. Thus, it is unclear whether the superiority of a method comes from its NCF, its AS, or both. We already showed on state-of-the-art methods, MI-GRAAL and IsoRankN, that combining NCF of one method and AS of another method can give a new superior method. Here, we evaluate MI-GRAAL against a newer approach, GHOST, by mixing-and-matching the methods’ NCFs and ASs to potentially further improve alignment quality. While doing so, we approach important questions that have not been asked systematically thus far. First, we ask how much of the NCF information should come from protein sequence data compared to network topology data. Existing methods determine this parameter more-less arbitrarily, which could affect alignment quality. Second, when topological information is used in NCF, we ask how large the size of the neighborhoods of the compared nodes should be. Existing methods assume that the larger the neighborhood size, the better. RESULTS: Our findings are as follows. MI-GRAAL’s NCF is superior to GHOST’s NCF, while the performance of the methods’ ASs is data-dependent. Thus, for data on which GHOST’s AS is superior to MI-GRAAL’s AS, the combination of MI-GRAAL’s NCF and GHOST’s AS represents a new superior method. Also, which amount of sequence information is used within NCF does not affect alignment quality, while the inclusion of topological information is crucial for producing good alignments. Finally, larger neighborhood sizes are preferred, but often, it is the second largest size that is superior. Using this size instead of the largest one would decrease computational complexity. CONCLUSION: Taken together, our results represent general recommendations for a fair evaluation of network alignment methods and in particular of two-stage NCF-AS approaches. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-015-0050-8) contains supplementary material, which is available to authorized users. BioMed Central 2015-06-09 /pmc/articles/PMC4460690/ /pubmed/26060505 http://dx.doi.org/10.1186/s13015-015-0050-8 Text en © Crawford et al. 2015 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Crawford, Joseph
Sun, Yihan
Milenković, Tijana
Fair evaluation of global network aligners
title Fair evaluation of global network aligners
title_full Fair evaluation of global network aligners
title_fullStr Fair evaluation of global network aligners
title_full_unstemmed Fair evaluation of global network aligners
title_short Fair evaluation of global network aligners
title_sort fair evaluation of global network aligners
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460690/
https://www.ncbi.nlm.nih.gov/pubmed/26060505
http://dx.doi.org/10.1186/s13015-015-0050-8
work_keys_str_mv AT crawfordjoseph fairevaluationofglobalnetworkaligners
AT sunyihan fairevaluationofglobalnetworkaligners
AT milenkovictijana fairevaluationofglobalnetworkaligners