
Domain similarity based orthology detection

Bibliographic Details
Main authors: Bitard-Feildel, Tristan, Kemena, Carsten, Greenwood, Jenny M, Bornberg-Bauer, Erich
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4443542/
https://www.ncbi.nlm.nih.gov/pubmed/25968113
http://dx.doi.org/10.1186/s12859-015-0570-8
_version_ 1782373005392871424
author Bitard-Feildel, Tristan
Kemena, Carsten
Greenwood, Jenny M
Bornberg-Bauer, Erich
author_sort Bitard-Feildel, Tristan
collection PubMed
description BACKGROUND: Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assess whether two proteins are orthologous. Accordingly, as the number of sequences increases, the number of comparisons to compute grows quadratically. A current challenge of bioinformatic research, especially given the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins.
RESULTS: We present two new protein similarity measures, a cosine score and a maximal weight matching score, both based on domain content similarity, and new software named porthoDom. The quality of both similarity measures is evaluated against curated datasets. The measures show that domain content similarity can correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, a wrapper developed for proteinortho. porthoDom uses domain content similarity to group proteins together before searching for orthologs. Because proteins are grouped by domains rather than by amino acid sequences, the search space shrinks, which decreases the computational complexity of an all-against-all sequence comparison.
CONCLUSION: We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of the search space. porthoDom speeds up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is written in Python and C++ and is available under the GNU GPL version 3 licence at http://www.bornberglab.org/pages/porthoda.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0570-8) contains supplementary material, which is available to authorized users.
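The abstract describes a cosine score on domain content and a grouping step that shrinks the all-against-all search space. The sketch below illustrates that idea only and is not the porthoDom implementation. It assumes each protein is given as a list of domain identifiers and that repeated domains are counted. The greedy grouping rule, the 0.5 threshold, the function names, and the toy proteins are all hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(domains_a, domains_b):
    """Cosine similarity between two proteins given as lists of domain IDs.

    Each protein becomes a vector of domain counts (an assumption of this
    sketch; the record does not say how the vectors are built).
    """
    counts_a, counts_b = Counter(domains_a), Counter(domains_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[d] * counts_b[d] for d in shared)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein without annotated domains matches nothing
    return dot / (norm_a * norm_b)


def group_by_domain_content(proteins, threshold=0.5):
    """Hypothetical greedy grouping: a protein joins the first group whose
    representative (its first member) scores at least `threshold`."""
    groups = []
    for name, domains in proteins.items():
        for group in groups:
            if cosine_similarity(domains, proteins[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Toy input; the domain identifiers are illustrative labels only.
proteins = {
    "p1": ["PF00069", "PF07714"],
    "p2": ["PF00069", "PF07714", "PF00069"],
    "p3": ["PF00005"],
    "p4": ["PF00005", "PF00664"],
    "p5": ["PF00001"],
}

groups = group_by_domain_content(proteins)
all_pairs = pair_count(len(proteins))
within_group_pairs = sum(pair_count(len(g)) for g in groups)

print(groups)
print(all_pairs, within_group_pairs)
```

For these five toy proteins, an all-against-all comparison needs 10 sequence pairs, while comparing only within the three resulting groups needs 2. The grouping itself still compares domain vectors, but those comparisons work on short lists of identifiers and not on full amino acid sequences.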
format Online
Article
Text
id pubmed-4443542
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4443542 2015-05-27 Domain similarity based orthology detection Bitard-Feildel, Tristan Kemena, Carsten Greenwood, Jenny M Bornberg-Bauer, Erich BMC Bioinformatics Methodology Article BioMed Central 2015-05-13 /pmc/articles/PMC4443542/ /pubmed/25968113 http://dx.doi.org/10.1186/s12859-015-0570-8 Text en © Bitard-Feildel et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Domain similarity based orthology detection
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4443542/
https://www.ncbi.nlm.nih.gov/pubmed/25968113
http://dx.doi.org/10.1186/s12859-015-0570-8