Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology

Bibliographic Details
Main authors: Wittwer, Lucas D., Piližota, Ivana, Altenhoff, Adrian M., Dessimoz, Christophe
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2014
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4193403/
https://www.ncbi.nlm.nih.gov/pubmed/25320677
http://dx.doi.org/10.7717/peerj.607
author Wittwer, Lucas D.
Piližota, Ivana
Altenhoff, Adrian M.
Dessimoz, Christophe
collection PubMed
description Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as “all-against-all”. As this process scales quadratically in terms of the number of sequences analysed, this step can become a bottleneck, thus limiting the number of genomes that can be simultaneously analysed. Here, we explored ways of speeding up the all-against-all step while maintaining its sensitivity. By exploiting the transitivity of homology and, crucially, ensuring that homology is defined in terms of consistent protein subsequences, our proof-of-concept resulted in a 4× speedup while recovering >99.6% of all homologs identified by the full all-against-all procedure on empirical sequence sets. In comparison, state-of-the-art k-mer approaches are orders of magnitude faster but only recover 3–14% of all homologous pairs. We also outline ideas to further improve the speed and recall of the new approach. An open source implementation is provided as part of the OMA standalone software at http://omabrowser.org/standalone.
format Online
Article
Text
id pubmed-4193403
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4193403
2014-10-15
PeerJ
Bioinformatics
PeerJ Inc.
2014-10-07
/pmc/articles/PMC4193403/
/pubmed/25320677
http://dx.doi.org/10.7717/peerj.607
Text
en
© 2014 Wittwer et al.
http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology
topic Bioinformatics