Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences

Word-based or ‘alignment-free’ sequence comparison has become an active research area in bioinformatics. While previous word-frequency approaches calculated rough measures of sequence similarity or dissimilarity, some new alignment-free methods are able to accurately estimate phylogenetic distances...

Bibliographic Details
Main authors: Leimeister, Chris-Andre; Schellhorn, Jendrik; Dörrer, Svenja; Gerth, Michael; Bleidorn, Christoph; Morgenstern, Burkhard
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6436989/
https://www.ncbi.nlm.nih.gov/pubmed/30535314
http://dx.doi.org/10.1093/gigascience/giy148
_version_ 1783406876215476224
author Leimeister, Chris-Andre
Schellhorn, Jendrik
Dörrer, Svenja
Gerth, Michael
Bleidorn, Christoph
Morgenstern, Burkhard
collection PubMed
description Word-based or ‘alignment-free’ sequence comparison has become an active research area in bioinformatics. While previous word-frequency approaches calculated rough measures of sequence similarity or dissimilarity, some new alignment-free methods are able to accurately estimate phylogenetic distances between genomic sequences. One of these approaches is Filtered Spaced Word Matches. Here, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM. We compare the performance of Prot-SpaM to other alignment-free methods on simulated sequences and on various groups of eukaryotic and prokaryotic taxa. Prot-SpaM can be used to calculate high-quality phylogenetic trees for dozens of whole-proteome sequences in a matter of seconds or minutes and often outperforms other alignment-free approaches. The source code of our software is available through Github: https://github.com/jschellh/ProtSpaM.
format Online
Article
Text
id pubmed-6436989
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6436989 2019-04-01 Gigascience Research Oxford University Press 2018-12-07 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences
topic Research