
SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics

BACKGROUND: Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (para...


Bibliographic Details
Main authors: Roure, Béatrice, Rodriguez-Ezpeleta, Naiara, Philippe, Hervé
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1796611/
https://www.ncbi.nlm.nih.gov/pubmed/17288575
http://dx.doi.org/10.1186/1471-2148-7-S1-S2
_version_ 1782132244200030208
author Roure, Béatrice
Rodriguez-Ezpeleta, Naiara
Philippe, Hervé
author_facet Roure, Béatrice
Rodriguez-Ezpeleta, Naiara
Philippe, Hervé
author_sort Roure, Béatrice
collection PubMed
description BACKGROUND: Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences to a lack of resolution. A tool that selects sequences, species, and genes, while dealing with these issues, is needed in a phylogenomics context. RESULTS: Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras with partial sequences, or selects, among multiple sequences, the orthologous and/or slowest evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for sequence selection, a semi-automatic mode is available to accommodate the user's expertise. CONCLUSION: SCaFoS is able to deal with datasets of hundreds of species and genes, at either the amino acid or the nucleotide level. It has a graphical interface and can be integrated in an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimerical sequences to reduce missing data and selects genes according to their level of missing data.
Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast evolving species.
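Illustrative note (not part of the original record): the description above mentions two steps, building a chimera from partial sequences of one monophyletic group and retaining genes according to an allowed level of missing data. The following is a minimal Python sketch of those two ideas only. It is not SCaFoS code; the function names, the missing-state symbols and the threshold semantics (fraction of groups with no sequence for a gene) are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed symbols for a missing position in an aligned sequence (hypothetical choice).
MISSING = set("-?")


def fuse_partials(first: str, second: str) -> str:
    """Build a chimera from two aligned partial sequences of the same group.

    Positions present in `first` are kept; gaps in `first` are filled from `second`.
    """
    if len(first) != len(second):
        raise ValueError("aligned sequences must have equal length")
    return "".join(a if a not in MISSING else b for a, b in zip(first, second))


def missing_fraction(alignment: Dict[str, Optional[str]]) -> float:
    """Fraction of predefined groups that have no sequence for this gene."""
    absent = sum(1 for seq in alignment.values() if seq is None)
    return absent / len(alignment)


def retain_genes(
    genes: Dict[str, Dict[str, Optional[str]]], max_missing: float
) -> List[str]:
    """Keep genes whose fraction of missing groups does not exceed max_missing."""
    return [
        name
        for name, alignment in genes.items()
        if missing_fraction(alignment) <= max_missing
    ]


if __name__ == "__main__":
    # Toy data: group names and sequences are invented for the example.
    genes = {
        "geneA": {"Metazoa": "MKTLVA", "Fungi": "MKSLVA", "Plantae": None},
        "geneB": {"Metazoa": None, "Fungi": None, "Plantae": "MRTLIA"},
    }
    print(fuse_partials("MKT---", "---LVA"))
    print(retain_genes(genes, max_missing=0.34))
```

In this sketch a gene is judged only by how many groups lack it entirely; a real selection could also weigh the proportion of missing positions within each sequence, which the record does not specify.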
format Text
id pubmed-1796611
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1796611 2007-02-09 SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics Roure, Béatrice Rodriguez-Ezpeleta, Naiara Philippe, Hervé BMC Evol Biol Software BACKGROUND: Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences to a lack of resolution. A tool that selects sequences, species, and genes, while dealing with these issues, is needed in a phylogenomics context. RESULTS: Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras with partial sequences, or selects, among multiple sequences, the orthologous and/or slowest evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for sequence selection, a semi-automatic mode is available to accommodate the user's expertise. CONCLUSION: SCaFoS is able to deal with datasets of hundreds of species and genes, at either the amino acid or the nucleotide level. It has a graphical interface and can be integrated in an automatic workflow.
Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimerical sequences to reduce missing data and selects genes according to their level of missing data. Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast evolving species. BioMed Central 2007-02-08 /pmc/articles/PMC1796611/ /pubmed/17288575 http://dx.doi.org/10.1186/1471-2148-7-S1-S2 Text en Copyright © 2007 Roure et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Roure, Béatrice
Rodriguez-Ezpeleta, Naiara
Philippe, Hervé
SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics
title SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics
title_full SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics
title_fullStr SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics
title_full_unstemmed SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics
title_short SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics
title_sort scafos: a tool for selection, concatenation and fusion of sequences for phylogenomics
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1796611/
https://www.ncbi.nlm.nih.gov/pubmed/17288575
http://dx.doi.org/10.1186/1471-2148-7-S1-S2
work_keys_str_mv AT rourebeatrice scafosatoolforselectionconcatenationandfusionofsequencesforphylogenomics
AT rodriguezezpeletanaiara scafosatoolforselectionconcatenationandfusionofsequencesforphylogenomics
AT philippeherve scafosatoolforselectionconcatenationandfusionofsequencesforphylogenomics