
FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies

BACKGROUND: Phylogenetic and population genetic studies often deal with multiple sequence alignments that require manipulation or processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years phylogenetic data sets have exp...


Bibliographic Details
Main authors:	Kück, Patrick, Longo, Gary C
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243772/ https://www.ncbi.nlm.nih.gov/pubmed/25426157 http://dx.doi.org/10.1186/s12983-014-0081-x

_version_	1782346144899137536
author	Kück, Patrick Longo, Gary C
author_facet	Kück, Patrick Longo, Gary C
author_sort	Kück, Patrick
collection	PubMed
description	BACKGROUND: Phylogenetic and population genetic studies often deal with multiple sequence alignments that require manipulation or processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years phylogenetic data sets have expanded from single genes to genome-wide markers comprising hundreds to thousands of loci. Processing of these large phylogenomic data sets is impracticable without using automated process pipelines. Currently no stand-alone or pipeline-compatible program exists that offers a broad range of manipulation and processing steps for multiple sequence alignments in a single process run. RESULTS: Here we present FASconCAT-G, a system-independent editor, which offers various processing options for multiple sequence alignments. The software provides a wide range of possibilities to edit and concatenate multiple nucleotide, amino acid, and structure sequence alignment files for phylogenetic and population genetic purposes. The main options include sequence renaming, file format conversion, sequence translation between nucleotide and amino acid states, consensus generation of specific sequence blocks, sequence concatenation, model selection of amino acid replacement with ProtTest, two types of RY coding as well as site exclusions and extraction of parsimony informative sites. Conveniently, most options can be invoked in combination and performed during a single process run. Additionally, FASconCAT-G prints useful information regarding alignment characteristics and editing processes such as base compositions of single in- and outfiles, sequence areas in a concatenated supermatrix, as well as paired stem and loop regions in secondary structure sequence strings. 
CONCLUSIONS: FASconCAT-G is a command-line driven Perl program that delivers computationally fast and user-friendly processing of multiple sequence alignments for phylogenetic and population genetic applications and is well suited for incorporation into analysis pipelines. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12983-014-0081-x) contains supplementary material, which is available to authorized users.
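Three of the processing steps named in the abstract (sequence concatenation into a supermatrix, RY coding, and extraction of parsimony informative sites) can be illustrated with a minimal Python sketch. This is not FASconCAT-G code and does not reproduce its options or output: the function names, the dictionary-based alignment representation, the "N" fill character for missing taxa, and the set of ignored ambiguity characters are all assumptions made for illustration. Only the basic purine/pyrimidine recoding is shown, not the two RY coding variants the abstract mentions.

```python
from collections import Counter

# Hypothetical helper: an alignment is a dict mapping taxon name -> aligned sequence.
def concatenate(alignments, fill="N"):
    """Join per-locus alignments into one supermatrix.

    Taxa missing from a locus are padded with `fill` (an assumed convention).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, fill * length)
    return supermatrix

# Purines (A, G) become R; pyrimidines (C, T, U) become Y; other symbols are kept.
RY_TABLE = str.maketrans("AGCTUagctu", "RRYYYRRYYY")

def ry_code(sequence):
    """Recode a nucleotide sequence into purine/pyrimidine states."""
    return sequence.translate(RY_TABLE)

def parsimony_informative_sites(alignment, ignore="-?NnXx"):
    """Return 0-based column indices where at least two character states
    each occur in at least two sequences (characters in `ignore` are skipped)."""
    informative = []
    for index, column in enumerate(zip(*alignment.values())):
        counts = Counter(char.upper() for char in column if char not in ignore)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(index)
    return informative

if __name__ == "__main__":
    gene1 = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
    gene2 = {"A": "GG", "B": "GC"}  # taxon C is missing from this locus
    print(concatenate([gene1, gene2]))
    print(ry_code("ACGT"))
```

Representing each alignment as a plain dictionary keeps the sketch self-contained; a real pipeline would read FASTA, PHYLIP, or NEXUS files and track the start and end position of each locus in the supermatrix.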
format	Online Article Text
id	pubmed-4243772
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-42437722014-11-26 FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies Kück, Patrick Longo, Gary C Front Zool Methodology BACKGROUND: Phylogenetic and population genetic studies often deal with multiple sequence alignments that require manipulation or processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years phylogenetic data sets have expanded from single genes to genome-wide markers comprising hundreds to thousands of loci. Processing of these large phylogenomic data sets is impracticable without using automated process pipelines. Currently no stand-alone or pipeline-compatible program exists that offers a broad range of manipulation and processing steps for multiple sequence alignments in a single process run. RESULTS: Here we present FASconCAT-G, a system-independent editor, which offers various processing options for multiple sequence alignments. The software provides a wide range of possibilities to edit and concatenate multiple nucleotide, amino acid, and structure sequence alignment files for phylogenetic and population genetic purposes. The main options include sequence renaming, file format conversion, sequence translation between nucleotide and amino acid states, consensus generation of specific sequence blocks, sequence concatenation, model selection of amino acid replacement with ProtTest, two types of RY coding as well as site exclusions and extraction of parsimony informative sites. Conveniently, most options can be invoked in combination and performed during a single process run. Additionally, FASconCAT-G prints useful information regarding alignment characteristics and editing processes such as base compositions of single in- and outfiles, sequence areas in a concatenated supermatrix, as well as paired stem and loop regions in secondary structure sequence strings. 
CONCLUSIONS: FASconCAT-G is a command-line driven Perl program that delivers computationally fast and user-friendly processing of multiple sequence alignments for phylogenetic and population genetic applications and is well suited for incorporation into analysis pipelines. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12983-014-0081-x) contains supplementary material, which is available to authorized users. BioMed Central 2014-11-18 /pmc/articles/PMC4243772/ /pubmed/25426157 http://dx.doi.org/10.1186/s12983-014-0081-x Text en © Kück and Longo; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Kück, Patrick Longo, Gary C FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies
title	FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies
title_full	FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies
title_fullStr	FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies
title_full_unstemmed	FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies
title_short	FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies
title_sort	fasconcat-g: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243772/ https://www.ncbi.nlm.nih.gov/pubmed/25426157 http://dx.doi.org/10.1186/s12983-014-0081-x
work_keys_str_mv	AT kuckpatrick fasconcatgextensivefunctionsformultiplesequencealignmentpreparationsconcerningphylogeneticstudies AT longogaryc fasconcatgextensivefunctionsformultiplesequencealignmentpreparationsconcerningphylogeneticstudies
