
REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unf...


Bibliographic Details
Main authors: Leonard, Guy, Stevens, Jamie R., Richards, Thomas A.
Format: Text
Language: English
Published: Libertas Academica 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2747128/
https://www.ncbi.nlm.nih.gov/pubmed/19812722
_version_ 1782172072453079040
author Leonard, Guy
Stevens, Jamie R.
Richards, Thomas A.
collection PubMed
description The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends.
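The renaming workflow summarised in the description above can be illustrated with a minimal Python sketch. It is not the REFGEN or TREENAMER implementation. The header layout (">gi|...|ref|ACCESSION| ... [Species name]"), the label scheme, and the function names parse_header, safe_label, rename_fasta and relabel_newick are all assumptions made for illustration. The sketch derives a program-safe label from species name and accession number, drops records whose accession number has already been seen (and reports them), and later swaps the labels in a Newick tree string back to readable names.

```python
import re


def parse_header(header):
    """Return (accession, species) from an ASSUMED header layout such as
    '>gi|123|ref|XP_001.1| protein [Homo sapiens]' or '>XP_002.1 protein [Mus musculus]'."""
    text = header.lstrip(">").strip()
    first = text.split()[0]
    fields = [f for f in first.split("|") if f]
    # Assumption: in 'gi|...' headers the accession is the 4th pipe-delimited field.
    if len(fields) >= 4 and fields[0] == "gi":
        accession = fields[3]
    else:
        accession = fields[-1]
    match = re.search(r"\[([^\]]+)\]\s*$", text)
    species = match.group(1) if match else "unknown"
    return accession, species


def safe_label(species, accession):
    """Build a label containing only letters, digits and underscores (hypothetical scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", f"{species}_{accession}").strip("_")


def rename_fasta(lines):
    """Rename every FASTA header; skip records whose accession was already seen.

    Returns (renamed_lines, label_to_display_name, dropped_accessions)."""
    renamed, mapping, dropped, seen = [], {}, [], set()
    keep = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            accession, species = parse_header(line)
            keep = accession not in seen
            if keep:
                seen.add(accession)
                label = safe_label(species, accession)
                mapping[label] = f"{species} {accession}"
                renamed.append(">" + label)
            else:
                dropped.append(accession)  # would go into a report file
        elif keep:
            renamed.append(line)
    return renamed, mapping, dropped


def relabel_newick(newick, mapping):
    """Replace each known label in a Newick string with its quoted display name."""
    def swap(match):
        token = match.group(0)
        return f"'{mapping[token]}'" if token in mapping else token
    return re.sub(r"[A-Za-z0-9_]+", swap, newick)


fasta = [
    ">gi|123|ref|XP_001.1| protein [Homo sapiens]", "MKV",
    ">XP_002.1 protein [Mus musculus]", "MKI",
    ">gi|123|ref|XP_001.1| protein [Homo sapiens]", "MKV",
]
renamed, mapping, dropped = rename_fasta(fasta)
tree = relabel_newick("(Homo_sapiens_XP_001_1:0.1,Mus_musculus_XP_002_1:0.2);", mapping)
```

Keeping the label-to-name mapping alongside the renamed alignment is what allows the tree labels to be restored after analysis. The mapping also doubles as a species and accession number list of the kind the description mentions for supplementary materials.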
format Text
id pubmed-2747128
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-2747128 2009-10-06 REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era Leonard, Guy Stevens, Jamie R. Richards, Thomas A. Evol Bioinform Online Short Report
Libertas Academica 2009-05-06 /pmc/articles/PMC2747128/ /pubmed/19812722 Text en © the authors http://creativecommons.org/licenses/by/3.0 This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era
topic Short Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2747128/
https://www.ncbi.nlm.nih.gov/pubmed/19812722