GASP: Gapped Ancestral Sequence Prediction for proteins

BACKGROUND: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or M...

Bibliographic Details
Main authors: Edwards, Richard J, Shields, Denis C
Format: Text
Language: English
Published: BioMed Central 2004
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC517926/
https://www.ncbi.nlm.nih.gov/pubmed/15350199
http://dx.doi.org/10.1186/1471-2105-5-123
author Edwards, Richard J
Shields, Denis C
collection PubMed
description BACKGROUND: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. RESULTS: Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. CONCLUSIONS: GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike.
format Text
id pubmed-517926
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-517926 2004-09-24 GASP: Gapped Ancestral Sequence Prediction for proteins Edwards, Richard J Shields, Denis C BMC Bioinformatics Software BioMed Central 2004-09-06 /pmc/articles/PMC517926/ /pubmed/15350199 http://dx.doi.org/10.1186/1471-2105-5-123 Text en Copyright © 2004 Edwards and Shields; licensee BioMed Central Ltd.
title GASP: Gapped Ancestral Sequence Prediction for proteins
topic Software