A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins

BACKGROUND: The most common substitution matrices currently used (BLOSUM and PAM) are based on protein sequences with average amino acid distributions; thus, they do not represent a fully accurate substitution model for proteins characterized by a biased amino acid composition. This problem has been...

Bibliographic Details
Main authors: Brick, Kevin, Pizzi, Elisabetta
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2408606/
https://www.ncbi.nlm.nih.gov/pubmed/18485187
http://dx.doi.org/10.1186/1471-2105-9-236
author Brick, Kevin
Pizzi, Elisabetta
collection PubMed
description BACKGROUND: The most common substitution matrices currently used (BLOSUM and PAM) are based on protein sequences with average amino acid distributions; thus, they do not represent a fully accurate substitution model for proteins characterized by a biased amino acid composition. This problem has been addressed recently by adjusting existing matrices; however, to date, no empirical approach has been taken to build matrices that offer a substitution model for comparing proteins sharing an amino acid compositional bias. Here, we present a novel procedure to construct series of symmetrical substitution matrices to align proteins from similarly biased Plasmodium proteomes. RESULTS: We generated substitution matrices by selecting from the BLOCKS database those multiple alignments with a compositional bias similar to that of P. falciparum and P. yoelii proteins. We adopted a novel 'fuzzy' clustering method to group sequences within these alignments and showed that it retains more complete information on amino acid substitutions than hierarchical clustering. We assessed the performance against the BLOSUM62 series and showed that using our matrices improves the performance of BLAST database searches, greatly reducing the number of false positive hits. We then demonstrated two applications of the novel matrices: improving the annotation of homologs between the two Plasmodium species and classifying members of the P. falciparum RIFIN/STEVOR family. CONCLUSION: We confirmed that, for compositionally biased proteins, standard BLOSUM matrices are not suited for optimal alignments, and specific substitution matrices are required. In addition, we showed that using these matrices reduces false positive hits, facilitating the automatic annotation process.
format Text
id pubmed-2408606
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2408606 2008-06-02 A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins Brick, Kevin Pizzi, Elisabetta BMC Bioinformatics Research Article BioMed Central 2008-05-16 /pmc/articles/PMC2408606/ /pubmed/18485187 http://dx.doi.org/10.1186/1471-2105-9-236 Text en Copyright © 2008 Brick and Pizzi; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2408606/
https://www.ncbi.nlm.nih.gov/pubmed/18485187
http://dx.doi.org/10.1186/1471-2105-9-236