High quality protein sequence alignment by combining structural profile prediction and profile alignment using SABERTOOTH

BACKGROUND: Protein alignments are an essential tool for many bioinformatics analyses. While sequence alignments are accurate for proteins of high sequence similarity, they become unreliable as they approach the so-called 'twilight zone' where sequence similarity gets indistinguishable fro...

Bibliographic Details
Main authors: Teichert, Florian, Minning, Jonas, Bastolla, Ugo, Porto, Markus
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885375/
https://www.ncbi.nlm.nih.gov/pubmed/20470364
http://dx.doi.org/10.1186/1471-2105-11-251
_version_ 1782182381426311168
author Teichert, Florian
Minning, Jonas
Bastolla, Ugo
Porto, Markus
author_facet Teichert, Florian
Minning, Jonas
Bastolla, Ugo
Porto, Markus
author_sort Teichert, Florian
collection PubMed
description BACKGROUND: Protein alignments are an essential tool for many bioinformatics analyses. While sequence alignments are accurate for proteins of high sequence similarity, they become unreliable as they approach the so-called 'twilight zone' where sequence similarity gets indistinguishable from random. For such distant pairs, structure alignment is of much better quality. Nevertheless, sequence alignment is the only choice in the majority of cases where structural data is not available. This situation demands development of methods that extend the applicability of accurate sequence alignment to distantly related proteins.
RESULTS: We develop a sequence alignment method that combines the prediction of a structural profile based on the protein's sequence with the alignment of that profile using our recently published alignment tool SABERTOOTH. In particular, we predict the contact vector of protein structures using an artificial neural network based on position-specific scoring matrices generated by PSI-BLAST and align these predicted contact vectors. The resulting sequence alignments are assessed using two different tests: First, we assess the alignment quality by measuring the derived structural similarity for cases in which structures are available. In a second test, we quantify the ability of the significance score of the alignments to recognize structural and evolutionary relationships. As a benchmark we use a representative set of the SCOP (structural classification of proteins) database, with similarities ranging from closely related proteins at SCOP family level, to very distantly related proteins at SCOP fold level. Comparing these results with some prominent sequence alignment tools, we find that SABERTOOTH produces sequence alignments of better quality than those of Clustal W, T-Coffee, MUSCLE, and PSI-BLAST. HHpred, one of the most sophisticated and computationally expensive tools available, outperforms our alignment algorithm at family and superfamily levels, while the use of SABERTOOTH is advantageous for alignments at fold level. Our alignment scheme will profit from future improvements of structural profiles prediction.
CONCLUSIONS: We present the automatic sequence alignment tool SABERTOOTH that computes pairwise sequence alignments of very high quality. SABERTOOTH is especially advantageous when applied to alignments of remotely related proteins. The source code is available at http://www.fkp.tu-darmstadt.de/sabertooth_project/, free for academic users upon request.
format Text
id pubmed-2885375
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2885375 2010-06-15 High quality protein sequence alignment by combining structural profile prediction and profile alignment using SABERTOOTH Teichert, Florian Minning, Jonas Bastolla, Ugo Porto, Markus BMC Bioinformatics Research article BioMed Central 2010-05-14 /pmc/articles/PMC2885375/ /pubmed/20470364 http://dx.doi.org/10.1186/1471-2105-11-251 Text en Copyright ©2010 Teichert et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research article
Teichert, Florian
Minning, Jonas
Bastolla, Ugo
Porto, Markus
High quality protein sequence alignment by combining structural profile prediction and profile alignment using SABERTOOTH
title High quality protein sequence alignment by combining structural profile prediction and profile alignment using SABERTOOTH
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885375/
https://www.ncbi.nlm.nih.gov/pubmed/20470364
http://dx.doi.org/10.1186/1471-2105-11-251