
Quality measures for protein alignment benchmarks

Multiple protein sequence alignment methods are central to many applications in molecular biology. These methods are typically assessed on benchmark datasets including BALIBASE, OXBENCH, PREFAB and SABMARK, which are important to biologists in making informed choices between programs. In this articl...


Bibliographic Details
Main author: Edgar, Robert C.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2853116/
https://www.ncbi.nlm.nih.gov/pubmed/20047958
http://dx.doi.org/10.1093/nar/gkp1196
author Edgar, Robert C.
collection PubMed
description Multiple protein sequence alignment methods are central to many applications in molecular biology. These methods are typically assessed on benchmark datasets including BALIBASE, OXBENCH, PREFAB and SABMARK, which are important to biologists in making informed choices between programs. In this article, annotations of domain homology and secondary structure are used to define new measures of alignment quality and are used to make the first systematic, independent evaluation of these benchmarks. These measures indicate sensitivity and specificity while avoiding the ambiguous residue correspondences and arbitrary distance cutoffs inherent to structural superpositions. Alignments by selected methods that indicate high-confidence columns (ALIGN-M, DIALIGN-T, FSA and MUSCLE) are also assessed. Fold space coverage and effective benchmark database sizes are estimated by reference to domain annotations, and significant redundancy is found in all benchmarks except SABMARK. Questionable alignments are found in all benchmarks, especially in BALIBASE where 87% of sequences have unknown structure, 20% of columns contain different folds according to SUPERFAMILY and 30% of ‘core block’ columns have conflicting secondary structure according to DSSP. A careful analysis of current protein multiple alignment benchmarks calls into question their ability to determine reliable algorithm rankings.
format Text
id pubmed-2853116
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2853116 2010-04-12 Quality measures for protein alignment benchmarks Edgar, Robert C. Nucleic Acids Res Computational Biology Oxford University Press 2010-04 2010-01-04 /pmc/articles/PMC2853116/ /pubmed/20047958 http://dx.doi.org/10.1093/nar/gkp1196 Text en © The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Quality measures for protein alignment benchmarks
topic Computational Biology