A new statistical framework to assess structural alignment quality using information compression

Bibliographic details
Main authors: Collier, James H.; Allison, Lloyd; Lesk, Arthur M.; Garcia de la Banda, Maria; Konagurthu, Arun S.
Format: Online Article Text
Language: English
Published: Oxford University Press, 2014
Subjects: Eccb 2014 Proceedings Papers Committee
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4147913/
https://www.ncbi.nlm.nih.gov/pubmed/25161241
http://dx.doi.org/10.1093/bioinformatics/btu460
collection PubMed
description Motivation: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments must be assessed has been identified as the main cause for the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field. Results: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field. Availability: http://lcb.infotech.monash.edu.au/I-value Contact: arun.konagurthu@monash.edu Supplementary information: Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html
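The central idea of the description, that an alignment is worth reporting only if stating it lets the two structures be transmitted losslessly in fewer bits than a null model that ignores it, can be illustrated with a toy two-part message-length comparison. This is a minimal sketch under loudly hypothetical assumptions: a uniform null over a fixed coordinate box, Gaussian deviations with a fixed spread around the superposed partner residue, a flat log2(3)-bit cost per match/insert/delete state, and coordinates that are already superposed. All names (`null_bits`, `gaussian_bits`, `alignment_bits`, `compression`, `EPSILON`, `BOX`, `SIGMA`) are illustrative inventions; this is not the authors' I-value computation and does not reproduce the encodings they use.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical model settings, chosen for illustration only (not from the paper).
EPSILON = 0.001            # precision (angstrom) to which each coordinate is stated
BOX = 100.0                # null model: each coordinate uniform on [-BOX, BOX]
SIGMA = 1.0                # fixed spread of an aligned coordinate around its partner
STATE_BITS = math.log2(3)  # flat code over match/insert/delete states


def null_bits(n_residues: int) -> float:
    """Bits to state n residues' coordinates under the uniform null model."""
    per_value = -math.log2(EPSILON / (2 * BOX))
    return 3 * n_residues * per_value


def gaussian_bits(x: float, mu: float, sigma: float = SIGMA) -> float:
    """Bits to state x to precision EPSILON under N(mu, sigma^2), in log space."""
    return (-math.log2(EPSILON)
            + 0.5 * math.log2(2 * math.pi * sigma ** 2)
            + (x - mu) ** 2 / (2 * sigma ** 2) / math.log(2))


def alignment_bits(s: Sequence[Coord], t: Sequence[Coord],
                   pairs: List[Tuple[int, int]]) -> float:
    """Two-part message: state the alignment, then T given S and the alignment.

    pairs holds (i, j) with t[i] aligned to s[j]; s and t are assumed to be
    superposed already.
    """
    # One match state per pair, one insert/delete state per leftover residue.
    n_states = len(s) + len(t) - len(pairs)
    bits = n_states * STATE_BITS
    partner_of = dict(pairs)
    for i, coord in enumerate(t):
        if i in partner_of:
            partner = s[partner_of[i]]
            bits += sum(gaussian_bits(x, mu) for x, mu in zip(coord, partner))
        else:
            bits += null_bits(1)  # unaligned residues fall back to the null code
    return bits


def compression(s: Sequence[Coord], t: Sequence[Coord],
                pairs: List[Tuple[int, int]]) -> float:
    """Bits saved relative to the null; positive means the alignment pays for itself."""
    # S is encoded identically under both hypotheses, so only T's cost is compared.
    return null_bits(len(t)) - alignment_bits(s, t, pairs)


if __name__ == "__main__":
    s = [(float(k), 0.0, 0.0) for k in range(10)]
    identity = [(k, k) for k in range(10)]
    t_close = [(x + 0.5, y, z) for x, y, z in s]
    t_far = [(x + 50.0, y, z) for x, y, z in s]
    print(compression(s, t_close, identity))  # positive
    print(compression(s, t_far, identity))    # negative
```

In this toy setup, a structure lying half an angstrom from its partner yields a positive saving, while the same alignment of a structure displaced by 50 Å costs far more than the null, so the alignment hypothesis would be rejected. The sign of the difference, rather than a hand-tuned balance of coverage against superposition fidelity, decides the question.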
id pubmed-4147913
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Bioinformatics; Eccb 2014 Proceedings Papers Committee; Oxford University Press, 2014-09-01; 2014-08-22
rights © The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Eccb 2014 Proceedings Papers Committee