Ordering Protein Contact Matrices

Numerous biophysical approaches provide information about the spatial proximity of residues in proteins. However, assigning the correct protein fold from this proximity information is not straightforward when the spatially close residues are not assigned to positions in the primary sequence. Here,...

Bibliographic Details
Main authors:	Xu, Chuan; Bouvier, Guillaume; Bardiaux, Benjamin; Nilges, Michael; Malliavin, Thérèse; Lisser, Abdel
Format:	Online Article Text
Language:	English
Published:	Research Network of Computational and Structural Biotechnology 2018
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5889711/ https://www.ncbi.nlm.nih.gov/pubmed/29632657 http://dx.doi.org/10.1016/j.csbj.2018.03.001

_version_	1783312745334046720
author	Xu, Chuan Bouvier, Guillaume Bardiaux, Benjamin Nilges, Michael Malliavin, Thérèse Lisser, Abdel
author_facet	Xu, Chuan Bouvier, Guillaume Bardiaux, Benjamin Nilges, Michael Malliavin, Thérèse Lisser, Abdel
author_sort	Xu, Chuan
collection	PubMed
description	Numerous biophysical approaches provide information about the spatial proximity of residues in proteins. However, assigning the correct protein fold from this proximity information is not straightforward when the spatially close residues are not assigned to positions in the primary sequence. Here, we propose an algorithm that assigns such residue numbers by ordering the rows and columns of the raw protein contact matrix obtained directly from proximity information between unassigned amino acids. The ordering problem is formulated as the search for a trail within a graph that connects protein residues through the nonzero contact values. The algorithm proceeds in two steps: (i) finding the longest trail of the graph using an original dynamic programming algorithm, and (ii) clustering the individual ordered matrices using a self-organizing map (SOM) approach. The combination of dynamic programming and self-organizing maps is an innovative aspect of the present work. The algorithm was validated on a set of about 900 proteins, representative of the sizes and proportions of secondary structures observed in the Protein Data Bank. It proved efficient for noise levels up to 40%, with average gaps of at most about 20% between ordered and initial matrices. The proposed approach paves the way toward a method of fold prediction from noisy proximity information, as TM scores larger than 0.5 were obtained for ten randomly chosen proteins at a noise level of 10%. The method has also been validated on two experimental cases, on which it performed satisfactorily.
format	Online Article Text
id	pubmed-5889711
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Research Network of Computational and Structural Biotechnology
record_format	MEDLINE/PubMed
spelling	pubmed-5889711 2018-04-09 Ordering Protein Contact Matrices Xu, Chuan Bouvier, Guillaume Bardiaux, Benjamin Nilges, Michael Malliavin, Thérèse Lisser, Abdel Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2018-03-16 /pmc/articles/PMC5889711/ /pubmed/29632657 http://dx.doi.org/10.1016/j.csbj.2018.03.001 Text en © 2018 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle	Research Article Xu, Chuan Bouvier, Guillaume Bardiaux, Benjamin Nilges, Michael Malliavin, Thérèse Lisser, Abdel Ordering Protein Contact Matrices
title	Ordering Protein Contact Matrices
title_full	Ordering Protein Contact Matrices
title_fullStr	Ordering Protein Contact Matrices
title_full_unstemmed	Ordering Protein Contact Matrices
title_short	Ordering Protein Contact Matrices
title_sort	ordering protein contact matrices
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5889711/ https://www.ncbi.nlm.nih.gov/pubmed/29632657 http://dx.doi.org/10.1016/j.csbj.2018.03.001
work_keys_str_mv	AT xuchuan orderingproteincontactmatrices AT bouvierguillaume orderingproteincontactmatrices AT bardiauxbenjamin orderingproteincontactmatrices AT nilgesmichael orderingproteincontactmatrices AT malliavintherese orderingproteincontactmatrices AT lisserabdel orderingproteincontactmatrices
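
The ordering problem summarized in the description field can be illustrated with a small toy sketch. This is not the authors' dynamic programming algorithm, and it omits the SOM clustering step entirely. It assumes a hypothetical noise-free contact map in which each residue touches only its two sequence neighbours, scrambles the residue labels, and then recovers an ordering by exhaustively searching for the longest simple path in the contact graph (a simplification of the "longest trail" named in the abstract). All function names are illustrative.

```python
import random


def chain_contact_matrix(n):
    """Toy contact map (assumption): residue i touches only i-1 and i+1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def permute(matrix, perm):
    """Relabel rows and columns: new index a takes the role of old index perm[a]."""
    return [[matrix[i][j] for j in perm] for i in perm]


def longest_simple_path(matrix):
    """Exhaustive depth-first search for the longest path that visits
    each node at most once. Exponential in general; fine for a toy case."""
    n = len(matrix)
    best = []

    def extend(path, visited):
        nonlocal best
        if len(path) > len(best):
            best = path[:]
        for nxt in range(n):
            if matrix[path[-1]][nxt] and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in range(n):
        extend([start], {start})
    return best


rng = random.Random(0)
original = chain_contact_matrix(8)

# Scramble the residue labels, as if sequence assignments were unknown.
perm = list(range(8))
rng.shuffle(perm)
scrambled = permute(original, perm)

# Recover an ordering from the contact graph and reorder the matrix with it.
order = longest_simple_path(scrambled)
reordered = permute(scrambled, order)
```

In this idealized case the reordered matrix matches the original banded matrix, whichever direction the path is traversed. Real contact maps contain long-range contacts and noise, which is why the abstract describes a dedicated dynamic programming search followed by clustering of the resulting ordered matrices.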
