Cargando…

Evaluating Autoencoder-Based Featurization and Supervised Learning for Protein Decoy Selection

Rapid growth in molecular structure data is renewing interest in featurizing structure. Featurizations that retain information on biological activity are particularly sought for protein molecules, where decades of research have shown that indeed structure encodes function. Research on featurization...

Descripción completa

Detalles Bibliográficos
Autores principales: Alam, Fardina Fathmiul, Rahman, Taseef, Shehu, Amarda
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7179114/
https://www.ncbi.nlm.nih.gov/pubmed/32143444
http://dx.doi.org/10.3390/molecules25051146
Descripción
Sumario:Rapid growth in molecular structure data is renewing interest in featurizing structure. Featurizations that retain information on biological activity are particularly sought for protein molecules, where decades of research have shown that indeed structure encodes function. Research on featurization of protein structure is active, but here we assess the promise of autoencoders. Motivated by rapid progress in neural network research, we investigate and evaluate autoencoders on yielding linear and nonlinear featurizations of protein tertiary structures. An additional reason we focus on autoencoders as the engine to obtain featurizations is the versatility of their architectures and the ease with which changes to architecture yield linear versus nonlinear features. While open-source neural network libraries, such as Keras, which we employ here, greatly facilitate constructing, training, and evaluating autoencoder architectures and conducting model search, autoencoders have not yet gained popularity in the structure biology community. Here we demonstrate their utility in a practical context. Employing autoencoder-based featurizations, we address the classic problem of decoy selection in protein structure prediction. Utilizing off-the-shelf supervised learning methods, we demonstrate that the featurizations are indeed meaningful and allow detecting active tertiary structures, thus opening the way for further avenues of research.