Cargando…

ProtNN: fast and accurate protein 3D-structure classification in structural and topological space

BACKGROUND: Studying the functions and structures of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has increasingly become extremely large. Still, the classification of a protein structure remains a difficult, costly, an...

Descripción completa

Detalles Bibliográficos
Autores principales: Dhifli, Wajdi, Diallo, Abdoulaye Baniré
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5034655/
https://www.ncbi.nlm.nih.gov/pubmed/27688811
http://dx.doi.org/10.1186/s13040-016-0108-2
Descripción
Sumario:BACKGROUND: Studying the functions and structures of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has increasingly become extremely large. Still, the classification of a protein structure remains a difficult, costly, and time consuming task. The difficulties are often due to the essential role of spatial and topological structures in the classification of protein structures. RESULTS: We propose ProtNN, a novel classification approach for protein 3D-structures. Given an unannotated query protein structure and a set of annotated proteins, ProtNN assigns to the query protein the class with the highest number of votes across the k nearest neighbor reference proteins, where k is a user-defined parameter. The search of the nearest neighbor annotated structures is based on a protein-graph representation model and pairwise similarities between vector embedding of the query and the reference protein structures in structural and topological spaces. CONCLUSIONS: We demonstrate through an extensive experimental evaluation that ProtNN is able to accurately classify several datasets in an extremely fast runtime compared to state-of-the-art approaches. We further show that ProtNN is able to scale up to a whole PDB dataset in a single-process mode with no parallelization, with a gain of thousands order of magnitude in runtime compared to state-of-the-art approaches.