ProtNN: fast and accurate protein 3D-structure classification in structural and topological space
BACKGROUND: Studying the functions and structures of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has grown extremely large. Still, classifying a protein structure remains a difficult, costly, an...
Main Authors: | Dhifli, Wajdi; Diallo, Abdoulaye Baniré |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5034655/ https://www.ncbi.nlm.nih.gov/pubmed/27688811 http://dx.doi.org/10.1186/s13040-016-0108-2 |
_version_ | 1782455316010500096 |
---|---|
author | Dhifli, Wajdi Diallo, Abdoulaye Baniré |
author_facet | Dhifli, Wajdi Diallo, Abdoulaye Baniré |
author_sort | Dhifli, Wajdi |
collection | PubMed |
description | BACKGROUND: Studying the functions and structures of proteins is important for understanding the molecular mechanisms of life. The number of publicly available protein structures has grown extremely large. Still, classifying a protein structure remains a difficult, costly, and time-consuming task. The difficulties are often due to the essential role of spatial and topological structures in the classification of protein structures. RESULTS: We propose ProtNN, a novel classification approach for protein 3D-structures. Given an unannotated query protein structure and a set of annotated proteins, ProtNN assigns to the query protein the class with the highest number of votes across the k nearest neighbor reference proteins, where k is a user-defined parameter. The search for the nearest annotated structures is based on a protein-graph representation model and pairwise similarities between vector embeddings of the query and the reference protein structures in structural and topological spaces. CONCLUSIONS: We demonstrate through an extensive experimental evaluation that ProtNN accurately classifies several datasets with an extremely fast runtime compared to state-of-the-art approaches. We further show that ProtNN scales up to a whole PDB dataset in single-process mode with no parallelization, with a gain of thousands order of magnitude in runtime compared to state-of-the-art approaches. |
format | Online Article Text |
id | pubmed-5034655 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5034655 2016-09-29 ProtNN: fast and accurate protein 3D-structure classification in structural and topological space Dhifli, Wajdi Diallo, Abdoulaye Baniré BioData Min Research
BioMed Central 2016-09-23 /pmc/articles/PMC5034655/ /pubmed/27688811 http://dx.doi.org/10.1186/s13040-016-0108-2 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Dhifli, Wajdi Diallo, Abdoulaye Baniré ProtNN: fast and accurate protein 3D-structure classification in structural and topological space |
title | ProtNN: fast and accurate protein 3D-structure classification in structural and topological space |
title_full | ProtNN: fast and accurate protein 3D-structure classification in structural and topological space |
title_fullStr | ProtNN: fast and accurate protein 3D-structure classification in structural and topological space |
title_full_unstemmed | ProtNN: fast and accurate protein 3D-structure classification in structural and topological space |
title_short | ProtNN: fast and accurate protein 3D-structure classification in structural and topological space |
title_sort | protnn: fast and accurate protein 3d-structure classification in structural and topological space |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5034655/ https://www.ncbi.nlm.nih.gov/pubmed/27688811 http://dx.doi.org/10.1186/s13040-016-0108-2 |
work_keys_str_mv | AT dhifliwajdi protnnfastandaccurateprotein3dstructureclassificationinstructuralandtopologicalspace AT dialloabdoulayebanire protnnfastandaccurateprotein3dstructureclassificationinstructuralandtopologicalspace |
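
The classification step summarized in the description field (assign the query the class with the most votes among its k nearest annotated reference proteins) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each protein has already been embedded as a fixed-length numeric vector, it uses Euclidean distance as a stand-in for the similarity measure (the record does not say which one ProtNN uses), and the function names, class labels, and toy vectors are all hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(query, references, k=3):
    """Return the majority label among the k references closest to `query`.

    `references` is a list of (embedding, label) pairs; `k` is the
    user-defined neighborhood size mentioned in the abstract.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank annotated proteins by distance to the query embedding.
    ranked = sorted(references, key=lambda ref: euclidean(query, ref[0]))
    # Count the class labels of the k nearest neighbors.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy embeddings; real ones would be computed from protein graphs.
references = [
    ([0.10, 0.80, 2.0], "class_A"),
    ([0.12, 0.75, 2.1], "class_A"),
    ([0.90, 0.20, 5.0], "class_B"),
    ([0.85, 0.25, 4.8], "class_B"),
    ([0.15, 0.70, 2.2], "class_A"),
]
query = [0.11, 0.78, 2.05]

predicted = knn_classify(query, references, k=3)
print(predicted)
```

With these toy vectors the three nearest references all carry the label `class_A`, so that label wins the vote. Sorting every reference is the simplest way to find the neighbors; a large reference set would likely call for a more efficient nearest-neighbor search.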