Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)

The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein, protein–DNA, or in drug discovery. Whi...

Bibliographic Details
Main Authors:	Shinada, Nicolas K.; Schmidtke, Peter; de Brevern, Alexandre G.
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139665/ https://www.ncbi.nlm.nih.gov/pubmed/32213914 http://dx.doi.org/10.3390/ijms21062243

_version_	1783518818408071168
author	Shinada, Nicolas K.; Schmidtke, Peter; de Brevern, Alexandre G.
author_facet	Shinada, Nicolas K.; Schmidtke, Peter; de Brevern, Alexandre G.
author_sort	Shinada, Nicolas K.
collection	PubMed
description	The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein, protein–DNA, or in drug discovery. While protein redundancy was only simply managed using simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Hence, the protein-ligand duplicates in the PDB are widely known, but were never quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid bias found in different studies. The methodology is based on binding site superposition, and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted and only representative conformations were conserved for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories decreases by 3.84-fold the number of complexes, and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interactions analysis or virtual screening studies.
format	Online Article Text
id	pubmed-7139665
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7139665 2020-04-10 Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB) Shinada, Nicolas K.; Schmidtke, Peter; de Brevern, Alexandre G. Int J Mol Sci Article The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein–protein, protein–DNA, or in drug discovery. While protein redundancy was only simply managed using simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Hence, the protein-ligand duplicates in the PDB are widely known, but were never quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid bias found in different studies. The methodology is based on binding site superposition, and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted and only representative conformations were conserved for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories decreases by 3.84-fold the number of complexes, and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interactions analysis or virtual screening studies. MDPI 2020-03-24 /pmc/articles/PMC7139665/ /pubmed/32213914 http://dx.doi.org/10.3390/ijms21062243 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Shinada, Nicolas K. Schmidtke, Peter de Brevern, Alexandre G. Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)
title	Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139665/ https://www.ncbi.nlm.nih.gov/pubmed/32213914 http://dx.doi.org/10.3390/ijms21062243
work_keys_str_mv	AT shinadanicolask accuraterepresentationofproteinligandstructuraldiversityintheproteindatabankpdb AT schmidtkepeter accuraterepresentationofproteinligandstructuraldiversityintheproteindatabankpdb AT debrevernalexandreg accuraterepresentationofproteinligandstructuraldiversityintheproteindatabankpdb
