The Molecule Cloud - compact visualization of large collections of molecules

Bibliographic Details
Main Authors: Ertl, Peter, Rohde, Bernhard
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3403880/
https://www.ncbi.nlm.nih.gov/pubmed/22769057
http://dx.doi.org/10.1186/1758-2946-4-12
author Ertl, Peter
Rohde, Bernhard
collection PubMed
description BACKGROUND: Analysis and visualization of large collections of molecules is one of the most frequent challenges that cheminformatics experts in the pharmaceutical industry face. Various sophisticated methods are available to perform this task, including clustering, dimensionality reduction or scaffold frequency analysis. In any case, however, viewing and analyzing large tables of molecular structures is necessary. We present a new visualization technique that provides basic information about the composition of molecular data sets at a single glance.
SUMMARY: A method is presented here that visually represents the most common structural features of chemical databases in the form of a cloud diagram. The frequency of molecules containing a particular substructure is indicated by the size of the respective structural image. The method is useful for quickly perceiving the most prominent structural features present in a data set. This approach was inspired by the popular word cloud diagrams used to visualize textual information in a compact form; we therefore call it the “Molecule Cloud”. The method also supports visualization of additional information by color coding, for example the biological activity of molecules containing a given scaffold or the protein target class typical for particular scaffolds. A detailed description of the algorithm is provided, allowing easy implementation of the method in any cheminformatics toolkit. The layout algorithm is available as open source Java code.
CONCLUSIONS: Visualization of large molecular data sets using the Molecule Cloud approach allows scientists to easily obtain information about the composition of molecular databases and their most frequent structural features. The method may be used in areas where analysis of large molecular collections is needed, for example processing of high-throughput screening results, virtual screening or compound purchasing. Several example visualizations of large data sets, including the PubChem, ChEMBL and ZINC databases, are provided as Molecule Cloud diagrams.
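The abstract describes three ingredients: the size of each structure image encodes how many molecules contain that scaffold, an optional color encodes an extra property such as activity, and a layout step packs the images into a compact cloud. The following is a minimal Python sketch of those ideas under stated assumptions, not the authors' published Java layout code. It assumes scaffolds have already been extracted by some cheminformatics toolkit and are represented only as strings; it assumes image edge length is interpolated linearly in sqrt(count); and it swaps in a generic word-cloud-style greedy Archimedean spiral for placement. The names CloudItem, assign_sizes, spiral_layout and activity_color, all parameter defaults, and the toy counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CloudItem:
    scaffold: str      # scaffold identifier (placeholder, e.g. a SMILES string)
    count: int         # number of molecules in the data set containing this scaffold
    size: float = 0.0  # edge length of the square structure image
    x: float = 0.0     # image centre after layout
    y: float = 0.0


def assign_sizes(counts, min_size=20.0, max_size=120.0):
    """Map scaffold frequencies to image sizes, most frequent first.

    Assumption: edge length is interpolated linearly in sqrt(count)
    between min_size (rarest scaffold) and max_size (most frequent).
    """
    items = [CloudItem(s, c) for s, c in
             sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    lo = math.sqrt(min(counts.values()))
    hi = math.sqrt(max(counts.values()))
    for it in items:
        t = 0.0 if hi == lo else (math.sqrt(it.count) - lo) / (hi - lo)
        it.size = min_size + t * (max_size - min_size)
    return items


def overlaps(a, b, gap):
    """True if two axis-aligned square images (plus a margin) intersect."""
    limit = (a.size + b.size) / 2 + gap
    return abs(a.x - b.x) < limit and abs(a.y - b.y) < limit


def spiral_layout(items, a=0.5, step=0.5, gap=2.0, max_steps=200_000):
    """Greedy word-cloud-style placement (an assumed technique).

    Items are placed in the given order (assign_sizes returns them most
    frequent first). Each one walks outward along the Archimedean spiral
    r = a * theta until it no longer overlaps any already placed item.
    """
    placed = []
    for it in items:
        for k in range(max_steps):
            theta = k * step
            r = a * theta
            it.x, it.y = r * math.cos(theta), r * math.sin(theta)
            if not any(overlaps(it, p, gap) for p in placed):
                break
        else:
            raise RuntimeError(f"no free position found for {it.scaffold}")
        placed.append(it)
    return placed


def activity_color(value, lo, hi):
    """Blue (low) to red (high) RGB ramp for an extra property such as activity."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    return (round(255 * t), 0, round(255 * (1 - t)))


if __name__ == "__main__":
    counts = {"c1ccccc1": 50, "C1CCNCC1": 20, "c1ccncc1": 5}  # hypothetical toy counts
    for it in spiral_layout(assign_sizes(counts)):
        print(f"{it.scaffold:10s} size={it.size:6.1f} at ({it.x:7.1f}, {it.y:7.1f})")
```

Placing the largest images first puts the most frequent scaffold at the centre of the cloud and lets smaller ones fill the gaps around it, which is the usual word-cloud strategy. A renderer would then draw each structure as a square of side size centred at (x, y), optionally tinted with activity_color.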
format Online
Article
Text
id pubmed-3403880
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3403880 2012-07-25 J Cheminform Methodology BioMed Central 2012-07-06 Copyright ©2012 Ertl and Rohde; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Molecule Cloud - compact visualization of large collections of molecules
topic Methodology