Mapping and classifying molecules from a high-throughput structural database
High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of...
Main Authors: | De, Sandip; Musil, Felix; Ingram, Teresa; Baldauf, Carsten; Ceriotti, Michele |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2017 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5289135/ https://www.ncbi.nlm.nih.gov/pubmed/28203290 http://dx.doi.org/10.1186/s13321-017-0192-4 |
_version_ | 1782504460223774720 |
---|---|
author | De, Sandip; Musil, Felix; Ingram, Teresa; Baldauf, Carsten; Ceriotti, Michele |
author_facet | De, Sandip; Musil, Felix; Ingram, Teresa; Baldauf, Carsten; Ceriotti, Michele |
author_sort | De, Sandip |
collection | PubMed |
description | High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance leads to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure–property relations, eliminating duplicates and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help addressing these issues. We will exploit a recently-developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques—showing how these can help reveal structure–property relations, identify outliers and inconsistent structures, and rationalise how perturbations (e.g. binding of ions to the molecule) affect the stability of different conformers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0192-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5289135 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-5289135 2017-02-15 Mapping and classifying molecules from a high-throughput structural database De, Sandip Musil, Felix Ingram, Teresa Baldauf, Carsten Ceriotti, Michele J Cheminform Research Article High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance leads to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure–property relations, eliminating duplicates and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help addressing these issues. We will exploit a recently-developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques—showing how these can help reveal structure–property relations, identify outliers and inconsistent structures, and rationalise how perturbations (e.g. binding of ions to the molecule) affect the stability of different conformers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0192-4) contains supplementary material, which is available to authorized users. 
Springer International Publishing 2017-02-02 /pmc/articles/PMC5289135/ /pubmed/28203290 http://dx.doi.org/10.1186/s13321-017-0192-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article De, Sandip Musil, Felix Ingram, Teresa Baldauf, Carsten Ceriotti, Michele Mapping and classifying molecules from a high-throughput structural database |
title | Mapping and classifying molecules from a high-throughput structural database |
title_full | Mapping and classifying molecules from a high-throughput structural database |
title_fullStr | Mapping and classifying molecules from a high-throughput structural database |
title_full_unstemmed | Mapping and classifying molecules from a high-throughput structural database |
title_short | Mapping and classifying molecules from a high-throughput structural database |
title_sort | mapping and classifying molecules from a high-throughput structural database |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5289135/ https://www.ncbi.nlm.nih.gov/pubmed/28203290 http://dx.doi.org/10.1186/s13321-017-0192-4 |
work_keys_str_mv | AT desandip mappingandclassifyingmoleculesfromahighthroughputstructuraldatabase AT musilfelix mappingandclassifyingmoleculesfromahighthroughputstructuraldatabase AT ingramteresa mappingandclassifyingmoleculesfromahighthroughputstructuraldatabase AT baldaufcarsten mappingandclassifyingmoleculesfromahighthroughputstructuraldatabase AT ceriottimichele mappingandclassifyingmoleculesfromahighthroughputstructuraldatabase |