
Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI

Bibliographic details
Main author: O’Boyle, Noel M
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3495655/
https://www.ncbi.nlm.nih.gov/pubmed/22989151
http://dx.doi.org/10.1186/1758-2946-4-22
author O’Boyle, Noel M
collection PubMed
description BACKGROUND: There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures. SMILES strings, by contrast, are widely used for storage and interchange of chemical structures, but no standard exists to generate a canonical SMILES string. RESULTS: I describe how to use the InChI canonicalisation to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalisations (Inchified SMILES) or not (Universal SMILES). This is the first description of a method to generate canonical SMILES that takes stereochemistry into account. When tested on the 1.1 million compounds in the ChEMBL database and a 1 million compound subset of the PubChem Substance database, no canonicalisation failures were found with Inchified SMILES. Using Universal SMILES, 99.79% of the ChEMBL database and 99.77% of the PubChem subset were canonicalised successfully. CONCLUSIONS: The InChI canonicalisation algorithm can successfully be used as the basis for a common standard for canonical SMILES. While challenges remain, such as the development of a standard aromatic model for SMILES, the ability to create the same SMILES using different toolkits will mean that for the first time it will be possible to easily compare the chemical models used by different toolkits.
format Online
Article
Text
id pubmed-3495655
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3495655 2012-11-13 Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI O’Boyle, Noel M J Cheminform Research Article BioMed Central 2012-09-18 /pmc/articles/PMC3495655/ /pubmed/22989151 http://dx.doi.org/10.1186/1758-2946-4-22 Text en Copyright ©2012 O'Boyle; licensee Chemistry Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI
topic Research Article