
Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications

Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases wit...

Full description

Bibliographic Details
Main authors:	Naveja, José J., Vogt, Martin
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8433811/ https://www.ncbi.nlm.nih.gov/pubmed/34500724 http://dx.doi.org/10.3390/molecules26175291

collection	PubMed
description	Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis–Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
format	Online Article Text
id	pubmed-8433811
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8433811 2021-09-12 Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications. Naveja, José J.; Vogt, Martin. Molecules. Review. MDPI 2021-08-31 /pmc/articles/PMC8433811/ /pubmed/34500724 http://dx.doi.org/10.3390/molecules26175291 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
topic	Review
