
The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot

We present a summary of the National Compound Collection (NCC) pilot, which harvested chemical structure data from 746 publicly available PhD theses to create an enhanced database of diverse and interesting (largely organic) molecular entities. The database comprised ∼75 000 structure entries, of wh...


Bibliographic Details
Main authors:	Andrews, David M., Broad, Laura M., Edwards, Paul J., Fox, David N. A., Gallagher, Timothy, Garland, Stephen L., Kidd, Richard, Sweeney, Joseph B.
Format:	Online Article Text
Language:	English
Published:	Royal Society of Chemistry 2016
Subjects:	Chemistry
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6013800/ https://www.ncbi.nlm.nih.gov/pubmed/30155031 http://dx.doi.org/10.1039/c6sc00264a

_version_	1783334096368304128
author	Andrews, David M. Broad, Laura M. Edwards, Paul J. Fox, David N. A. Gallagher, Timothy Garland, Stephen L. Kidd, Richard Sweeney, Joseph B.
author_facet	Andrews, David M. Broad, Laura M. Edwards, Paul J. Fox, David N. A. Gallagher, Timothy Garland, Stephen L. Kidd, Richard Sweeney, Joseph B.
author_sort	Andrews, David M.
collection	PubMed
description	We present a summary of the National Compound Collection (NCC) pilot, which harvested chemical structure data from 746 publicly available PhD theses to create an enhanced database of diverse and interesting (largely organic) molecular entities. The database comprised ∼75 000 structure entries, of which 70% were new to ChemSpider at the time of upload. The dataset was evaluated for structural uniqueness by twelve external drug discovery groups from the pharmaceutical, biotech, academic and not-for-profit sectors. These partners generated data reported here comparing the NCC pilot with their in-house compound collections. The proportion of NCC structures considered to be useful for drug discovery ranged from 5–80% depending on the strictness of the filters used; most interestingly, from a drug discovery standpoint, ∼13k NCC compounds (18% of the NCC) passed the filters and were of good diversity. These compounds are quite different from those that are already present in the screening collections but not so different that they are no longer considered to be drug-like. In general, the drug discovery teams would consider these compounds to be high-value molecules for inclusion in their screening collections. This pilot addressed the potential value of unpublished data and explored the practicalities of large-scale data extraction, to inform both retrospective and prospective extraction of chemical data from theses.
format	Online Article Text
id	pubmed-6013800
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Royal Society of Chemistry
record_format	MEDLINE/PubMed
spelling	pubmed-60138002018-08-28 The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot Andrews, David M. Broad, Laura M. Edwards, Paul J. Fox, David N. A. Gallagher, Timothy Garland, Stephen L. Kidd, Richard Sweeney, Joseph B. Chem Sci Chemistry We present a summary of the National Compound Collection (NCC) pilot; which harvested chemical structure data from 746 publicly-available PhD theses to create an enhanced database of diverse and interesting (largely organic) molecular entities. The database comprised ∼75 000 structure entries, of which 70% were new to ChemSpider at the time of upload. The dataset was evaluated for structural uniqueness by twelve external drug discovery groups from the pharmaceutical, biotech, academic and not-for-profit sectors. These partners generated data reported here comparing the NCC pilot with their in-house compound collections. The proportion of NCC structures considered to be useful for drug discovery ranged from 5–80% depending on the strictness of the filters used; most interestingly from a drug discovery standpoint ∼13k NCC compounds (18% of the NCC) passed the filters and were of good diversity. These compounds are quite different from those that are already present in the screening collections but not so different that they are no longer considered to be drug-like. In general, the drug discovery teams would consider these compounds to be high value molecules for inclusion in their screening collections. This pilot addressed the potential value of unpublished data and explored the practicalities of large-scale data extraction, to inform both retrospective and prospective extraction of chemical data from theses. Royal Society of Chemistry 2016-06-01 2016-02-23 /pmc/articles/PMC6013800/ /pubmed/30155031 http://dx.doi.org/10.1039/c6sc00264a Text en This journal is © The Royal Society of Chemistry 2016 http://creativecommons.org/licenses/by/3.0/ This article is freely available. 
This article is licensed under a Creative Commons Attribution 3.0 Unported Licence (CC BY 3.0)
spellingShingle	Chemistry Andrews, David M. Broad, Laura M. Edwards, Paul J. Fox, David N. A. Gallagher, Timothy Garland, Stephen L. Kidd, Richard Sweeney, Joseph B. The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot
title	The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot
title_full	The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot
title_fullStr	The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot
title_full_unstemmed	The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot
title_short	The creation and characterisation of a National Compound Collection: the Royal Society of Chemistry pilot
title_sort	creation and characterisation of a national compound collection: the royal society of chemistry pilot
topic	Chemistry
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6013800/ https://www.ncbi.nlm.nih.gov/pubmed/30155031 http://dx.doi.org/10.1039/c6sc00264a
