Automatic identification of relevant chemical compounds from patents

In commercial research and development projects, public disclosure of new chemical compounds often takes place in patents. Only a small proportion of these compounds are published in journals, usually a few years after the patent. Patent authorities make available the patents but do not provide syst...

Bibliographic Details
Main Authors: Akhondi, Saber A, Rey, Hinnerk, Schwörer, Markus, Maier, Michael, Toomey, John, Nau, Heike, Ilchmann, Gabriele, Sheehan, Mark, Irmer, Matthias, Bobach, Claudia, Doornenbal, Marius, Gregory, Michelle, Kors, Jan A
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6351730/
https://www.ncbi.nlm.nih.gov/pubmed/30698776
http://dx.doi.org/10.1093/database/baz001
_version_ 1783390643868925952
author Akhondi, Saber A
Rey, Hinnerk
Schwörer, Markus
Maier, Michael
Toomey, John
Nau, Heike
Ilchmann, Gabriele
Sheehan, Mark
Irmer, Matthias
Bobach, Claudia
Doornenbal, Marius
Gregory, Michelle
Kors, Jan A
author_facet Akhondi, Saber A
Rey, Hinnerk
Schwörer, Markus
Maier, Michael
Toomey, John
Nau, Heike
Ilchmann, Gabriele
Sheehan, Mark
Irmer, Matthias
Bobach, Claudia
Doornenbal, Marius
Gregory, Michelle
Kors, Jan A
author_sort Akhondi, Saber A
collection PubMed
description In commercial research and development projects, public disclosure of new chemical compounds often takes place in patents. Only a small proportion of these compounds are published in journals, usually a few years after the patent. Patent authorities make available the patents but do not provide systematic continuous chemical annotations. Content databases such as Elsevier’s Reaxys provide such services mostly based on manual excerptions, which are time-consuming and costly. Automatic text-mining approaches help overcome some of the limitations of the manual process. Different text-mining approaches exist to extract chemical entities from patents. The majority of them have been developed using sub-sections of patent documents and focus on mentions of compounds. Less attention has been given to relevancy of a compound in a patent. Relevancy of a compound to a patent is based on the patent’s context. A relevant compound plays a major role within a patent. Identification of relevant compounds reduces the size of the extracted data and improves the usefulness of patent resources (e.g. supports identifying the main compounds). Annotators of databases like Reaxys only annotate relevant compounds. In this study, we design an automated system that extracts chemical entities from patents and classifies their relevance. The gold-standard set contained 18 789 chemical entity annotations. Of these, 10% were relevant compounds, 88% were irrelevant and 2% were equivocal. Our compound recognition system was based on proprietary tools. The performance (F-score) of the system on compound recognition was 84% on the development set and 86% on the test set. The relevancy classification system had an F-score of 86% on the development set and 82% on the test set. Our system can extract chemical compounds from patents and classify their relevance with high performance. This enables the extension of the Reaxys database by means of automation.
format Online
Article
Text
id pubmed-6351730
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-63517302019-02-08 Automatic identification of relevant chemical compounds from patents Akhondi, Saber A Rey, Hinnerk Schwörer, Markus Maier, Michael Toomey, John Nau, Heike Ilchmann, Gabriele Sheehan, Mark Irmer, Matthias Bobach, Claudia Doornenbal, Marius Gregory, Michelle Kors, Jan A Database (Oxford) Original Article In commercial research and development projects, public disclosure of new chemical compounds often takes place in patents. Only a small proportion of these compounds are published in journals, usually a few years after the patent. Patent authorities make available the patents but do not provide systematic continuous chemical annotations. Content databases such as Elsevier’s Reaxys provide such services mostly based on manual excerptions, which are time-consuming and costly. Automatic text-mining approaches help overcome some of the limitations of the manual process. Different text-mining approaches exist to extract chemical entities from patents. The majority of them have been developed using sub-sections of patent documents and focus on mentions of compounds. Less attention has been given to relevancy of a compound in a patent. Relevancy of a compound to a patent is based on the patent’s context. A relevant compound plays a major role within a patent. Identification of relevant compounds reduces the size of the extracted data and improves the usefulness of patent resources (e.g. supports identifying the main compounds). Annotators of databases like Reaxys only annotate relevant compounds. In this study, we design an automated system that extracts chemical entities from patents and classifies their relevance. The gold-standard set contained 18 789 chemical entity annotations. Of these, 10% were relevant compounds, 88% were irrelevant and 2% were equivocal. Our compound recognition system was based on proprietary tools. 
The performance (F-score) of the system on compound recognition was 84% on the development set and 86% on the test set. The relevancy classification system had an F-score of 86% on the development set and 82% on the test set. Our system can extract chemical compounds from patents and classify their relevance with high performance. This enables the extension of the Reaxys database by means of automation. Oxford University Press 2019-01-30 /pmc/articles/PMC6351730/ /pubmed/30698776 http://dx.doi.org/10.1093/database/baz001 Text en © The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permission@oup.com. https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
spellingShingle Original Article
Akhondi, Saber A
Rey, Hinnerk
Schwörer, Markus
Maier, Michael
Toomey, John
Nau, Heike
Ilchmann, Gabriele
Sheehan, Mark
Irmer, Matthias
Bobach, Claudia
Doornenbal, Marius
Gregory, Michelle
Kors, Jan A
Automatic identification of relevant chemical compounds from patents
title Automatic identification of relevant chemical compounds from patents
title_full Automatic identification of relevant chemical compounds from patents
title_fullStr Automatic identification of relevant chemical compounds from patents
title_full_unstemmed Automatic identification of relevant chemical compounds from patents
title_short Automatic identification of relevant chemical compounds from patents
title_sort automatic identification of relevant chemical compounds from patents
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6351730/
https://www.ncbi.nlm.nih.gov/pubmed/30698776
http://dx.doi.org/10.1093/database/baz001
work_keys_str_mv AT akhondisabera automaticidentificationofrelevantchemicalcompoundsfrompatents
AT reyhinnerk automaticidentificationofrelevantchemicalcompoundsfrompatents
AT schworermarkus automaticidentificationofrelevantchemicalcompoundsfrompatents
AT maiermichael automaticidentificationofrelevantchemicalcompoundsfrompatents
AT toomeyjohn automaticidentificationofrelevantchemicalcompoundsfrompatents
AT nauheike automaticidentificationofrelevantchemicalcompoundsfrompatents
AT ilchmanngabriele automaticidentificationofrelevantchemicalcompoundsfrompatents
AT sheehanmark automaticidentificationofrelevantchemicalcompoundsfrompatents
AT irmermatthias automaticidentificationofrelevantchemicalcompoundsfrompatents
AT bobachclaudia automaticidentificationofrelevantchemicalcompoundsfrompatents
AT doornenbalmarius automaticidentificationofrelevantchemicalcompoundsfrompatents
AT gregorymichelle automaticidentificationofrelevantchemicalcompoundsfrompatents
AT korsjana automaticidentificationofrelevantchemicalcompoundsfrompatents