DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining

The vastness of materials space, particularly that which is concerned with metal–organic frameworks (MOFs), creates the critical problem of performing efficient identification of promising materials for specific applications. Although high-throughput computational approaches, inclu...


Bibliographic Details
Main authors: Glasby, Lawson T., Gubsch, Kristian, Bence, Rosalee, Oktavian, Rama, Isoko, Kesler, Moosavi, Seyed Mohamad, Cordiner, Joan L., Cole, Jason C., Moghadam, Peyman Z.
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269341/
https://www.ncbi.nlm.nih.gov/pubmed/37332681
http://dx.doi.org/10.1021/acs.chemmater.3c00788
author Glasby, Lawson T.
Gubsch, Kristian
Bence, Rosalee
Oktavian, Rama
Isoko, Kesler
Moosavi, Seyed Mohamad
Cordiner, Joan L.
Cole, Jason C.
Moghadam, Peyman Z.
author_sort Glasby, Lawson T.
collection PubMed
description The vastness of materials space, particularly that which is concerned with metal–organic frameworks (MOFs), creates the critical problem of performing efficient identification of promising materials for specific applications. Although high-throughput computational approaches, including the use of machine learning, have been useful in rapid screening and rational design of MOFs, they tend to neglect descriptors related to their synthesis. One way to improve the efficiency of MOF discovery is to data-mine published MOF papers to extract the materials informatics knowledge contained within journal articles. Here, by adapting the chemistry-aware natural language processing tool, ChemDataExtractor (CDE), we generated an open-source database of MOFs focused on their synthetic properties: the DigiMOF database. Using the CDE web scraping package alongside the Cambridge Structural Database (CSD) MOF subset, we automatically downloaded 43,281 unique MOF journal articles, extracted 15,501 unique MOF materials, and text-mined over 52,680 associated properties including the synthesis method, solvent, organic linker, metal precursor, and topology. Additionally, we developed an alternative data extraction technique to obtain and transform the chemical names assigned to each CSD entry in order to determine linker types for each structure in the CSD MOF subset. This data enabled us to match MOFs to a list of known linkers provided by Tokyo Chemical Industry UK Ltd. (TCI) and analyze the cost of these important chemicals. This centralized, structured database reveals the MOF synthetic data embedded within thousands of MOF publications and contains further topology, metal type, accessible surface area, largest cavity diameter, pore limiting diameter, open metal sites, and density calculations for all 3D MOFs in the CSD MOF subset. The DigiMOF database and associated software are publicly available for other researchers to rapidly search for MOFs with specific properties, conduct further analysis of alternative MOF production pathways, and create additional parsers to search for additional desirable properties.
format Online
Article
Text
id pubmed-10269341
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10269341 2023-06-16 DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining Glasby, Lawson T. Gubsch, Kristian Bence, Rosalee Oktavian, Rama Isoko, Kesler Moosavi, Seyed Mohamad Cordiner, Joan L. Cole, Jason C. Moghadam, Peyman Z. Chem Mater American Chemical Society 2023-05-18 /pmc/articles/PMC10269341/ /pubmed/37332681 http://dx.doi.org/10.1021/acs.chemmater.3c00788 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
title DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269341/
https://www.ncbi.nlm.nih.gov/pubmed/37332681
http://dx.doi.org/10.1021/acs.chemmater.3c00788