DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining
The vastness of materials space, particularly that which is concerned with metal–organic frameworks (MOFs), creates the critical problem of performing efficient identification of promising materials for specific applications. Although high-throughput computational approaches, inclu...
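Main authors: | Glasby, Lawson T.; Gubsch, Kristian; Bence, Rosalee; Oktavian, Rama; Isoko, Kesler; Moosavi, Seyed Mohamad; Cordiner, Joan L.; Cole, Jason C.; Moghadam, Peyman Z. |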
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269341/ https://www.ncbi.nlm.nih.gov/pubmed/37332681 http://dx.doi.org/10.1021/acs.chemmater.3c00788 |
_version_ | 1785059157828698112 |
---|---|
author | Glasby, Lawson T. Gubsch, Kristian Bence, Rosalee Oktavian, Rama Isoko, Kesler Moosavi, Seyed Mohamad Cordiner, Joan L. Cole, Jason C. Moghadam, Peyman Z. |
author_facet | Glasby, Lawson T. Gubsch, Kristian Bence, Rosalee Oktavian, Rama Isoko, Kesler Moosavi, Seyed Mohamad Cordiner, Joan L. Cole, Jason C. Moghadam, Peyman Z. |
author_sort | Glasby, Lawson T. |
collection | PubMed |
description | The vastness of materials space, particularly that which is concerned with metal–organic frameworks (MOFs), creates the critical problem of performing efficient identification of promising materials for specific applications. Although high-throughput computational approaches, including the use of machine learning, have been useful in rapid screening and rational design of MOFs, they tend to neglect descriptors related to their synthesis. One way to improve the efficiency of MOF discovery is to data-mine published MOF papers to extract the materials informatics knowledge contained within journal articles. Here, by adapting the chemistry-aware natural language processing tool, ChemDataExtractor (CDE), we generated an open-source database of MOFs focused on their synthetic properties: the DigiMOF database. Using the CDE web scraping package alongside the Cambridge Structural Database (CSD) MOF subset, we automatically downloaded 43,281 unique MOF journal articles, extracted 15,501 unique MOF materials, and text-mined over 52,680 associated properties including the synthesis method, solvent, organic linker, metal precursor, and topology. Additionally, we developed an alternative data extraction technique to obtain and transform the chemical names assigned to each CSD entry in order to determine linker types for each structure in the CSD MOF subset. This data enabled us to match MOFs to a list of known linkers provided by Tokyo Chemical Industry UK Ltd. (TCI) and analyze the cost of these important chemicals. This centralized, structured database reveals the MOF synthetic data embedded within thousands of MOF publications and contains further topology, metal type, accessible surface area, largest cavity diameter, pore limiting diameter, open metal sites, and density calculations for all 3D MOFs in the CSD MOF subset. The DigiMOF database and associated software are publicly available for other researchers to rapidly search for MOFs with specific properties, conduct further analysis of alternative MOF production pathways, and create additional parsers to search for additional desirable properties. |
format | Online Article Text |
id | pubmed-10269341 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10269341 2023-06-16 DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining Glasby, Lawson T. Gubsch, Kristian Bence, Rosalee Oktavian, Rama Isoko, Kesler Moosavi, Seyed Mohamad Cordiner, Joan L. Cole, Jason C. Moghadam, Peyman Z. Chem Mater American Chemical Society 2023-05-18 /pmc/articles/PMC10269341/ /pubmed/37332681 http://dx.doi.org/10.1021/acs.chemmater.3c00788 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Glasby, Lawson T. Gubsch, Kristian Bence, Rosalee Oktavian, Rama Isoko, Kesler Moosavi, Seyed Mohamad Cordiner, Joan L. Cole, Jason C. Moghadam, Peyman Z. DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining |
title | DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining |
title_full | DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining |
title_fullStr | DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining |
title_full_unstemmed | DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining |
title_short | DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining |
title_sort | digimof: a database of metal–organic framework synthesis information generated via text mining |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10269341/ https://www.ncbi.nlm.nih.gov/pubmed/37332681 http://dx.doi.org/10.1021/acs.chemmater.3c00788 |
work_keys_str_mv | AT glasbylawsont digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT gubschkristian digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT bencerosalee digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT oktavianrama digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT isokokesler digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT moosaviseyedmohamad digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT cordinerjoanl digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT colejasonc digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining AT moghadampeymanz digimofadatabaseofmetalorganicframeworksynthesisinformationgeneratedviatextmining |