Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling
This paper demonstrates a method to transform and link textual information scraped from companies' websites to the scientific body of knowledge. The method illustrates the benefit of Natural Language Processing (NLP) in creating links between established economic classification systems with nov...
Main authors: | Hajikhani, Arash; Pukelis, Lukas; Suominen, Arho; Ashouri, Sajad; Schubert, Torben; Notten, Ad; Cunningham, Scott W. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | Method Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914545/ https://www.ncbi.nlm.nih.gov/pubmed/35284247 http://dx.doi.org/10.1016/j.mex.2022.101650 |
_version_ | 1784667735625564160 |
---|---|
author | Hajikhani, Arash Pukelis, Lukas Suominen, Arho Ashouri, Sajad Schubert, Torben Notten, Ad Cunningham, Scott W. |
author_facet | Hajikhani, Arash Pukelis, Lukas Suominen, Arho Ashouri, Sajad Schubert, Torben Notten, Ad Cunningham, Scott W. |
author_sort | Hajikhani, Arash |
collection | PubMed |
description | This paper demonstrates a method to transform and link textual information scraped from companies' websites to the scientific body of knowledge. The method illustrates the benefit of Natural Language Processing (NLP) in creating links between established economic classification systems with novel and agile constructs that new data sources enable. Therefore, we experimented on the European classification of economic activities (known as NACE) on sectoral and company levels. We established a connection with Microsoft Academic Graph hierarchical topic modeling based on companies' website content. Central to the operationalization of our method are a web scraping process, NLP and a data transformation/linkage procedure. The method contains three main steps: data source identification, raw data retrieval, and data preparation and transformation. These steps are applied to two distinct data sources. |
format | Online Article Text |
id | pubmed-8914545 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8914545 2022-03-12 Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling Hajikhani, Arash Pukelis, Lukas Suominen, Arho Ashouri, Sajad Schubert, Torben Notten, Ad Cunningham, Scott W. MethodsX Method Article This paper demonstrates a method to transform and link textual information scraped from companies' websites to the scientific body of knowledge. The method illustrates the benefit of Natural Language Processing (NLP) in creating links between established economic classification systems with novel and agile constructs that new data sources enable. Therefore, we experimented on the European classification of economic activities (known as NACE) on sectoral and company levels. We established a connection with Microsoft Academic Graph hierarchical topic modeling based on companies' website content. Central to the operationalization of our method are a web scraping process, NLP and a data transformation/linkage procedure. The method contains three main steps: data source identification, raw data retrieval, and data preparation and transformation. These steps are applied to two distinct data sources. Elsevier 2022-02-27 /pmc/articles/PMC8914545/ /pubmed/35284247 http://dx.doi.org/10.1016/j.mex.2022.101650 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Method Article Hajikhani, Arash Pukelis, Lukas Suominen, Arho Ashouri, Sajad Schubert, Torben Notten, Ad Cunningham, Scott W. Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling |
title | Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling |
title_full | Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling |
title_fullStr | Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling |
title_full_unstemmed | Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling |
title_short | Connecting firm's web scraped textual content to body of science: Utilizing microsoft academic graph hierarchical topic modeling |
title_sort | connecting firm's web scraped textual content to body of science: utilizing microsoft academic graph hierarchical topic modeling |
topic | Method Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914545/ https://www.ncbi.nlm.nih.gov/pubmed/35284247 http://dx.doi.org/10.1016/j.mex.2022.101650 |
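
The description field in this record names a linkage step that connects scraped website text to topic labels. The following is a minimal sketch of what such a linkage could look like, not the authors' implementation: it scores a company page against topic descriptions using bag-of-words cosine similarity. The function names (`tokenize`, `cosine`, `link_to_topics`), the two topic labels, their descriptions, and the sample page text are all hypothetical and chosen only for illustration; the record does not state which similarity measure or tooling the paper uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def link_to_topics(page_text, topic_descriptions):
    """Rank topic labels by similarity to the page text, best match first."""
    page_vec = Counter(tokenize(page_text))
    scores = {
        topic: cosine(page_vec, Counter(tokenize(desc)))
        for topic, desc in topic_descriptions.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy inputs, not taken from the paper or from any real taxonomy.
topics = {
    "Machine learning": "machine learning neural network model training data",
    "Materials science": "alloy polymer coating material corrosion",
}
page = "We build machine learning models and train neural networks on customer data."

ranking = link_to_topics(page, topics)
print(ranking[0][0])
```

In practice a pipeline of this kind would likely add stemming or embeddings so that "models" and "model" match; the sketch keeps exact token matching to stay dependency-free.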