What issues are data scientists talking about? Identification of current data science issues using semantic content analysis of Q&A communities
BACKGROUND: Because of the growing involvement of communities from various disciplines, data science is constantly evolving and gaining popularity. The growing interest in data science-based services and applications presents numerous challenges for their development. Therefore, data scientists freq...
Main author: | Gurcan, Fatih |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
PeerJ Inc.
2023
|
Subjects: | Data Mining and Machine Learning |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280584/ https://www.ncbi.nlm.nih.gov/pubmed/37346688 http://dx.doi.org/10.7717/peerj-cs.1361 |
_version_ | 1785060828306735104 |
---|---|
author | Gurcan, Fatih |
author_facet | Gurcan, Fatih |
author_sort | Gurcan, Fatih |
collection | PubMed |
description | BACKGROUND: Because of the growing involvement of communities from various disciplines, data science is constantly evolving and gaining popularity. The growing interest in data science-based services and applications presents numerous challenges for their development. Therefore, data scientists frequently turn to various forums, particularly domain-specific Q&A websites, to solve difficulties. These websites evolve into data science knowledge repositories over time. Analysis of such repositories can provide valuable insights into the applications, topics, trends, and challenges of data science. METHODS: In this article, we investigated what data scientists are asking by analyzing all posts to date on DSSE, a data science-focused Q&A website. To discover main topics embedded in data science discussions, we used latent Dirichlet allocation (LDA), a probabilistic approach for topic modeling. RESULTS: As a result of this analysis, 18 main topics were identified that demonstrate the current interests and issues in data science. We then examined the topics’ popularity and difficulty. In addition, we identified the most commonly used tasks, techniques, and tools in data science. As a result, “Model Training”, “Machine Learning”, and “Neural Networks” emerged as the most prominent topics. Also, “Data Manipulation”, “Coding Errors”, and “Tools” were identified as the most viewed (most popular) topics. On the other hand, the most difficult topics were identified as “Time Series”, “Computer Vision”, and “Recommendation Systems”. Our findings have significant implications for many data science stakeholders who are striving to advance data-driven architectures, concepts, tools, and techniques. |
format | Online Article Text |
id | pubmed-10280584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-102805842023-06-21 What issues are data scientists talking about? Identification of current data science issues using semantic content analysis of Q&A communities Gurcan, Fatih PeerJ Comput Sci Data Mining and Machine Learning BACKGROUND: Because of the growing involvement of communities from various disciplines, data science is constantly evolving and gaining popularity. The growing interest in data science-based services and applications presents numerous challenges for their development. Therefore, data scientists frequently turn to various forums, particularly domain-specific Q&A websites, to solve difficulties. These websites evolve into data science knowledge repositories over time. Analysis of such repositories can provide valuable insights into the applications, topics, trends, and challenges of data science. METHODS: In this article, we investigated what data scientists are asking by analyzing all posts to date on DSSE, a data science-focused Q&A website. To discover main topics embedded in data science discussions, we used latent Dirichlet allocation (LDA), a probabilistic approach for topic modeling. RESULTS: As a result of this analysis, 18 main topics were identified that demonstrate the current interests and issues in data science. We then examined the topics’ popularity and difficulty. In addition, we identified the most commonly used tasks, techniques, and tools in data science. As a result, “Model Training”, “Machine Learning”, and “Neural Networks” emerged as the most prominent topics. Also, “Data Manipulation”, “Coding Errors”, and “Tools” were identified as the most viewed (most popular) topics. On the other hand, the most difficult topics were identified as “Time Series”, “Computer Vision”, and “Recommendation Systems”. Our findings have significant implications for many data science stakeholders who are striving to advance data-driven architectures, concepts, tools, and techniques. PeerJ Inc. 
2023-05-18 /pmc/articles/PMC10280584/ /pubmed/37346688 http://dx.doi.org/10.7717/peerj-cs.1361 Text en © 2023 Gurcan https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Data Mining and Machine Learning Gurcan, Fatih What issues are data scientists talking about? Identification of current data science issues using semantic content analysis of Q&A communities |
title | What issues are data scientists talking about? Identification of current data science issues using semantic content analysis of Q&A communities |
title_sort | what issues are data scientists talking about? identification of current data science issues using semantic content analysis of q&a communities |
topic | Data Mining and Machine Learning |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280584/ https://www.ncbi.nlm.nih.gov/pubmed/37346688 http://dx.doi.org/10.7717/peerj-cs.1361 |
work_keys_str_mv | AT gurcanfatih whatissuesaredatascientiststalkingaboutidentificationofcurrentdatascienceissuesusingsemanticcontentanalysisofqacommunities |