Standardized Description of the Feature Extraction Process to Transform Raw Data Into Meaningful Information for Enhancing Data Reuse: Consensus Study

Bibliographic Details
Main authors:	Lamer, Antoine, Fruchart, Mathilde, Paris, Nicolas, Popoff, Benjamin, Payen, Anaïs, Balcaen, Thibaut, Gacquer, William, Bouzillé, Guillaume, Cuggia, Marc, Doutreligne, Matthieu, Chazard, Emmanuel
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9623460/ https://www.ncbi.nlm.nih.gov/pubmed/36251369 http://dx.doi.org/10.2196/38936

collection	PubMed
description	BACKGROUND: Despite the many opportunities data reuse offers, its implementation presents many difficulties, and raw data cannot be reused directly. Information is not always directly available in the source database and must be computed afterwards from the raw data by defining an algorithm.
OBJECTIVE: The main purpose of this article is to present a standardized description of the steps and transformations required during the feature extraction process when conducting retrospective observational studies. A secondary objective is to identify how the features could be stored in the schema of a data warehouse.
METHODS: This study involved 3 main steps: (1) the collection of relevant study cases related to feature extraction and based on the automatic and secondary use of data; (2) the standardized description of the raw data, steps, and transformations common to the study cases; and (3) the identification of an appropriate table to store the features in the Observation Medical Outcomes Partnership (OMOP) common data model (CDM).
RESULTS: We interviewed 10 researchers from 3 French university hospitals and a national institution, who were involved in 8 retrospective observational studies. Based on these studies, 2 states (track and feature) and 2 transformations (track definition and track aggregation) emerged. A “track” is a time-dependent signal or period of interest, defined by a statistical unit, a value, and 2 milestones (a start event and an end event). A “feature” is time-independent, high-level information whose dimensionality matches the statistical unit of the study, defined by a label and a value. The time dimension becomes implicit in the value or the name of the variable. We propose 2 tables, “TRACK” and “FEATURE”, to store the variables obtained by feature extraction and to extend the OMOP CDM.
CONCLUSIONS: We propose a standardized description of the feature extraction process. The process combines 2 steps: track definition and track aggregation. Dividing feature extraction into these 2 steps concentrates the difficulty in track definition. Standardizing tracks requires substantial expertise with the data but allows an unlimited number of complex transformations. In contrast, track aggregation is a very simple operation with a finite number of possibilities. A complete description of these steps could enhance the reproducibility of retrospective studies.
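To make the 2 states and 2 transformations concrete, here is a minimal Python sketch. It is not the authors' implementation: the class and function names (`Track`, `Feature`, `define_tracks`, `aggregate_tracks`), the field names, the rule that turns runs of values below a threshold into tracks, the "hypotension" label, and the data are all hypothetical. It only illustrates the division of labour described above: study-specific logic lives in track definition, while aggregation applies a small, fixed set of summaries (count, total duration, minimum) per statistical unit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical raw row: (statistical unit id, timestamp, measured value).
RawRow = Tuple[str, datetime, float]


@dataclass(frozen=True)
class Track:
    """Time-dependent state: a statistical unit, a value and two milestones."""
    unit_id: str
    value: float
    start: datetime  # start event
    end: datetime    # end event


@dataclass(frozen=True)
class Feature:
    """Time-independent state: one label and one value per statistical unit."""
    unit_id: str
    label: str
    value: float


def define_tracks(raw: List[RawRow], threshold: float) -> List[Track]:
    """Track definition (illustrative rule): group raw measurements by unit and
    turn each run of consecutive values below `threshold` into one Track whose
    value is the lowest measurement of the run. A run ends at the first
    measurement back at or above the threshold, or at the last low measurement
    if the data stop first."""
    by_unit: Dict[str, List[Tuple[datetime, float]]] = {}
    for unit_id, ts, value in raw:
        by_unit.setdefault(unit_id, []).append((ts, value))

    tracks: List[Track] = []
    for unit_id, points in by_unit.items():
        points.sort()  # chronological order
        start: Optional[datetime] = None
        last: Optional[datetime] = None
        low = 0.0
        for ts, value in points:
            if value < threshold:
                if start is None:
                    start, low = ts, value
                else:
                    low = min(low, value)
                last = ts
            elif start is not None:
                tracks.append(Track(unit_id, low, start, ts))
                start = None
        if start is not None and last is not None:
            tracks.append(Track(unit_id, low, start, last))
    return tracks


def aggregate_tracks(tracks: List[Track], name: str) -> List[Feature]:
    """Track aggregation: a few simple summaries per unit. The time dimension
    is absorbed into the feature label. Units without any track get no feature."""
    per_unit: Dict[str, List[Track]] = {}
    for track in tracks:
        per_unit.setdefault(track.unit_id, []).append(track)

    features: List[Feature] = []
    for unit_id, rows in sorted(per_unit.items()):
        minutes = sum((t.end - t.start).total_seconds() for t in rows) / 60
        features.append(Feature(unit_id, f"{name}_count", float(len(rows))))
        features.append(Feature(unit_id, f"{name}_total_minutes", minutes))
        features.append(Feature(unit_id, f"{name}_min_value", min(t.value for t in rows)))
    return features


# Usage with made-up data: two stays, threshold 65.
t0 = datetime(2022, 1, 1, 8, 0)
raw = [
    ("stay1", t0, 80.0),
    ("stay1", t0 + timedelta(minutes=5), 60.0),
    ("stay1", t0 + timedelta(minutes=10), 55.0),
    ("stay1", t0 + timedelta(minutes=15), 70.0),
    ("stay2", t0, 90.0),
    ("stay2", t0 + timedelta(minutes=5), 62.0),
    ("stay2", t0 + timedelta(minutes=10), 64.0),
]
tracks = define_tracks(raw, threshold=65.0)
features = aggregate_tracks(tracks, "hypotension")
for feature in features:
    print(feature)
```

In this sketch, units with no qualifying period produce no features; a real study would probably need explicit zero counts. Persisting the results in the proposed TRACK and FEATURE tables is not shown, since their exact columns are not given here.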
id	pubmed-9623460
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9623460 2022-11-02 JMIR Med Inform Original Paper JMIR Publications 2022-10-17 /pmc/articles/PMC9623460/ /pubmed/36251369 http://dx.doi.org/10.2196/38936 Text en ©Antoine Lamer, Mathilde Fruchart, Nicolas Paris, Benjamin Popoff, Anaïs Payen, Thibaut Balcaen, William Gacquer, Guillaume Bouzillé, Marc Cuggia, Matthieu Doutreligne, Emmanuel Chazard. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 17.10.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.