
An infrastructure for precision medicine through analysis of big data

BACKGROUND: Nowadays, the increasing availability of omics data, due to both the advancements in the acquisition of molecular biology results and in systems biology simulation technologies, provides the bases for precision medicine. Success in precision medicine depends on the access to healthcare a...


Bibliographic Details
Main authors: Moscatelli, Marco, Manconi, Andrea, Pessina, Mauro, Fellegara, Giovanni, Rampoldi, Stefano, Milanesi, Luciano, Casasco, Andrea, Gnocchi, Matteo
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191972/
https://www.ncbi.nlm.nih.gov/pubmed/30367571
http://dx.doi.org/10.1186/s12859-018-2300-5
author Moscatelli, Marco
Manconi, Andrea
Pessina, Mauro
Fellegara, Giovanni
Rampoldi, Stefano
Milanesi, Luciano
Casasco, Andrea
Gnocchi, Matteo
collection PubMed
description BACKGROUND: The increasing availability of omics data, driven by advances both in the acquisition of molecular biology results and in systems biology simulation technologies, provides the basis for precision medicine. Success in precision medicine depends on access to healthcare and biomedical data. To this end, the digitization of all clinical exams and medical records is becoming standard in hospitals. Digitization is essential to collect, share, and aggregate large volumes of heterogeneous data, supporting the discovery of hidden patterns with the aim of defining predictive models for biomedical purposes. Sharing patients' data is a critical process: it raises ethical, social, legal, and technological issues that must be properly addressed.
RESULTS: In this work, we present an infrastructure devised to integrate large volumes of heterogeneous biological data. The infrastructure was applied to data collected between 2010 and 2016 in one of the major diagnostic analysis laboratories in Italy. Data were collected from three different platforms (laboratory exams, pathological anatomy exams, and biopsy exams). The infrastructure has been designed to allow the extraction and aggregation of both unstructured and semi-structured data. Data are treated to ensure security and privacy. Specialized algorithms have also been implemented to process the aggregated information in order to obtain a precise historical analysis of the clinical activities of one or more patients. Moreover, three Bayesian classifiers have been developed to analyze examinations reported as free text. Experimental results show that the classifiers exhibit good accuracy when used to analyze sentences related to the sample location, the presence of disease, and the status of the illness.
CONCLUSIONS: The infrastructure allows the integration of multiple heterogeneous sources of anonymized data from the different clinical platforms. Both unstructured and semi-structured data are processed to obtain a precise historical analysis of the clinical activities of one or more patients. Data aggregation makes it possible to perform a series of statistical assessments required to answer complex questions in a variety of fields, such as predictive and precision medicine. In particular, studying the clinical history of patients who have developed similar pathologies can help to predict or identify markers that allow an early diagnosis of possible illnesses.
format Online
Article
Text
id pubmed-6191972
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6191972 2018-10-23 An infrastructure for precision medicine through analysis of big data BMC Bioinformatics Research BioMed Central 2018-10-15 /pmc/articles/PMC6191972/ /pubmed/30367571 http://dx.doi.org/10.1186/s12859-018-2300-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An infrastructure for precision medicine through analysis of big data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191972/
https://www.ncbi.nlm.nih.gov/pubmed/30367571
http://dx.doi.org/10.1186/s12859-018-2300-5
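
The abstract in this record mentions Bayesian classifiers that analyze examination sentences written as free text, but it does not say which Bayesian model or features the authors used. The following is only a minimal sketch of one common choice, a multinomial naive Bayes classifier with add-one smoothing, written with the Python standard library. The class name, the labels "present" and "absent", and the training sentences are all invented for illustration and are not taken from the article.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it on whitespace."""
    return sentence.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration; not the classifier described in the article.
    """

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)       # sentences per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocabulary = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, sentence):
        """Return the unnormalized log posterior of each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(sentence):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Invented toy sentences, used only to exercise the sketch.
TRAIN = [
    ("no evidence of malignancy", "absent"),
    ("negative for neoplastic cells", "absent"),
    ("findings consistent with carcinoma", "present"),
    ("positive for malignant cells", "present"),
]

classifier = NaiveBayesTextClassifier().fit(
    [sentence for sentence, _ in TRAIN],
    [label for _, label in TRAIN],
)
print(classifier.predict("negative for malignancy"))
```

In this sketch one classifier handles one question (here, disease presence). Separate instances trained on different label sets could cover the sample location and illness status tasks that the abstract lists, though how the authors actually organized their three classifiers is not stated in this record.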