
Scenario driven data modelling: a method for integrating diverse sources of data and data streams

BACKGROUND: Biology is rapidly becoming a data-intensive, data-driven science. It is essential that data is represented and connected in ways that best represent its full conceptual content and allow both automated integration and data-driven decision-making. Recent advancements in distributed mult...


Bibliographic Details
Main Authors: Griffith, Shelton D; Quest, Daniel J; Brettin, Thomas S; Cottingham, Robert W
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Proceedings
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236839/
https://www.ncbi.nlm.nih.gov/pubmed/22165854
http://dx.doi.org/10.1186/1471-2105-12-S10-S17
collection PubMed
description BACKGROUND: Biology is rapidly becoming a data-intensive, data-driven science. It is essential that data is represented and connected in ways that best represent its full conceptual content and allow both automated integration and data-driven decision-making. Recent advancements in distributed multi-relational directed graphs, implemented in the form of the Semantic Web, make it possible to deal with complicated heterogeneous data in new and interesting ways. RESULTS: This paper presents a new approach, scenario driven data modelling (SDDM), that integrates multi-relational directed graphs with data streams. SDDM can be applied to virtually any data integration challenge with widely divergent types of data and data streams. In this work, we explored integrating genetics data with reports from traditional media. SDDM was applied to the New Delhi metallo-beta-lactamase gene (NDM-1), an emerging global health threat. The SDDM process constructed a scenario, created an RDF multi-relational directed graph that linked diverse types of data to the Semantic Web, implemented RDF conversion tools (RDFizers) to bring content into the Semantic Web, identified data streams and analytical routines to analyse those streams, and identified user requirements and graph traversals to meet end-user requirements. CONCLUSIONS: We provided an example where SDDM was applied to a complex data integration challenge. The process created a model of the emerging NDM-1 health threat, identified and filled gaps in that model, and constructed reliable software that monitored data streams based on the scenario-derived multi-relational directed graph. The SDDM process significantly reduced the software requirements phase by letting the scenario and resulting multi-relational directed graph define what is possible and then set the scope of the user requirements. Approaches like SDDM will be critical to the future of data-intensive, data-driven science because they automate the process of converting massive data streams into usable knowledge.
format Online
Article
Text
id pubmed-3236839
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3236839 2011-12-14 Scenario driven data modelling: a method for integrating diverse sources of data and data streams Griffith, Shelton D; Quest, Daniel J; Brettin, Thomas S; Cottingham, Robert W BMC Bioinformatics Proceedings BioMed Central 2011-10-18 /pmc/articles/PMC3236839/ /pubmed/22165854 http://dx.doi.org/10.1186/1471-2105-12-S10-S17 Text en Copyright ©2011 Griffith et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Scenario driven data modelling: a method for integrating diverse sources of data and data streams
topic Proceedings