A data model and database for high-resolution pathology analytical image informatics

Bibliographic Details
Main authors: Wang, Fusheng, Kong, Jun, Cooper, Lee, Pan, Tony, Kurc, Tahsin, Chen, Wenjin, Sharma, Ashish, Niedermayr, Cristobal, Oh, Tae W, Brat, Daniel, Farris, Alton B, Foran, David J, Saltz, Joel
Format: Online Article Text
Language: English
Published: Medknow Publications Pvt Ltd 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3153692/
https://www.ncbi.nlm.nih.gov/pubmed/21845230
http://dx.doi.org/10.4103/2153-3539.83192
collection PubMed
description BACKGROUND: The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle to wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model that addresses these challenges and demonstrates its implementation in a relational database system.

CONTEXT: This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs).

AIMS: (1) Development of a data model capable of efficiently representing and storing virtual-slide-related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects.

SETTINGS AND DESIGN: The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slide tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses that generate very large volumes of quantifications (such as shape and texture) and of classifications of the quantified features.

MATERIALS AND METHODS: We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual-slide-related image, annotation, markup, and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features.

RESULTS: We currently have three databases running on a Dell PowerEdge T410 server with the CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of (1) a TMA database containing image analysis results from 4740 cases of breast cancer, with a storage size of 641 MB; (2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with a storage size of 66 GB; and (3) an in silico brain tumor study database comprising results from 307 TCGA slides, with a storage size of 365 GB. The latter two databases also contain human-generated annotations and markups for regions and nuclei.

CONCLUSIONS: Modeling and managing pathology image analysis results in a database provides immediate benefits to the value and usability of data in a research study. The database provides powerful query capabilities that are otherwise difficult or cumbersome to support with other approaches, such as programming languages. Standardized, semantically annotated data representation and interfaces also make it possible to share image data and analysis results more efficiently.
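To make the three kinds of queries listed under AIMS (2) more concrete (metadata retrieval, comparison of results from different analyses, and spatial queries on segmented objects), here is a minimal sketch using Python's standard-library `sqlite3` module. It is not the PAIS schema and not the authors' DB2 implementation: the tables `image`, `analysis`, `markup`, and `feature`, the functions `build_demo_db`, `window_query`, `compare_analyses`, and `mean_feature`, and all sample values are hypothetical. Segmented objects are simplified to axis-aligned bounding boxes, so the spatial predicates are plain column comparisons rather than true polygon geometry, and the cross-analysis comparison uses bounding-box intersection-over-union as a stand-in for whatever overlap measure a real validation study would use.

```python
import sqlite3

# Hypothetical, heavily simplified schema; NOT the actual PAIS data model.
# Each segmented object (markup) is reduced to an axis-aligned bounding box.
SCHEMA = """
CREATE TABLE image    (image_id INTEGER PRIMARY KEY, slide_id TEXT NOT NULL);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, algorithm TEXT, params TEXT);
CREATE TABLE markup (
    markup_id   INTEGER PRIMARY KEY,
    image_id    INTEGER REFERENCES image(image_id),
    analysis_id INTEGER REFERENCES analysis(analysis_id),
    min_x REAL, min_y REAL, max_x REAL, max_y REAL
);
CREATE TABLE feature (
    markup_id INTEGER REFERENCES markup(markup_id),
    name TEXT, value REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up results from two analyses."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO image VALUES (1, 'slide-A')")
    conn.executemany("INSERT INTO analysis VALUES (?, ?, ?)",
                     [(1, "algo_x", "default"), (2, "algo_y", "default")])
    conn.executemany("INSERT INTO markup VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (10, 1, 1, 0, 0, 10, 10),        # analysis 1
        (11, 1, 1, 50, 50, 60, 60),      # analysis 1
        (20, 1, 2, 5, 0, 15, 10),        # analysis 2, overlaps markup 10
        (21, 1, 2, 200, 200, 210, 210),  # analysis 2, isolated
    ])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                     [(10, "area", 78.0), (11, "area", 80.5), (20, "area", 70.0)])
    return conn


def window_query(conn, analysis_id, x0, y0, x1, y1):
    """Spatial query: markups of one analysis whose box intersects [x0,x1] x [y0,y1]."""
    rows = conn.execute(
        """SELECT markup_id FROM markup
           WHERE analysis_id = ?
             AND min_x <= ? AND max_x >= ?
             AND min_y <= ? AND max_y >= ?
           ORDER BY markup_id""",
        (analysis_id, x1, x0, y1, y0))
    return [r[0] for r in rows]


def compare_analyses(conn, a, b):
    """Comparison query: (markup_a, markup_b, box IoU) for overlapping pairs on the same image."""
    rows = conn.execute(
        """SELECT m1.markup_id, m2.markup_id,
                  (MIN(m1.max_x, m2.max_x) - MAX(m1.min_x, m2.min_x)) *
                  (MIN(m1.max_y, m2.max_y) - MAX(m1.min_y, m2.min_y)),
                  (m1.max_x - m1.min_x) * (m1.max_y - m1.min_y),
                  (m2.max_x - m2.min_x) * (m2.max_y - m2.min_y)
           FROM markup m1 JOIN markup m2 ON m1.image_id = m2.image_id
           WHERE m1.analysis_id = ? AND m2.analysis_id = ?
             AND m1.min_x < m2.max_x AND m2.min_x < m1.max_x
             AND m1.min_y < m2.max_y AND m2.min_y < m1.max_y
           ORDER BY m1.markup_id, m2.markup_id""",
        (a, b))
    return [(m1, m2, inter / (area1 + area2 - inter))
            for m1, m2, inter, area1, area2 in rows]


def mean_feature(conn, analysis_id, name):
    """Metadata/feature query: average value of one named feature within an analysis."""
    (val,) = conn.execute(
        """SELECT AVG(f.value) FROM feature f JOIN markup m USING (markup_id)
           WHERE m.analysis_id = ? AND f.name = ?""",
        (analysis_id, name)).fetchone()
    return val


if __name__ == "__main__":
    conn = build_demo_db()
    print(window_query(conn, 1, 0, 0, 55, 55))  # [10, 11]
    print(compare_analyses(conn, 1, 2))          # [(10, 20, 0.3333333333333333)]
    print(mean_feature(conn, 1, "area"))         # 79.25
```

A production system would store full polygon boundaries and rely on a spatial index (for example, a spatial extension of the database engine) rather than scanning bounding-box columns; the sketch only illustrates how metadata, spatial, and cross-analysis comparison queries can all be phrased as SQL over one relational model.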
id pubmed-3153692
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3153692 2011-08-15 A data model and database for high-resolution pathology analytical image informatics. J Pathol Inform, Research Article. Medknow Publications Pvt Ltd, 2011-07-26. Copyright: © 2011 Wang F. http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article