Damming the genomic data flood using a comprehensive analysis and storage data structure

Bibliographic Details
Main Authors:	Bouffard, Marc; Phillips, Michael S.; Brown, Andrew M.K.; Marsh, Sharon; Tardif, Jean-Claude; van Rooij, Tibor
Format:	Text
Language:	English
Published:	Oxford University Press 2010
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3004464/ https://www.ncbi.nlm.nih.gov/pubmed/21159730 http://dx.doi.org/10.1093/database/baq029

collection	PubMed
description	Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information. Database URL: http://castor.pharmacogenomics.ca
id	pubmed-3004464
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Database (Oxford)
publication_date	2010-12-15
rights	© The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
