Processing genome-wide association studies within a repository of heterogeneous genomic datasets

BACKGROUND: Genome Wide Association Studies (GWAS) are based on the observation of genome-wide sets of genetic variants – typically single-nucleotide polymorphisms (SNPs) – in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GW...

Bibliographic Details
Main authors:	Bernasconi, Anna, Canakoglu, Arif, Comolli, Federico
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985298/ https://www.ncbi.nlm.nih.gov/pubmed/36869294 http://dx.doi.org/10.1186/s12863-023-01111-y

_version_	1784900923705786368
author	Bernasconi, Anna Canakoglu, Arif Comolli, Federico
author_facet	Bernasconi, Anna Canakoglu, Arif Comolli, Federico
author_sort	Bernasconi, Anna
collection	PubMed
description	BACKGROUND: Genome Wide Association Studies (GWAS) are based on the observation of genome-wide sets of genetic variants – typically single-nucleotide polymorphisms (SNPs) – in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GWAS techniques rather than on making the results of GWAS interoperable with other genomic signals; this is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions. RESULTS: To practically facilitate integrative use, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets that includes several heterogeneous data types in the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multi-sample processing queries that respond to important biological questions. These are then made usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, epigenetic signals. CONCLUSIONS: As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; 2) their big data processing by means of the GenoMetric Query Language and associated system. Future large-scale tertiary data analysis may extensively benefit from the addition of GWAS results to inform several different downstream analysis workflows. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12863-023-01111-y.
format	Online Article Text
id	pubmed-9985298
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-99852982023-03-05 Processing genome-wide association studies within a repository of heterogeneous genomic datasets Bernasconi, Anna Canakoglu, Arif Comolli, Federico BMC Genom Data Research BACKGROUND: Genome Wide Association Studies (GWAS) are based on the observation of genome-wide sets of genetic variants – typically single-nucleotide polymorphisms (SNPs) – in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GWAS techniques rather than on making the results of GWAS interoperable with other genomic signals; this is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions. RESULTS: To practically facilitate integrative use, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets that includes several heterogeneous data types in the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multi-sample processing queries that respond to important biological questions. These are then made usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, epigenetic signals. 
CONCLUSIONS: As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; 2) their big data processing by means of the GenoMetric Query Language and associated system. Future large-scale tertiary data analysis may extensively benefit from the addition of GWAS results to inform several different downstream analysis workflows. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12863-023-01111-y. BioMed Central 2023-03-03 /pmc/articles/PMC9985298/ /pubmed/36869294 http://dx.doi.org/10.1186/s12863-023-01111-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/)) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Bernasconi, Anna Canakoglu, Arif Comolli, Federico Processing genome-wide association studies within a repository of heterogeneous genomic datasets
title	Processing genome-wide association studies within a repository of heterogeneous genomic datasets
title_full	Processing genome-wide association studies within a repository of heterogeneous genomic datasets
title_fullStr	Processing genome-wide association studies within a repository of heterogeneous genomic datasets
title_full_unstemmed	Processing genome-wide association studies within a repository of heterogeneous genomic datasets
title_short	Processing genome-wide association studies within a repository of heterogeneous genomic datasets
title_sort	processing genome-wide association studies within a repository of heterogeneous genomic datasets
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985298/ https://www.ncbi.nlm.nih.gov/pubmed/36869294 http://dx.doi.org/10.1186/s12863-023-01111-y
work_keys_str_mv	AT bernasconianna processinggenomewideassociationstudieswithinarepositoryofheterogeneousgenomicdatasets AT canakogluarif processinggenomewideassociationstudieswithinarepositoryofheterogeneousgenomicdatasets AT comollifederico processinggenomewideassociationstudieswithinarepositoryofheterogeneousgenomicdatasets

Similar Items