
GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data

Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the av...


Bibliographic Details
Main Authors:	Giacopuzzi, Edoardo; Popitsch, Niko; Taylor, Jenny C
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Data Resources and Analyses
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8934622/ https://www.ncbi.nlm.nih.gov/pubmed/35234913 http://dx.doi.org/10.1093/nar/gkac130

author	Giacopuzzi, Edoardo; Popitsch, Niko; Taylor, Jenny C
author_sort	Giacopuzzi, Edoardo
collection	PubMed
description	Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
format	Online Article Text
id	pubmed-8934622
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8934622 2022-03-21 Nucleic Acids Res Oxford University Press 2022-03-02 /pmc/articles/PMC8934622/ /pubmed/35234913 http://dx.doi.org/10.1093/nar/gkac130 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data
topic	Data Resources and Analyses
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8934622/ https://www.ncbi.nlm.nih.gov/pubmed/35234913 http://dx.doi.org/10.1093/nar/gkac130
