A framework to score the effects of structural variants in health and disease

Although technological advances improved the identification of structural variants (SVs) in the human genome, their interpretation remains challenging. Several methods utilize individual mechanistic principles like the deletion of coding sequence or 3D genome architecture disruptions. However, a com...

Bibliographic Details
Main authors: Kleinert, Philip; Kircher, Martin
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8997355/
https://www.ncbi.nlm.nih.gov/pubmed/35197310
http://dx.doi.org/10.1101/gr.275995.121
_version_ 1784684684121210880
author Kleinert, Philip
Kircher, Martin
author_facet Kleinert, Philip
Kircher, Martin
author_sort Kleinert, Philip
collection PubMed
description Although technological advances improved the identification of structural variants (SVs) in the human genome, their interpretation remains challenging. Several methods utilize individual mechanistic principles like the deletion of coding sequence or 3D genome architecture disruptions. However, a comprehensive tool using the broad spectrum of available annotations is missing. Here, we describe CADD-SV, a method to retrieve and integrate a wide set of annotations to predict the effects of SVs. Previously, supervised learning approaches were limited due to a small number and biased set of annotated pathogenic or benign SVs. We overcome this problem by using a surrogate training objective, the Combined Annotation Dependent Depletion (CADD) of functional variants. We use human- and chimpanzee-derived SVs as proxy-neutral and contrast them with matched simulated variants as proxy-deleterious, an approach that has proven powerful for short sequence variants. Our tool computes summary statistics over diverse variant annotations and uses random forest models to prioritize deleterious structural variants. The resulting CADD-SV scores correlate with known pathogenic and rare population variants. We further show that we can prioritize somatic cancer variants as well as noncoding variants known to affect gene expression. We provide a website and offline-scoring tool for easy application of CADD-SV.
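The training setup described in the abstract can be illustrated with a minimal sketch: summarize an annotation over each variant's interval, then label observed (human- or chimpanzee-derived) SVs as proxy-neutral and simulated SVs as proxy-deleterious. This is not the CADD-SV implementation. The function names, the dictionary-based annotation track, the half-open intervals, and the toy scores are all assumptions made for illustration. The resulting rows are the kind of input a random forest classifier could be trained on; the model itself is omitted here.

```python
from statistics import mean

def summarize_annotation(track, start, end):
    """Return count, mean, and max of a per-position annotation over [start, end)."""
    values = [score for pos, score in track.items() if start <= pos < end]
    if not values:
        # No annotated positions overlap the variant.
        return {"n": 0, "mean": 0.0, "max": 0.0}
    return {"n": len(values), "mean": mean(values), "max": max(values)}

def build_training_rows(proxy_neutral, proxy_deleterious, track):
    """Pair each SV's summary features with a surrogate label (0 neutral, 1 deleterious)."""
    rows = []
    for label, svs in ((0, proxy_neutral), (1, proxy_deleterious)):
        for start, end in svs:
            stats = summarize_annotation(track, start, end)
            rows.append(([stats["mean"], stats["max"]], label))
    return rows

# Toy data (hypothetical): an annotation score at four genomic positions.
track = {100: 0.25, 101: 0.75, 500: 1.0, 501: 0.5}
# One observed (proxy-neutral) SV and one simulated (proxy-deleterious) SV.
rows = build_training_rows([(100, 102)], [(500, 502)], track)
print(rows)
```

Each row holds a feature vector (mean and maximum of the annotation across the variant) and its surrogate label. A real pipeline would use many annotations and many variants per class.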
format Online
Article
Text
id pubmed-8997355
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-8997355 2022-04-22 A framework to score the effects of structural variants in health and disease Kleinert, Philip Kircher, Martin Genome Res Method
Cold Spring Harbor Laboratory Press 2022-04 /pmc/articles/PMC8997355/ /pubmed/35197310 http://dx.doi.org/10.1101/gr.275995.121 Text en © 2022 Kleinert and Kircher; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
spellingShingle Method
Kleinert, Philip
Kircher, Martin
A framework to score the effects of structural variants in health and disease
title A framework to score the effects of structural variants in health and disease
title_sort framework to score the effects of structural variants in health and disease
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8997355/
https://www.ncbi.nlm.nih.gov/pubmed/35197310
http://dx.doi.org/10.1101/gr.275995.121