Identification of pathogenic variant enriched regions across genes and gene families

Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9...

Bibliographic Details
Main Authors: Pérez-Palma, Eduardo, May, Patrick, Iqbal, Sumaiya, Niestroj, Lisa-Marie, Du, Juanjiangmeng, Heyne, Henrike O., Castrillon, Jessica A., O'Donnell-Luria, Anne, Nürnberg, Peter, Palotie, Aarno, Daly, Mark, Lal, Dennis
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961572/
https://www.ncbi.nlm.nih.gov/pubmed/31871067
http://dx.doi.org/10.1101/gr.252601.119
collection PubMed
description Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 76,153 missense variants from patients. With this gene-family approach, we identified 465 regions enriched for patient variants spanning 41,463 amino acids in 1252 genes. As a comparison, by testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2639 amino acids and 215 genes. Next, we selected de novo variants from 6753 patients with neurodevelopmental disorders and 1911 unaffected siblings and observed an 8.33-fold enrichment of patient variants in our identified regions (95% C.I. = 3.90-Inf, P-value = 2.72 × 10^−11). By using the complete ClinVar variant set, we found that missense variants inside the identified regions are 106-fold more likely to be classified as pathogenic in comparison to benign classification (OR = 106.15, 95% C.I. = 70.66-Inf, P-value < 2.2 × 10^−16). All pathogenic variant enriched regions (PERs) identified are available online through “PER viewer,” a user-friendly online platform for interactive data mining, visualization, and download. In summary, our gene-family burden analysis approach identified novel PERs in protein sequences. This annotation can empower variant interpretation.
format Online
Article
Text
id pubmed-6961572
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-6961572 2020-07-01 Identification of pathogenic variant enriched regions across genes and gene families Genome Res Method Cold Spring Harbor Laboratory Press 2020-01 /pmc/articles/PMC6961572/ /pubmed/31871067 http://dx.doi.org/10.1101/gr.252601.119 Text en © 2020 Pérez-Palma et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Identification of pathogenic variant enriched regions across genes and gene families
topic Method