Identification of pathogenic variant enriched regions across genes and gene families
Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9...
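Main authors: | Pérez-Palma, Eduardo; May, Patrick; Iqbal, Sumaiya; Niestroj, Lisa-Marie; Du, Juanjiangmeng; Heyne, Henrike O.; Castrillon, Jessica A.; O'Donnell-Luria, Anne; Nürnberg, Peter; Palotie, Aarno; Daly, Mark; Lal, Dennis |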
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2020 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961572/ https://www.ncbi.nlm.nih.gov/pubmed/31871067 http://dx.doi.org/10.1101/gr.252601.119 |
_version_ | 1783488020891041792 |
---|---|
author | Pérez-Palma, Eduardo May, Patrick Iqbal, Sumaiya Niestroj, Lisa-Marie Du, Juanjiangmeng Heyne, Henrike O. Castrillon, Jessica A. O'Donnell-Luria, Anne Nürnberg, Peter Palotie, Aarno Daly, Mark Lal, Dennis |
author_sort | Pérez-Palma, Eduardo |
collection | PubMed |
description | Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 76,153 missense variants from patients. With this gene-family approach, we identified 465 regions enriched for patient variants spanning 41,463 amino acids in 1252 genes. As a comparison, by testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2639 amino acids and 215 genes. Next, we selected de novo variants from 6753 patients with neurodevelopmental disorders and 1911 unaffected siblings and observed an 8.33-fold enrichment of patient variants in our identified regions (95% C.I. = 3.90-Inf, P-value = 2.72 × 10^−11). By using the complete ClinVar variant set, we found that missense variants inside the identified regions are 106-fold more likely to be classified as pathogenic in comparison to benign classification (OR = 106.15, 95% C.I. = 70.66-Inf, P-value < 2.2 × 10^−16). All pathogenic variant enriched regions (PERs) identified are available online through “PER viewer,” a user-friendly online platform for interactive data mining, visualization, and download. In summary, our gene-family burden analysis approach identified novel PERs in protein sequences. This annotation can empower variant interpretation. |
format | Online Article Text |
id | pubmed-6961572 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6961572 2020-07-01 Identification of pathogenic variant enriched regions across genes and gene families Genome Res Method Cold Spring Harbor Laboratory Press 2020-01 /pmc/articles/PMC6961572/ /pubmed/31871067 http://dx.doi.org/10.1101/gr.252601.119 Text en © 2020 Pérez-Palma et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
title | Identification of pathogenic variant enriched regions across genes and gene families |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961572/ https://www.ncbi.nlm.nih.gov/pubmed/31871067 http://dx.doi.org/10.1101/gr.252601.119 |