A deep catalog of protein-coding variation in 985,830 individuals
Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-...
Main authors: | Sun, Kathie Y., et al. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197621/ https://www.ncbi.nlm.nih.gov/pubmed/37214792 http://dx.doi.org/10.1101/2023.05.09.539329 |
_version_ | 1785044583834451968 |
---|---|
author | Sun, Kathie Y. Bai, Xiaodong Chen, Siying Bao, Suying Kapoor, Manav Zhang, Chuanyi Backman, Joshua Joseph, Tyler Maxwell, Evan Mitra, George Gorovits, Alexander Mansfield, Adam Boutkov, Boris Gokhale, Sujit Habegger, Lukas Marcketta, Anthony Locke, Adam Kessler, Michael D. Sharma, Deepika Staples, Jeffrey Bovijn, Jonas Gelfman, Sahar Gioia, Alessandro Di Rajagopal, Veera Lopez, Alexander Varela, Jennifer Rico Alegre, Jesus Berumen, Jaime Tapia-Conyer, Roberto Kuri-Morales, Pablo Torres, Jason Emberson, Jonathan Collins, Rory Cantor, Michael Thornton, Timothy Kang, Hyun Min Overton, John Shuldiner, Alan R. Cremona, M. Laura Nafde, Mona Baras, Aris Abecasis, Goncalo Marchini, Jonathan Reid, Jeffrey G. Salerno, William Balasubramanian, Suganthi |
author_facet | Sun, Kathie Y. Bai, Xiaodong Chen, Siying Bao, Suying Kapoor, Manav Zhang, Chuanyi Backman, Joshua Joseph, Tyler Maxwell, Evan Mitra, George Gorovits, Alexander Mansfield, Adam Boutkov, Boris Gokhale, Sujit Habegger, Lukas Marcketta, Anthony Locke, Adam Kessler, Michael D. Sharma, Deepika Staples, Jeffrey Bovijn, Jonas Gelfman, Sahar Gioia, Alessandro Di Rajagopal, Veera Lopez, Alexander Varela, Jennifer Rico Alegre, Jesus Berumen, Jaime Tapia-Conyer, Roberto Kuri-Morales, Pablo Torres, Jason Emberson, Jonathan Collins, Rory Cantor, Michael Thornton, Timothy Kang, Hyun Min Overton, John Shuldiner, Alan R. Cremona, M. Laura Nafde, Mona Baras, Aris Abecasis, Goncalo Marchini, Jonathan Reid, Jeffrey G. Salerno, William Balasubramanian, Suganthi |
author_sort | Sun, Kathie Y. |
collection | PubMed |
description | Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. 
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts. |
format | Online Article Text |
id | pubmed-10197621 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-101976212023-05-20 A deep catalog of protein-coding variation in 985,830 individuals Sun, Kathie Y. Bai, Xiaodong Chen, Siying Bao, Suying Kapoor, Manav Zhang, Chuanyi Backman, Joshua Joseph, Tyler Maxwell, Evan Mitra, George Gorovits, Alexander Mansfield, Adam Boutkov, Boris Gokhale, Sujit Habegger, Lukas Marcketta, Anthony Locke, Adam Kessler, Michael D. Sharma, Deepika Staples, Jeffrey Bovijn, Jonas Gelfman, Sahar Gioia, Alessandro Di Rajagopal, Veera Lopez, Alexander Varela, Jennifer Rico Alegre, Jesus Berumen, Jaime Tapia-Conyer, Roberto Kuri-Morales, Pablo Torres, Jason Emberson, Jonathan Collins, Rory Cantor, Michael Thornton, Timothy Kang, Hyun Min Overton, John Shuldiner, Alan R. Cremona, M. Laura Nafde, Mona Baras, Aris Abecasis, Goncalo Marchini, Jonathan Reid, Jeffrey G. Salerno, William Balasubramanian, Suganthi bioRxiv Article Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. 
Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts. Cold Spring Harbor Laboratory 2023-11-02 /pmc/articles/PMC10197621/ /pubmed/37214792 http://dx.doi.org/10.1101/2023.05.09.539329 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Sun, Kathie Y. Bai, Xiaodong Chen, Siying Bao, Suying Kapoor, Manav Zhang, Chuanyi Backman, Joshua Joseph, Tyler Maxwell, Evan Mitra, George Gorovits, Alexander Mansfield, Adam Boutkov, Boris Gokhale, Sujit Habegger, Lukas Marcketta, Anthony Locke, Adam Kessler, Michael D. Sharma, Deepika Staples, Jeffrey Bovijn, Jonas Gelfman, Sahar Gioia, Alessandro Di Rajagopal, Veera Lopez, Alexander Varela, Jennifer Rico Alegre, Jesus Berumen, Jaime Tapia-Conyer, Roberto Kuri-Morales, Pablo Torres, Jason Emberson, Jonathan Collins, Rory Cantor, Michael Thornton, Timothy Kang, Hyun Min Overton, John Shuldiner, Alan R. Cremona, M. Laura Nafde, Mona Baras, Aris Abecasis, Goncalo Marchini, Jonathan Reid, Jeffrey G. Salerno, William Balasubramanian, Suganthi A deep catalog of protein-coding variation in 985,830 individuals |
title | A deep catalog of protein-coding variation in 985,830 individuals |
title_full | A deep catalog of protein-coding variation in 985,830 individuals |
title_fullStr | A deep catalog of protein-coding variation in 985,830 individuals |
title_full_unstemmed | A deep catalog of protein-coding variation in 985,830 individuals |
title_short | A deep catalog of protein-coding variation in 985,830 individuals |
title_sort | deep catalog of protein-coding variation in 985,830 individuals |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197621/ https://www.ncbi.nlm.nih.gov/pubmed/37214792 http://dx.doi.org/10.1101/2023.05.09.539329 |