A deep catalog of protein-coding variation in 985,830 individuals

Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-...

Bibliographic Details
Main Authors: Sun, Kathie Y., Bai, Xiaodong, Chen, Siying, Bao, Suying, Kapoor, Manav, Zhang, Chuanyi, Backman, Joshua, Joseph, Tyler, Maxwell, Evan, Mitra, George, Gorovits, Alexander, Mansfield, Adam, Boutkov, Boris, Gokhale, Sujit, Habegger, Lukas, Marcketta, Anthony, Locke, Adam, Kessler, Michael D., Sharma, Deepika, Staples, Jeffrey, Bovijn, Jonas, Gelfman, Sahar, Gioia, Alessandro Di, Rajagopal, Veera, Lopez, Alexander, Varela, Jennifer Rico, Alegre, Jesus, Berumen, Jaime, Tapia-Conyer, Roberto, Kuri-Morales, Pablo, Torres, Jason, Emberson, Jonathan, Collins, Rory, Cantor, Michael, Thornton, Timothy, Kang, Hyun Min, Overton, John, Shuldiner, Alan R., Cremona, M. Laura, Nafde, Mona, Baras, Aris, Abecasis, Goncalo, Marchini, Jonathan, Reid, Jeffrey G., Salerno, William, Balasubramanian, Suganthi
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197621/
https://www.ncbi.nlm.nih.gov/pubmed/37214792
http://dx.doi.org/10.1101/2023.05.09.539329
author Sun, Kathie Y.
Bai, Xiaodong
Chen, Siying
Bao, Suying
Kapoor, Manav
Zhang, Chuanyi
Backman, Joshua
Joseph, Tyler
Maxwell, Evan
Mitra, George
Gorovits, Alexander
Mansfield, Adam
Boutkov, Boris
Gokhale, Sujit
Habegger, Lukas
Marcketta, Anthony
Locke, Adam
Kessler, Michael D.
Sharma, Deepika
Staples, Jeffrey
Bovijn, Jonas
Gelfman, Sahar
Gioia, Alessandro Di
Rajagopal, Veera
Lopez, Alexander
Varela, Jennifer Rico
Alegre, Jesus
Berumen, Jaime
Tapia-Conyer, Roberto
Kuri-Morales, Pablo
Torres, Jason
Emberson, Jonathan
Collins, Rory
Cantor, Michael
Thornton, Timothy
Kang, Hyun Min
Overton, John
Shuldiner, Alan R.
Cremona, M. Laura
Nafde, Mona
Baras, Aris
Abecasis, Goncalo
Marchini, Jonathan
Reid, Jeffrey G.
Salerno, William
Balasubramanian, Suganthi
author_sort Sun, Kathie Y.
collection PubMed
description Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
format Online
Article
Text
id pubmed-10197621
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10197621 2023-05-20 A deep catalog of protein-coding variation in 985,830 individuals bioRxiv Article Cold Spring Harbor Laboratory 2023-11-02 /pmc/articles/PMC10197621/ /pubmed/37214792 http://dx.doi.org/10.1101/2023.05.09.539329 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title A deep catalog of protein-coding variation in 985,830 individuals
title_sort deep catalog of protein-coding variation in 985,830 individuals
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197621/
https://www.ncbi.nlm.nih.gov/pubmed/37214792
http://dx.doi.org/10.1101/2023.05.09.539329