Semi-automated assembly of high-quality diploid human reference genomes
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society(1,2). However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals(3,4). Recently, a hig...
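Main authors: | Jarvis, Erich D.; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R.; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A.; Carnevali, Paolo; Chaisson, Mark J. P.; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S.; Fulton, Lucinda L.; Garg, Shilpa; Gerton, Jennifer L.; Ghurye, Jay; Granat, Anastasiya; Green, Richard E.; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B.; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O.; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W.; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E.; Olson, Nathan D.; Pesout, Trevor; Potapova, Tamara; Puiu, Daniela; Regier, Allison; Ruan, Jue; Salzberg, Steven L.; Sanders, Ashley D.; Schatz, Michael C.; Schmitt, Anthony; Schneider, Valerie A.; Selvaraj, Siddarth; Shafin, Kishwar; Shumate, Alaina; Stitziel, Nathan O.; Stober, Catherine; Torrance, James; Wagner, Justin; Wang, Jianxin; Wenger, Aaron; Xiao, Chuanle; Zimin, Aleksey V.; Zhang, Guojie; Wang, Ting; Li, Heng; Garrison, Erik; Haussler, David; Hall, Ira; Zook, Justin M.; Eichler, Evan E.; Phillippy, Adam M.; Paten, Benedict; Howe, Kerstin; Miga, Karen H. |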
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9668749/ https://www.ncbi.nlm.nih.gov/pubmed/36261518 http://dx.doi.org/10.1038/s41586-022-05325-5 |
_version_ | 1784831981451739136 |
---|---|
author | Jarvis, Erich D. Formenti, Giulio Rhie, Arang Guarracino, Andrea Yang, Chentao Wood, Jonathan Tracey, Alan Thibaud-Nissen, Francoise Vollger, Mitchell R. Porubsky, David Cheng, Haoyu Asri, Mobin Logsdon, Glennis A. Carnevali, Paolo Chaisson, Mark J. P. Chin, Chen-Shan Cody, Sarah Collins, Joanna Ebert, Peter Escalona, Merly Fedrigo, Olivier Fulton, Robert S. Fulton, Lucinda L. Garg, Shilpa Gerton, Jennifer L. Ghurye, Jay Granat, Anastasiya Green, Richard E. Harvey, William Hasenfeld, Patrick Hastie, Alex Haukness, Marina Jaeger, Erich B. Jain, Miten Kirsche, Melanie Kolmogorov, Mikhail Korbel, Jan O. Koren, Sergey Korlach, Jonas Lee, Joyce Li, Daofeng Lindsay, Tina Lucas, Julian Luo, Feng Marschall, Tobias Mitchell, Matthew W. McDaniel, Jennifer Nie, Fan Olsen, Hugh E. Olson, Nathan D. Pesout, Trevor Potapova, Tamara Puiu, Daniela Regier, Allison Ruan, Jue Salzberg, Steven L. Sanders, Ashley D. Schatz, Michael C. Schmitt, Anthony Schneider, Valerie A. Selvaraj, Siddarth Shafin, Kishwar Shumate, Alaina Stitziel, Nathan O. Stober, Catherine Torrance, James Wagner, Justin Wang, Jianxin Wenger, Aaron Xiao, Chuanle Zimin, Aleksey V. Zhang, Guojie Wang, Ting Li, Heng Garrison, Erik Haussler, David Hall, Ira Zook, Justin M. Eichler, Evan E. Phillippy, Adam M. Paten, Benedict Howe, Kerstin Miga, Karen H. |
author_facet | Jarvis, Erich D. Formenti, Giulio Rhie, Arang Guarracino, Andrea Yang, Chentao Wood, Jonathan Tracey, Alan Thibaud-Nissen, Francoise Vollger, Mitchell R. Porubsky, David Cheng, Haoyu Asri, Mobin Logsdon, Glennis A. Carnevali, Paolo Chaisson, Mark J. P. Chin, Chen-Shan Cody, Sarah Collins, Joanna Ebert, Peter Escalona, Merly Fedrigo, Olivier Fulton, Robert S. Fulton, Lucinda L. Garg, Shilpa Gerton, Jennifer L. Ghurye, Jay Granat, Anastasiya Green, Richard E. Harvey, William Hasenfeld, Patrick Hastie, Alex Haukness, Marina Jaeger, Erich B. Jain, Miten Kirsche, Melanie Kolmogorov, Mikhail Korbel, Jan O. Koren, Sergey Korlach, Jonas Lee, Joyce Li, Daofeng Lindsay, Tina Lucas, Julian Luo, Feng Marschall, Tobias Mitchell, Matthew W. McDaniel, Jennifer Nie, Fan Olsen, Hugh E. Olson, Nathan D. Pesout, Trevor Potapova, Tamara Puiu, Daniela Regier, Allison Ruan, Jue Salzberg, Steven L. Sanders, Ashley D. Schatz, Michael C. Schmitt, Anthony Schneider, Valerie A. Selvaraj, Siddarth Shafin, Kishwar Shumate, Alaina Stitziel, Nathan O. Stober, Catherine Torrance, James Wagner, Justin Wang, Jianxin Wenger, Aaron Xiao, Chuanle Zimin, Aleksey V. Zhang, Guojie Wang, Ting Li, Heng Garrison, Erik Haussler, David Hall, Ira Zook, Justin M. Eichler, Evan E. Phillippy, Adam M. Paten, Benedict Howe, Kerstin Miga, Karen H. |
author_sort | Jarvis, Erich D. |
collection | PubMed |
description | The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society(1,2). However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals(3,4). Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome(5). To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity(6). Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements. |
format | Online Article Text |
id | pubmed-9668749 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-96687492022-11-18 Semi-automated assembly of high-quality diploid human reference genomes Jarvis, Erich D. Formenti, Giulio Rhie, Arang Guarracino, Andrea Yang, Chentao Wood, Jonathan Tracey, Alan Thibaud-Nissen, Francoise Vollger, Mitchell R. Porubsky, David Cheng, Haoyu Asri, Mobin Logsdon, Glennis A. Carnevali, Paolo Chaisson, Mark J. P. Chin, Chen-Shan Cody, Sarah Collins, Joanna Ebert, Peter Escalona, Merly Fedrigo, Olivier Fulton, Robert S. Fulton, Lucinda L. Garg, Shilpa Gerton, Jennifer L. Ghurye, Jay Granat, Anastasiya Green, Richard E. Harvey, William Hasenfeld, Patrick Hastie, Alex Haukness, Marina Jaeger, Erich B. Jain, Miten Kirsche, Melanie Kolmogorov, Mikhail Korbel, Jan O. Koren, Sergey Korlach, Jonas Lee, Joyce Li, Daofeng Lindsay, Tina Lucas, Julian Luo, Feng Marschall, Tobias Mitchell, Matthew W. McDaniel, Jennifer Nie, Fan Olsen, Hugh E. Olson, Nathan D. Pesout, Trevor Potapova, Tamara Puiu, Daniela Regier, Allison Ruan, Jue Salzberg, Steven L. Sanders, Ashley D. Schatz, Michael C. Schmitt, Anthony Schneider, Valerie A. Selvaraj, Siddarth Shafin, Kishwar Shumate, Alaina Stitziel, Nathan O. Stober, Catherine Torrance, James Wagner, Justin Wang, Jianxin Wenger, Aaron Xiao, Chuanle Zimin, Aleksey V. Zhang, Guojie Wang, Ting Li, Heng Garrison, Erik Haussler, David Hall, Ira Zook, Justin M. Eichler, Evan E. Phillippy, Adam M. Paten, Benedict Howe, Kerstin Miga, Karen H. Nature Article The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society(1,2). However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals(3,4). Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome(5). 
To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity(6). Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements. Nature Publishing Group UK 2022-10-19 2022 /pmc/articles/PMC9668749/ /pubmed/36261518 http://dx.doi.org/10.1038/s41586-022-05325-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. 
If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Jarvis, Erich D. Formenti, Giulio Rhie, Arang Guarracino, Andrea Yang, Chentao Wood, Jonathan Tracey, Alan Thibaud-Nissen, Francoise Vollger, Mitchell R. Porubsky, David Cheng, Haoyu Asri, Mobin Logsdon, Glennis A. Carnevali, Paolo Chaisson, Mark J. P. Chin, Chen-Shan Cody, Sarah Collins, Joanna Ebert, Peter Escalona, Merly Fedrigo, Olivier Fulton, Robert S. Fulton, Lucinda L. Garg, Shilpa Gerton, Jennifer L. Ghurye, Jay Granat, Anastasiya Green, Richard E. Harvey, William Hasenfeld, Patrick Hastie, Alex Haukness, Marina Jaeger, Erich B. Jain, Miten Kirsche, Melanie Kolmogorov, Mikhail Korbel, Jan O. Koren, Sergey Korlach, Jonas Lee, Joyce Li, Daofeng Lindsay, Tina Lucas, Julian Luo, Feng Marschall, Tobias Mitchell, Matthew W. McDaniel, Jennifer Nie, Fan Olsen, Hugh E. Olson, Nathan D. Pesout, Trevor Potapova, Tamara Puiu, Daniela Regier, Allison Ruan, Jue Salzberg, Steven L. Sanders, Ashley D. Schatz, Michael C. Schmitt, Anthony Schneider, Valerie A. Selvaraj, Siddarth Shafin, Kishwar Shumate, Alaina Stitziel, Nathan O. Stober, Catherine Torrance, James Wagner, Justin Wang, Jianxin Wenger, Aaron Xiao, Chuanle Zimin, Aleksey V. Zhang, Guojie Wang, Ting Li, Heng Garrison, Erik Haussler, David Hall, Ira Zook, Justin M. Eichler, Evan E. Phillippy, Adam M. Paten, Benedict Howe, Kerstin Miga, Karen H. Semi-automated assembly of high-quality diploid human reference genomes |
title | Semi-automated assembly of high-quality diploid human reference genomes |
title_full | Semi-automated assembly of high-quality diploid human reference genomes |
title_fullStr | Semi-automated assembly of high-quality diploid human reference genomes |
title_full_unstemmed | Semi-automated assembly of high-quality diploid human reference genomes |
title_short | Semi-automated assembly of high-quality diploid human reference genomes |
title_sort | semi-automated assembly of high-quality diploid human reference genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9668749/ https://www.ncbi.nlm.nih.gov/pubmed/36261518 http://dx.doi.org/10.1038/s41586-022-05325-5 |
work_keys_str_mv | AT jarviserichd semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT formentigiulio semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT rhiearang semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT guarracinoandrea semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT yangchentao semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT woodjonathan semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT traceyalan semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT thibaudnissenfrancoise semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT vollgermitchellr semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT porubskydavid semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT chenghaoyu semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT asrimobin semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT logsdonglennisa semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT carnevalipaolo semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT chaissonmarkjp semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT chinchenshan semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT codysarah semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT collinsjoanna semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT ebertpeter semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT escalonamerly semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT fedrigoolivier semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT fultonroberts semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT fultonlucindal semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT gargshilpa semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT gertonjenniferl 
semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT ghuryejay semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT granatanastasiya semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT greenricharde semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT harveywilliam semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT hasenfeldpatrick semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT hastiealex semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT hauknessmarina semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT jaegererichb semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT jainmiten semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT kirschemelanie semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT kolmogorovmikhail semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT korbeljano semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT korensergey semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT korlachjonas semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT leejoyce semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT lidaofeng semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT lindsaytina semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT lucasjulian semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT luofeng semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT marschalltobias semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT mitchellmattheww semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT mcdanieljennifer semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT niefan semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT olsenhughe semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT olsonnathand 
semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT pesouttrevor semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT potapovatamara semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT puiudaniela semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT regierallison semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT ruanjue semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT salzbergstevenl semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT sandersashleyd semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT schatzmichaelc semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT schmittanthony semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT schneidervaleriea semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT selvarajsiddarth semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT shafinkishwar semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT shumatealaina semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT stitzielnathano semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT stobercatherine semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT torrancejames semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT wagnerjustin semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT wangjianxin semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT wengeraaron semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT xiaochuanle semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT ziminalekseyv semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT zhangguojie semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT wangting semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT liheng semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT garrisonerik 
semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT hausslerdavid semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT hallira semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT zookjustinm semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT eichlerevane semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT phillippyadamm semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT patenbenedict semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT howekerstin semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT migakarenh semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes AT semiautomatedassemblyofhighqualitydiploidhumanreferencegenomes |