Semi-automated assembly of high-quality diploid human reference genomes

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society(1,2). However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals(3,4). Recently, a hig...

Bibliographic Details
Main Authors: Jarvis, Erich D., Formenti, Giulio, Rhie, Arang, Guarracino, Andrea, Yang, Chentao, Wood, Jonathan, Tracey, Alan, Thibaud-Nissen, Francoise, Vollger, Mitchell R., Porubsky, David, Cheng, Haoyu, Asri, Mobin, Logsdon, Glennis A., Carnevali, Paolo, Chaisson, Mark J. P., Chin, Chen-Shan, Cody, Sarah, Collins, Joanna, Ebert, Peter, Escalona, Merly, Fedrigo, Olivier, Fulton, Robert S., Fulton, Lucinda L., Garg, Shilpa, Gerton, Jennifer L., Ghurye, Jay, Granat, Anastasiya, Green, Richard E., Harvey, William, Hasenfeld, Patrick, Hastie, Alex, Haukness, Marina, Jaeger, Erich B., Jain, Miten, Kirsche, Melanie, Kolmogorov, Mikhail, Korbel, Jan O., Koren, Sergey, Korlach, Jonas, Lee, Joyce, Li, Daofeng, Lindsay, Tina, Lucas, Julian, Luo, Feng, Marschall, Tobias, Mitchell, Matthew W., McDaniel, Jennifer, Nie, Fan, Olsen, Hugh E., Olson, Nathan D., Pesout, Trevor, Potapova, Tamara, Puiu, Daniela, Regier, Allison, Ruan, Jue, Salzberg, Steven L., Sanders, Ashley D., Schatz, Michael C., Schmitt, Anthony, Schneider, Valerie A., Selvaraj, Siddarth, Shafin, Kishwar, Shumate, Alaina, Stitziel, Nathan O., Stober, Catherine, Torrance, James, Wagner, Justin, Wang, Jianxin, Wenger, Aaron, Xiao, Chuanle, Zimin, Aleksey V., Zhang, Guojie, Wang, Ting, Li, Heng, Garrison, Erik, Haussler, David, Hall, Ira, Zook, Justin M., Eichler, Evan E., Phillippy, Adam M., Paten, Benedict, Howe, Kerstin, Miga, Karen H.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9668749/
https://www.ncbi.nlm.nih.gov/pubmed/36261518
http://dx.doi.org/10.1038/s41586-022-05325-5
_version_ 1784831981451739136
author Jarvis, Erich D.
Formenti, Giulio
Rhie, Arang
Guarracino, Andrea
Yang, Chentao
Wood, Jonathan
Tracey, Alan
Thibaud-Nissen, Francoise
Vollger, Mitchell R.
Porubsky, David
Cheng, Haoyu
Asri, Mobin
Logsdon, Glennis A.
Carnevali, Paolo
Chaisson, Mark J. P.
Chin, Chen-Shan
Cody, Sarah
Collins, Joanna
Ebert, Peter
Escalona, Merly
Fedrigo, Olivier
Fulton, Robert S.
Fulton, Lucinda L.
Garg, Shilpa
Gerton, Jennifer L.
Ghurye, Jay
Granat, Anastasiya
Green, Richard E.
Harvey, William
Hasenfeld, Patrick
Hastie, Alex
Haukness, Marina
Jaeger, Erich B.
Jain, Miten
Kirsche, Melanie
Kolmogorov, Mikhail
Korbel, Jan O.
Koren, Sergey
Korlach, Jonas
Lee, Joyce
Li, Daofeng
Lindsay, Tina
Lucas, Julian
Luo, Feng
Marschall, Tobias
Mitchell, Matthew W.
McDaniel, Jennifer
Nie, Fan
Olsen, Hugh E.
Olson, Nathan D.
Pesout, Trevor
Potapova, Tamara
Puiu, Daniela
Regier, Allison
Ruan, Jue
Salzberg, Steven L.
Sanders, Ashley D.
Schatz, Michael C.
Schmitt, Anthony
Schneider, Valerie A.
Selvaraj, Siddarth
Shafin, Kishwar
Shumate, Alaina
Stitziel, Nathan O.
Stober, Catherine
Torrance, James
Wagner, Justin
Wang, Jianxin
Wenger, Aaron
Xiao, Chuanle
Zimin, Aleksey V.
Zhang, Guojie
Wang, Ting
Li, Heng
Garrison, Erik
Haussler, David
Hall, Ira
Zook, Justin M.
Eichler, Evan E.
Phillippy, Adam M.
Paten, Benedict
Howe, Kerstin
Miga, Karen H.
author_sort Jarvis, Erich D.
collection PubMed
description The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society(1,2). However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals(3,4). Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome(5). To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity(6). Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
format Online
Article
Text
id pubmed-9668749
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9668749 2022-11-18 Semi-automated assembly of high-quality diploid human reference genomes Nature Article Nature Publishing Group UK 2022-10-19 2022 /pmc/articles/PMC9668749/ /pubmed/36261518 http://dx.doi.org/10.1038/s41586-022-05325-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Semi-automated assembly of high-quality diploid human reference genomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9668749/
https://www.ncbi.nlm.nih.gov/pubmed/36261518
http://dx.doi.org/10.1038/s41586-022-05325-5