
Towards a reference genome that captures global genetic diversity

The current human reference genome is predominantly derived from a single individual and it does not adequately reflect human genetic diversity. Here, we analyze 338 high-quality human assemblies of genetically divergent human populations to identify missing sequences in the human reference genome w...


Bibliographic Details
Main authors: Wong, Karen H. Y., Ma, Walfred, Wei, Chun-Yu, Yeh, Erh-Chan, Lin, Wan-Jia, Wang, Elin H. F., Su, Jen-Ping, Hsieh, Feng-Jen, Kao, Hsiao-Jung, Chen, Hsiao-Huei, Chow, Stephen K., Young, Eleanor, Chu, Catherine, Poon, Annie, Yang, Chi-Fan, Lin, Dar-Shong, Hu, Yu-Feng, Wu, Jer-Yuarn, Lee, Ni-Chung, Hwu, Wuh-Liang, Boffelli, Dario, Martin, David, Xiao, Ming, Kwok, Pui-Yan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7599213/
https://www.ncbi.nlm.nih.gov/pubmed/33127893
http://dx.doi.org/10.1038/s41467-020-19311-w
_version_ 1783602824048803840
author Wong, Karen H. Y.
Ma, Walfred
Wei, Chun-Yu
Yeh, Erh-Chan
Lin, Wan-Jia
Wang, Elin H. F.
Su, Jen-Ping
Hsieh, Feng-Jen
Kao, Hsiao-Jung
Chen, Hsiao-Huei
Chow, Stephen K.
Young, Eleanor
Chu, Catherine
Poon, Annie
Yang, Chi-Fan
Lin, Dar-Shong
Hu, Yu-Feng
Wu, Jer-Yuarn
Lee, Ni-Chung
Hwu, Wuh-Liang
Boffelli, Dario
Martin, David
Xiao, Ming
Kwok, Pui-Yan
author_sort Wong, Karen H. Y.
collection PubMed
description The current human reference genome is predominantly derived from a single individual and it does not adequately reflect human genetic diversity. Here, we analyze 338 high-quality human assemblies of genetically divergent human populations to identify missing sequences in the human reference genome with breakpoint resolution. We identify 127,727 recurrent non-reference unique insertions spanning 18,048,877 bp, some of which disrupt exons and known regulatory elements. To improve genome annotations, we linearly integrate these sequences into the chromosomal assemblies and construct a Human Diversity Reference. Leveraging this reference, an average of 402,573 previously unmapped reads can be recovered for a given genome sequenced to ~40X coverage. Transcriptomic diversity among these non-reference sequences can also be directly assessed. We successfully map tens of thousands of previously discarded RNA-Seq reads to this reference and identify transcription evidence in 4781 gene loci, underlining the importance of these non-reference sequences in functional genomics. Our extensive datasets are important advances toward a comprehensive reference representation of global human genetic diversity.
format Online
Article
Text
id pubmed-7599213
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-75992132020-11-10 Towards a reference genome that captures global genetic diversity Wong, Karen H. Y. Ma, Walfred Wei, Chun-Yu Yeh, Erh-Chan Lin, Wan-Jia Wang, Elin H. F. Su, Jen-Ping Hsieh, Feng-Jen Kao, Hsiao-Jung Chen, Hsiao-Huei Chow, Stephen K. Young, Eleanor Chu, Catherine Poon, Annie Yang, Chi-Fan Lin, Dar-Shong Hu, Yu-Feng Wu, Jer-Yuarn Lee, Ni-Chung Hwu, Wuh-Liang Boffelli, Dario Martin, David Xiao, Ming Kwok, Pui-Yan Nat Commun Article The current human reference genome is predominantly derived from a single individual and it does not adequately reflect human genetic diversity. Here, we analyze 338 high-quality human assemblies of genetically divergent human populations to identify missing sequences in the human reference genome with breakpoint resolution. We identify 127,727 recurrent non-reference unique insertions spanning 18,048,877 bp, some of which disrupt exons and known regulatory elements. To improve genome annotations, we linearly integrate these sequences into the chromosomal assemblies and construct a Human Diversity Reference. Leveraging this reference, an average of 402,573 previously unmapped reads can be recovered for a given genome sequenced to ~40X coverage. Transcriptomic diversity among these non-reference sequences can also be directly assessed. We successfully map tens of thousands of previously discarded RNA-Seq reads to this reference and identify transcription evidence in 4781 gene loci, underlining the importance of these non-reference sequences in functional genomics. Our extensive datasets are important advances toward a comprehensive reference representation of global human genetic diversity. 
Nature Publishing Group UK 2020-10-30 /pmc/articles/PMC7599213/ /pubmed/33127893 http://dx.doi.org/10.1038/s41467-020-19311-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Towards a reference genome that captures global genetic diversity
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7599213/
https://www.ncbi.nlm.nih.gov/pubmed/33127893
http://dx.doi.org/10.1038/s41467-020-19311-w