Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences

Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can...


Bibliographic Details
Main authors: Zhao, Tingting; Duan, Zhongqu; Genchev, Georgi Z.; Lu, Hui
Format: Online Article Text
Language: English
Published: Genetics Society of America, 2020
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7407462/
https://www.ncbi.nlm.nih.gov/pubmed/32532800
http://dx.doi.org/10.1534/g3.120.401280
author Zhao, Tingting
Duan, Zhongqu
Genchev, Georgi Z.
Lu, Hui
collection PubMed
description Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.
format Online
Article
Text
id pubmed-7407462
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-7407462 2020-08-19 G3 (Bethesda) Investigations Genetics Society of America 2020-06-12 /pmc/articles/PMC7407462/ /pubmed/32532800 http://dx.doi.org/10.1534/g3.120.401280 Text en Copyright © 2020 Zhao et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences
topic Investigations