Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences
Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can...
Main authors: | Zhao, Tingting; Duan, Zhongqu; Genchev, Georgi Z.; Lu, Hui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2020 |
Subjects: | Investigations |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7407462/ https://www.ncbi.nlm.nih.gov/pubmed/32532800 http://dx.doi.org/10.1534/g3.120.401280 |
_version_ | 1783567626423762944 |
---|---|
author | Zhao, Tingting; Duan, Zhongqu; Genchev, Georgi Z.; Lu, Hui |
author_facet | Zhao, Tingting; Duan, Zhongqu; Genchev, Georgi Z.; Lu, Hui |
author_sort | Zhao, Tingting |
collection | PubMed |
description | Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity. |
format | Online Article Text |
id | pubmed-7407462 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-74074622020-08-19 Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences Zhao, Tingting Duan, Zhongqu Genchev, Georgi Z. Lu, Hui G3 (Bethesda) Investigations Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity. Genetics Society of America 2020-06-12 /pmc/articles/PMC7407462/ /pubmed/32532800 http://dx.doi.org/10.1534/g3.120.401280 Text en Copyright © 2020 Zhao et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Investigations Zhao, Tingting Duan, Zhongqu Genchev, Georgi Z. Lu, Hui Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences |
title | Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7407462/ https://www.ncbi.nlm.nih.gov/pubmed/32532800 http://dx.doi.org/10.1534/g3.120.401280 |
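
The percentages quoted in the description can be checked with simple arithmetic. The sketch below is only a sanity check on the counts stated in the abstract (783 gaps, 132 closed, 43 polymorphic); the function and variable names are illustrative and do not come from the article.

```python
# Counts taken from the record's description field.
TOTAL_GAPS = 783        # gaps in the human reference genome considered
CLOSED_GAPS = 132       # gaps with at least one gap-closing sequence
POLYMORPHIC_GAPS = 43   # closed gaps reported as polymorphic


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


closed_pct = percent(CLOSED_GAPS, TOTAL_GAPS)
polymorphic_pct = percent(POLYMORPHIC_GAPS, CLOSED_GAPS)
still_open = TOTAL_GAPS - CLOSED_GAPS

print(f"closed gaps: {closed_pct:.1f}% of {TOTAL_GAPS}")
print(f"polymorphic: {polymorphic_pct:.1f}% of {CLOSED_GAPS}")
print(f"gaps still open: {still_open}")
```

Rounded to one decimal place, both ratios agree with the 16.9% and 32.6% figures given in the description.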