
Recovery of non-reference sequences missing from the human reference genome

BACKGROUND: Non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structural variations with NRS exist. RESULTS: Here, we compared 31 human de novo...


Bibliographic Details
Main authors: Li, Ran, Tian, Xiaomeng, Yang, Peng, Fan, Yingzhi, Li, Ming, Zheng, Hongxiang, Wang, Xihong, Jiang, Yu
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6796347/
https://www.ncbi.nlm.nih.gov/pubmed/31619167
http://dx.doi.org/10.1186/s12864-019-6107-1
author Li, Ran
Tian, Xiaomeng
Yang, Peng
Fan, Yingzhi
Li, Ming
Zheng, Hongxiang
Wang, Xihong
Jiang, Yu
author_sort Li, Ran
collection PubMed
description BACKGROUND: Non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structural variations with NRS exist. RESULTS: Here, we compared 31 human de novo assemblies with the current reference genome to identify the NRS and their locations. We resolved the precise location of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, which were defined as having less than 90% (or no) identity with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes, including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had a high content of tandem repeats, indicating that their origin was associated with tandem repeats. CONCLUSIONS: Our study detected a large number of NRS, including many alternate alleles that were previously uncharacterized. We suggest that the origin of alternate alleles is associated with tandem repeats. Our results enrich the spectrum of genetic variation in the human genome.
format Online
Article
Text
id pubmed-6796347
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6796347 2019-10-21 Recovery of non-reference sequences missing from the human reference genome Li, Ran Tian, Xiaomeng Yang, Peng Fan, Yingzhi Li, Ming Zheng, Hongxiang Wang, Xihong Jiang, Yu BMC Genomics Research Article BACKGROUND: Non-reference sequences (NRS) represent structural variations in the human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structural variations with NRS exist. RESULTS: Here, we compared 31 human de novo assemblies with the current reference genome to identify the NRS and their locations. We resolved the precise location of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, which were defined as having less than 90% (or no) identity with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes, including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had a high content of tandem repeats, indicating that their origin was associated with tandem repeats. CONCLUSIONS: Our study detected a large number of NRS, including many alternate alleles that were previously uncharacterized. We suggest that the origin of alternate alleles is associated with tandem repeats. Our results enrich the spectrum of genetic variation in the human genome. BioMed Central 2019-10-16 /pmc/articles/PMC6796347/ /pubmed/31619167 http://dx.doi.org/10.1186/s12864-019-6107-1 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Recovery of non-reference sequences missing from the human reference genome
topic Research Article