
Revealing the missing expressed genes beyond the human reference genome by RNA-Seq

BACKGROUND: A complete and accurate human reference genome is important for functional genomics research. Therefore, an incomplete reference genome and individual-specific sequences have significant effects on various studies. RESULTS: We used two RNA-Seq datasets from human brain tissues and 1...


Bibliographic Details
Main authors: Chen, Geng; Li, Ruiyuan; Shi, Leming; Qi, Junyi; Hu, Pengzhan; Luo, Jian; Liu, Mingyao; Shi, Tieliu
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3288009/
https://www.ncbi.nlm.nih.gov/pubmed/22133125
http://dx.doi.org/10.1186/1471-2164-12-590
_version_ 1782224790093824000
author Chen, Geng
Li, Ruiyuan
Shi, Leming
Qi, Junyi
Hu, Pengzhan
Luo, Jian
Liu, Mingyao
Shi, Tieliu
author_facet Chen, Geng
Li, Ruiyuan
Shi, Leming
Qi, Junyi
Hu, Pengzhan
Luo, Jian
Liu, Mingyao
Shi, Tieliu
author_sort Chen, Geng
collection PubMed
description BACKGROUND: A complete and accurate human reference genome is important for functional genomics research. Therefore, an incomplete reference genome and individual-specific sequences have significant effects on various studies. RESULTS: We used two RNA-Seq datasets, from human brain tissues and 10 mixed cell lines, to investigate the completeness of the human reference genome. First, we demonstrated that of the previously identified ~5 Mb Asian and ~5 Mb African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb, respectively, could be transcribed. Our results suggest that many of those transcribed regions are not specific to Asians and Africans, but are also present in Caucasians. Then, we found that 104 RefSeq genes that are unalignable to NCBI build 37 are expressed at higher than 0.1 RPKM in brain and cell lines. Of these, 55 are conserved across human, chimpanzee and macaque, suggesting that a significant number of functional human genes are still absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes or EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. CONCLUSION: Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. The comparative approach between the reference genome and other related human genomes based on the transcriptome provides an alternative way to refine the human reference genome.
format Online
Article
Text
id pubmed-3288009
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3288009 2012-02-29 Revealing the missing expressed genes beyond the human reference genome by RNA-Seq Chen, Geng Li, Ruiyuan Shi, Leming Qi, Junyi Hu, Pengzhan Luo, Jian Liu, Mingyao Shi, Tieliu BMC Genomics Research Article BACKGROUND: A complete and accurate human reference genome is important for functional genomics research. Therefore, an incomplete reference genome and individual-specific sequences have significant effects on various studies. RESULTS: We used two RNA-Seq datasets, from human brain tissues and 10 mixed cell lines, to investigate the completeness of the human reference genome. First, we demonstrated that of the previously identified ~5 Mb Asian and ~5 Mb African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb, respectively, could be transcribed. Our results suggest that many of those transcribed regions are not specific to Asians and Africans, but are also present in Caucasians. Then, we found that 104 RefSeq genes that are unalignable to NCBI build 37 are expressed at higher than 0.1 RPKM in brain and cell lines. Of these, 55 are conserved across human, chimpanzee and macaque, suggesting that a significant number of functional human genes are still absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes or EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. CONCLUSION: Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. The comparative approach between the reference genome and other related human genomes based on the transcriptome provides an alternative way to refine the human reference genome. BioMed Central 2011-12-02 /pmc/articles/PMC3288009/ /pubmed/22133125 http://dx.doi.org/10.1186/1471-2164-12-590 Text en Copyright ©2011 Chen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Chen, Geng
Li, Ruiyuan
Shi, Leming
Qi, Junyi
Hu, Pengzhan
Luo, Jian
Liu, Mingyao
Shi, Tieliu
Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
title Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
title_full Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
title_fullStr Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
title_full_unstemmed Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
title_short Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
title_sort revealing the missing expressed genes beyond the human reference genome by rna-seq
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3288009/
https://www.ncbi.nlm.nih.gov/pubmed/22133125
http://dx.doi.org/10.1186/1471-2164-12-590
work_keys_str_mv AT chengeng revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq
AT liruiyuan revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq
AT shileming revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq
AT qijunyi revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq
AT hupengzhan revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq
AT luojian revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq
AT liumingyao revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq
AT shitieliu revealingthemissingexpressedgenesbeyondthehumanreferencegenomebyrnaseq