Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum

BACKGROUND: Plasmodium falciparum, the deadliest malaria-causing parasite, has an extremely AT-rich (80.7 %) genome. Because of high AT-content, sequence-based annotation of genes and functional elements remains challenging. In order to better understand the regulatory network controlling gene expre...

Bibliographic Details
Main authors: Lu, Xueqing Maggie, Bunnik, Evelien M., Pokhriyal, Neeti, Nasseri, Sara, Lonardi, Stefano, Le Roch, Karine G.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658763/
https://www.ncbi.nlm.nih.gov/pubmed/26607328
http://dx.doi.org/10.1186/s12864-015-2214-9
author Lu, Xueqing Maggie
Bunnik, Evelien M.
Pokhriyal, Neeti
Nasseri, Sara
Lonardi, Stefano
Le Roch, Karine G.
collection PubMed
description BACKGROUND: Plasmodium falciparum, the deadliest malaria-causing parasite, has an extremely AT-rich (80.7 %) genome. Because of high AT-content, sequence-based annotation of genes and functional elements remains challenging. In order to better understand the regulatory network controlling gene expression in the parasite, a more complete genome annotation as well as analysis tools adapted for AT-rich genomes are needed. Recent studies on genome-wide nucleosome positioning in eukaryotes have shown that nucleosome landscapes exhibit regular characteristic patterns at the 5’- and 3’-end of protein and non-protein coding genes. In addition, nucleosome depleted regions can be found near transcription start sites. These unique nucleosome landscape patterns may be exploited for the identification of novel genes. In this paper, we propose a computational approach to discover novel putative genes based exclusively on nucleosome positioning data in the AT-rich genome of P. falciparum.
RESULTS: Using binary classifiers trained on nucleosome landscapes at the gene boundaries from two independent nucleosome positioning data sets, we were able to detect a total of 231 regions containing putative genes in the genome of Plasmodium falciparum, of which 67 highly confident genes were found in both data sets. Eighty-eight of these 231 newly predicted genes exhibited transcription signal in RNA-Seq data, indicative of active transcription. In addition, 20 out of 21 selected gene candidates were further validated by RT-PCR, and 28 out of the 231 genes showed significant matches using BLASTN against an expressed sequence tag (EST) database. Furthermore, 108 (47 %) out of the 231 putative novel genes overlapped with previously identified but unannotated long non-coding RNAs. Collectively, these results provide experimental validation for 163 predicted genes (70.6 %). Finally, 73 out of 231 genes were found to be potentially translated based on their signal in polysome-associated RNA-Seq representing transcripts that are actively being translated.
CONCLUSION: Our results clearly indicate that nucleosome positioning data contains sufficient information for novel gene discovery. As distinct nucleosome landscapes around genes are found in many other eukaryotic organisms, this methodology could be used to characterize the transcriptome of any organism, especially when coupled with other DNA-based gene finding and experimental methods (e.g., RNA-Seq).
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2214-9) contains supplementary material, which is available to authorized users.
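The percentages quoted in the abstract can be rechecked from the raw counts it reports. The following is a minimal sketch of that arithmetic only; the variable names, labels, and the `percent` helper are illustrative assumptions and do not come from the article, and the sketch makes no claim about how the authors combined the evidence types to reach 163.

```python
# Counts as quoted in the abstract; labels and names are illustrative only.
TOTAL_PREDICTED = 231

counts = {
    "RNA-Seq transcription signal": 88,
    "BLASTN match against EST database": 28,
    "overlap with unannotated lncRNAs": 108,
    "experimentally supported (as reported)": 163,
    "polysome-associated RNA-Seq signal": 73,
}


def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Share of the 231 predicted genes falling in each evidence category.
shares = {label: percent(n, TOTAL_PREDICTED) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]}/{TOTAL_PREDICTED} = {share} %")

# RT-PCR was run on a selected subset (20 of 21), not on all 231 predictions.
rt_pcr_share = percent(20, 21)
print(f"RT-PCR validated: 20/21 = {rt_pcr_share} %")
```

Run as written, 108/231 comes out at about 46.8 %, which the abstract rounds to 47 %, and 163/231 at about 70.6 %, matching the reported figure.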
format Online
Article
Text
id pubmed-4658763
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4658763 2015-11-26 BMC Genomics Methodology Article BioMed Central 2015-11-25 /pmc/articles/PMC4658763/ /pubmed/26607328 http://dx.doi.org/10.1186/s12864-015-2214-9 Text en
© Lu et al. 2015. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Analysis of nucleosome positioning landscapes enables gene discovery in the human malaria parasite Plasmodium falciparum
topic Methodology Article