FDM: a graph-based statistical method to detect differential transcription using RNA-seq data

Motivation: In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts and plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription bet...

Bibliographic Details
Main authors: Singh, Darshan, Orellana, Christian F., Hu, Yin, Jones, Corbin D., Liu, Yufeng, Chiang, Derek Y., Liu, Jinze, Prins, Jan F.
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3179659/
https://www.ncbi.nlm.nih.gov/pubmed/21824971
http://dx.doi.org/10.1093/bioinformatics/btr458
author Singh, Darshan
Orellana, Christian F.
Hu, Yin
Jones, Corbin D.
Liu, Yufeng
Chiang, Derek Y.
Liu, Jinze
Prins, Jan F.
collection PubMed
description Motivation: In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts, plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription between samples. Our focus is on samples observed using short-read RNA sequencing (RNA-seq).
Methods: We characterize differential transcription between two samples as the difference in the relative abundance of the transcript isoforms present in the samples. The magnitude of differential transcription of a gene between two samples can be measured by the square root of the Jensen-Shannon divergence (JSD*) between the gene's transcript abundance vectors in each sample. We define a weighted splice-graph representation of RNA-seq data, summarizing in compact form the alignment of RNA-seq reads to a reference genome. The flow difference metric (FDM) identifies regions of differential RNA transcript expression between pairs of splice graphs, without need for an underlying gene model or catalog of transcripts. We present a novel non-parametric statistical test between splice graphs to assess the significance of differential transcription, and extend it to group-wise comparison incorporating sample replicates.
Results: Using simulated RNA-seq data consisting of four technical replicates of two samples with varying transcription between genes, we show that (i) the FDM is highly correlated with JSD* (r=0.82) when average RNA-seq coverage of the transcripts is sufficiently deep; and (ii) the FDM is able to identify 90% of genes with differential transcription when JSD* >0.28 and coverage >7. This represents higher sensitivity than Cufflinks (without annotations) and rDiff (MMD), which respectively identified 69 and 49% of the genes in this region as differentially transcribed. Using annotations identifying the transcripts, Cufflinks was able to identify 86% of the genes in this region as differentially transcribed. Using experimental data consisting of four replicates each for two cancer cell lines (MCF7 and SUM102), FDM identified 1425 genes as significantly different in transcription. Subsequent study of the samples using quantitative real-time polymerase chain reaction (qRT-PCR) of several differential transcription sites identified by FDM confirmed significant differences at these sites.
Availability: http://csbio-linux001.cs.unc.edu/nextgen/software/FDM
Contact: darshan@email.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
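The JSD* measure named in the description can be illustrated with a short sketch. This is not the authors' implementation: the function names (`normalize`, `kl_divergence`, `jsd_star`) are hypothetical, and the choice of base-2 logarithms (which keeps JSD* between 0 and 1) is an assumption, since the abstract does not state the base. The inputs are assumed to be per-isoform abundance estimates for one gene in each of two samples, listed in the same isoform order.

```python
import math


def normalize(abundances):
    """Convert raw isoform abundances to relative abundances that sum to 1."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundance vector must have a positive sum")
    return [a / total for a in abundances]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_star(abund_a, abund_b):
    """Square root of the Jensen-Shannon divergence between two abundance vectors.

    Assumption: base-2 logarithms, so the result lies in [0, 1].
    """
    if len(abund_a) != len(abund_b):
        raise ValueError("both samples must list the same isoforms")
    p = normalize(abund_a)
    q = normalize(abund_b)
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(jsd, 0.0))


if __name__ == "__main__":
    # Hypothetical abundances for a gene with two isoforms in two samples.
    print(jsd_star([80, 20], [50, 50]))
```

Identical relative abundances give a value of 0, and samples that express entirely different isoforms give a value of 1 under the base-2 assumption. SciPy's `scipy.spatial.distance.jensenshannon` computes the same quantity (the Jensen-Shannon distance) and could be used in place of this hand-written version.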
format Online
Article
Text
id pubmed-3179659
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3179659 2011-09-26 FDM: a graph-based statistical method to detect differential transcription using RNA-seq data Singh, Darshan Orellana, Christian F. Hu, Yin Jones, Corbin D. Liu, Yufeng Chiang, Derek Y. Liu, Jinze Prins, Jan F. Bioinformatics Original Papers Oxford University Press 2011-10-01 2011-08-08 /pmc/articles/PMC3179659/ /pubmed/21824971 http://dx.doi.org/10.1093/bioinformatics/btr458 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title FDM: a graph-based statistical method to detect differential transcription using RNA-seq data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3179659/
https://www.ncbi.nlm.nih.gov/pubmed/21824971
http://dx.doi.org/10.1093/bioinformatics/btr458