A graph-based algorithm for RNA-seq data normalization

The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has b...

Bibliographic Details
Main authors: Tran, Diem-Trang; Bhaskara, Aditya; Kuberan, Balagurunathan; Might, Matthew
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6980396/
https://www.ncbi.nlm.nih.gov/pubmed/31978105
http://dx.doi.org/10.1371/journal.pone.0227760
_version_ 1783490946608922624
author Tran, Diem-Trang
Bhaskara, Aditya
Kuberan, Balagurunathan
Might, Matthew
collection PubMed
description The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity, requiring that RNA-seq data be normalized before any pattern of differential (or non-differential) expression can be ascertained; meanwhile, the prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods have successfully overcome this problem by the assumption that most transcripts are not differentially expressed. However, when RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that does not rely on this assumption, nor prior knowledge about the reference transcripts. This algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm on our synthesized validation data showed that it could recover the reference transcripts with high precision, thus resulting in high-quality normalization. On a realistic data set from the ENCODE project, this algorithm gave good results and could finish in a reasonable time. These preliminary results imply that we may be able to break the long persisting circularity problem in RNA-seq normalization.
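The description above only outlines the idea: build a graph from correlations among transcripts, then take a densely connected set of vertices as the references. The following is a minimal sketch of that idea under loudly stated assumptions, not the authors' published procedure. The function names (`pearson`, `correlation_graph`, `densest_subgraph`, `scaling_factors`), the use of Pearson correlation on raw counts, the 0.95 edge threshold, and the greedy peeling heuristic for finding a dense subgraph are all illustrative choices that do not come from this record.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_graph(counts, threshold=0.95):
    """Connect two transcripts when their counts correlate above the threshold.

    `counts` maps transcript name -> list of raw counts, one per sample.
    The threshold is an assumed value chosen for illustration.
    """
    names = list(counts)
    graph = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if pearson(counts[u], counts[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def densest_subgraph(graph):
    """Greedy peeling heuristic (an assumed stand-in for the paper's search).

    Repeatedly removes a minimum-degree vertex and returns the intermediate
    vertex set with the highest edges-per-vertex ratio.
    """
    adj = {u: set(vs) for u, vs in graph.items()}
    if not adj:
        return set()
    edges = sum(len(vs) for vs in adj.values()) // 2
    best = set(adj)
    best_density = edges / len(adj)
    while len(adj) > 1:
        u = min(adj, key=lambda k: len(adj[k]))
        edges -= len(adj[u])
        for v in adj.pop(u):
            adj[v].discard(u)
        density = edges / len(adj)
        if density > best_density:
            best, best_density = set(adj), density
    return best


def scaling_factors(counts, references):
    """Per-sample factor: summed reference counts relative to their mean."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(counts[r][j] for r in references) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


# Toy data (invented): three constant transcripts scaled by sample depth
# 1x, 2x, 4x, 0.5x, plus two transcripts whose expression really changes.
toy_counts = {
    "ref1": [100, 200, 400, 50],
    "ref2": [50, 100, 200, 25],
    "ref3": [200, 400, 800, 100],
    "de1": [10, 2, 4, 5],
    "de2": [1, 10, 4, 4],
}
toy_refs = densest_subgraph(correlation_graph(toy_counts))
toy_factors = scaling_factors(toy_counts, toy_refs)
```

On this toy input the dense cluster consists of the three constant transcripts, and dividing each sample's counts by its factor removes the depth differences. Real data would need far more care, for example in choosing the correlation measure and in handling co-regulated transcripts that also correlate strongly.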
format Online
Article
Text
id pubmed-6980396
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6980396 2020-02-04 A graph-based algorithm for RNA-seq data normalization Tran, Diem-Trang Bhaskara, Aditya Kuberan, Balagurunathan Might, Matthew PLoS One Research Article
Public Library of Science 2020-01-24 /pmc/articles/PMC6980396/ /pubmed/31978105 http://dx.doi.org/10.1371/journal.pone.0227760 Text en © 2020 Tran et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A graph-based algorithm for RNA-seq data normalization
topic Research Article