
Differential correlation for sequencing data

BACKGROUND: Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from –omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of...


Bibliographic Details
Main authors: Siska, Charlotte; Kechris, Katerina
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244536/
https://www.ncbi.nlm.nih.gov/pubmed/28103954
http://dx.doi.org/10.1186/s13104-016-2331-9
author Siska, Charlotte
Kechris, Katerina
author_facet Siska, Charlotte
Kechris, Katerina
author_sort Siska, Charlotte
collection PubMed
description BACKGROUND: Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from –omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. RESULTS: With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman’s correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman’s correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. CONCLUSIONS: A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. 
Based on comparisons with different correlation metrics, this study suggests Spearman’s correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple –omics studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-2331-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5244536
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5244536 2017-01-23 Differential correlation for sequencing data Siska, Charlotte Kechris, Katerina BMC Res Notes Technical Note BioMed Central 2017-01-19 /pmc/articles/PMC5244536/ /pubmed/28103954 http://dx.doi.org/10.1186/s13104-016-2331-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Note
Siska, Charlotte
Kechris, Katerina
Differential correlation for sequencing data
title Differential correlation for sequencing data
title_full Differential correlation for sequencing data
title_fullStr Differential correlation for sequencing data
title_full_unstemmed Differential correlation for sequencing data
title_short Differential correlation for sequencing data
title_sort differential correlation for sequencing data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244536/
https://www.ncbi.nlm.nih.gov/pubmed/28103954
http://dx.doi.org/10.1186/s13104-016-2331-9
work_keys_str_mv AT siskacharlotte differentialcorrelationforsequencingdata
AT kechriskaterina differentialcorrelationforsequencingdata
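
Illustrative note: the abstract describes computing a correlation between feature pairs in each group, with Spearman's correlation suggested for count data, before fitting a mixture model with the EM algorithm. A minimal Python sketch of that first step, under stated assumptions, might look like the following. It is not the Discordant R package's code or API. The function names (rank_rows, spearman_cross, fisher_z), the simulated negative binomial counts, and the use of a Fisher transform are all hypothetical choices made for illustration, and the mixture-model/EM step is omitted.

```python
import numpy as np


def rank_rows(x):
    """Rank each row (1 = smallest), giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty_like(x)
    for i, row in enumerate(x):
        order = np.argsort(row, kind="mergesort")
        r = np.empty(len(row))
        r[order] = np.arange(1, len(row) + 1)
        for v in np.unique(row):          # average the ranks of ties
            tied = row == v
            r[tied] = r[tied].mean()
        ranks[i] = r
    return ranks


def spearman_cross(a, b):
    """Spearman correlation of every row of a (p x n) with every row of b (q x n).

    Returns a p x q matrix. A constant row yields NaN (undefined correlation).
    """
    ra, rb = rank_rows(a), rank_rows(b)
    ra -= ra.mean(axis=1, keepdims=True)
    rb -= rb.mean(axis=1, keepdims=True)
    ra /= np.linalg.norm(ra, axis=1, keepdims=True)
    rb /= np.linalg.norm(rb, axis=1, keepdims=True)
    return ra @ rb.T


def fisher_z(r, eps=1e-6):
    """Fisher transform of correlations, clipped away from +/-1 (assumed step)."""
    return np.arctanh(np.clip(r, -1 + eps, 1 - eps))


# Hypothetical simulated counts: 3 miRNA-like and 4 mRNA-like features,
# 20 samples in each of two groups (not data from the article).
rng = np.random.default_rng(0)
mirna_g1 = rng.negative_binomial(5, 0.1, size=(3, 20))
mrna_g1 = rng.negative_binomial(5, 0.1, size=(4, 20))
mirna_g2 = rng.negative_binomial(5, 0.1, size=(3, 20))
mrna_g2 = rng.negative_binomial(5, 0.1, size=(4, 20))

# One transformed correlation per feature pair and group; a mixture model
# over these paired values would be the next (omitted) step.
z1 = fisher_z(spearman_cross(mirna_g1, mrna_g1))
z2 = fisher_z(spearman_cross(mirna_g2, mrna_g2))
```

Because Spearman's correlation depends only on ranks, it is unchanged by monotone transformations of the counts such as log(1 + x), which is one reason a rank-based metric can be convenient for count data.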