Differential correlation for sequencing data
BACKGROUND: Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from –omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of...
Main authors: | Siska, Charlotte; Kechris, Katerina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244536/ https://www.ncbi.nlm.nih.gov/pubmed/28103954 http://dx.doi.org/10.1186/s13104-016-2331-9 |
_version_ | 1782496711817560064 |
---|---|
author | Siska, Charlotte Kechris, Katerina |
collection | PubMed |
description | BACKGROUND: Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from –omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. RESULTS: With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman’s correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman’s correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. CONCLUSIONS: A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. Based on comparisons with different correlation metrics, this study suggests Spearman’s correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple –omics studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-2331-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5244536 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5244536 2017-01-23 Differential correlation for sequencing data Siska, Charlotte Kechris, Katerina BMC Res Notes Technical Note BioMed Central 2017-01-19 /pmc/articles/PMC5244536/ /pubmed/28103954 http://dx.doi.org/10.1186/s13104-016-2331-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Differential correlation for sequencing data |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244536/ https://www.ncbi.nlm.nih.gov/pubmed/28103954 http://dx.doi.org/10.1186/s13104-016-2331-9 |
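
Illustrative note: the description above says that Spearman's correlation performed best for count data when looking for differential correlation between feature pairs. The sketch below shows the basic idea for a single miRNA/mRNA pair in two groups. It is not the Discordant method, which fits a mixture model with the EM algorithm. Here the two group-wise Spearman correlations are simply compared with a Fisher z-transform, which is only a rough approximation for rank correlations. All function names and count values are hypothetical and chosen for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fisher_z_difference(r1, n1, r2, n2):
    """Approximate z statistic for the difference of two correlations.

    Assumption: the Fisher transform is used as a simple stand-in;
    it is not the mixture-model approach described in the record.
    """
    def clip(r):
        return max(min(r, 0.999999), -0.999999)  # keep atanh finite

    z1, z2 = math.atanh(clip(r1)), math.atanh(clip(r2))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


# Hypothetical read counts for one miRNA and one mRNA in two groups.
mirna_g1 = [5, 12, 30, 45, 80, 120, 200, 310]
mrna_g1 = [900, 750, 800, 400, 420, 150, 160, 40]
mirna_g2 = [8, 15, 22, 60, 75, 140, 180, 260]
mrna_g2 = [100, 90, 300, 250, 500, 450, 800, 700]

rho_g1 = spearman(mirna_g1, mrna_g1)
rho_g2 = spearman(mirna_g2, mrna_g2)
z_stat = fisher_z_difference(rho_g1, len(mirna_g1), rho_g2, len(mirna_g2))

print(f"group 1 rho = {rho_g1:.3f}, group 2 rho = {rho_g2:.3f}, z = {z_stat:.2f}")
```

In this toy example the pair is negatively correlated in one group and positively correlated in the other, the kind of pattern a differential correlation analysis is meant to flag. Because ranks are used, the result does not depend on the skewed, count-based scale of the data.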