SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data
BACKGROUND: Identifying differentially abundant features between different experimental groups is a common goal for many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large f...
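Main authors: | Li, Yuntong; Fan, Teresa W.M.; Lane, Andrew N.; Kang, Woo-Young; Arnold, Susanne M.; Stromberg, Arnold J.; Wang, Chi; Chen, Li |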
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6798423/ https://www.ncbi.nlm.nih.gov/pubmed/31623550 http://dx.doi.org/10.1186/s12859-019-3067-z |
_version_ | 1783460035739779072 |
---|---|
author | Li, Yuntong Fan, Teresa W.M. Lane, Andrew N. Kang, Woo-Young Arnold, Susanne M. Stromberg, Arnold J. Wang, Chi Chen, Li |
author_facet | Li, Yuntong Fan, Teresa W.M. Lane, Andrew N. Kang, Woo-Young Arnold, Susanne M. Stromberg, Arnold J. Wang, Chi Chen, Li |
author_sort | Li, Yuntong |
collection | PubMed |
description | BACKGROUND: Identifying differentially abundant features between different experimental groups is a common goal for many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large fraction of zero values. Although several statistical methods have been proposed, they either require the data normality assumption or are inefficient. RESULTS: We propose a new semi-parametric differential abundance analysis (SDA) method for metabolomics and proteomics data from MS. The method considers a two-part model, a logistic regression for the zero proportion and a semi-parametric log-linear model for the possibly non-normally distributed non-zero values, to characterize data from each feature. A kernel-smoothed likelihood method is developed to estimate model coefficients and a likelihood ratio test is constructed for differential abundant analysis. The method has been implemented into an R package, SDAMS, which is available at https://www.bioconductor.org/packages/release/bioc/html/SDAMS.html. CONCLUSION: By introducing the two-part semi-parametric model, SDA is able to handle both non-normally distributed data and large fraction of zero values in a MS dataset. It also allows for adjustment of covariates. Simulations and real data analyses demonstrate that SDA outperforms existing methods. |
format | Online Article Text |
id | pubmed-6798423 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-67984232019-10-21 SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data Li, Yuntong Fan, Teresa W.M. Lane, Andrew N. Kang, Woo-Young Arnold, Susanne M. Stromberg, Arnold J. Wang, Chi Chen, Li BMC Bioinformatics Methodology Article BACKGROUND: Identifying differentially abundant features between different experimental groups is a common goal for many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large fraction of zero values. Although several statistical methods have been proposed, they either require the data normality assumption or are inefficient. RESULTS: We propose a new semi-parametric differential abundance analysis (SDA) method for metabolomics and proteomics data from MS. The method considers a two-part model, a logistic regression for the zero proportion and a semi-parametric log-linear model for the possibly non-normally distributed non-zero values, to characterize data from each feature. A kernel-smoothed likelihood method is developed to estimate model coefficients and a likelihood ratio test is constructed for differential abundant analysis. The method has been implemented into an R package, SDAMS, which is available at https://www.bioconductor.org/packages/release/bioc/html/SDAMS.html. CONCLUSION: By introducing the two-part semi-parametric model, SDA is able to handle both non-normally distributed data and large fraction of zero values in a MS dataset. It also allows for adjustment of covariates. Simulations and real data analyses demonstrate that SDA outperforms existing methods. 
BioMed Central 2019-10-17 /pmc/articles/PMC6798423/ /pubmed/31623550 http://dx.doi.org/10.1186/s12859-019-3067-z Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Li, Yuntong Fan, Teresa W.M. Lane, Andrew N. Kang, Woo-Young Arnold, Susanne M. Stromberg, Arnold J. Wang, Chi Chen, Li SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data |
title | SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data |
title_full | SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data |
title_fullStr | SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data |
title_full_unstemmed | SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data |
title_short | SDA: a semi-parametric differential abundance analysis method for metabolomics and proteomics data |
title_sort | sda: a semi-parametric differential abundance analysis method for metabolomics and proteomics data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6798423/ https://www.ncbi.nlm.nih.gov/pubmed/31623550 http://dx.doi.org/10.1186/s12859-019-3067-z |
work_keys_str_mv | AT liyuntong sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata AT fanteresawm sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata AT laneandrewn sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata AT kangwooyoung sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata AT arnoldsusannem sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata AT strombergarnoldj sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata AT wangchi sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata AT chenli sdaasemiparametricdifferentialabundanceanalysismethodformetabolomicsandproteomicsdata |
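
The two-part idea in the description (one model for the zero proportion, another for the non-zero values, combined in a likelihood ratio test) can be illustrated with a deliberately simplified, fully parametric sketch. This is not the SDA method and not the SDAMS API: it replaces the semi-parametric, kernel-smoothed log-linear part with a plain normal model on log-transformed non-zero values, ignores covariates, and compares only two groups. The function names (`binom_loglik`, `normal_rss`, `two_part_lrt`) are hypothetical, and the p-value assumes the combined statistic is approximately chi-square with 2 degrees of freedom.

```python
import math


def binom_loglik(k, n):
    """Maximized binomial log-likelihood for k zeros among n observations."""
    if n == 0 or k == 0 or k == n:
        return 0.0  # the fitted proportion is 0 or 1, so the log-likelihood is 0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def normal_rss(values):
    """Residual sum of squares of values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_part_lrt(group_a, group_b):
    """Simplified two-part likelihood ratio test for a single feature.

    Part 1 compares the zero proportions of the two groups (binomial model).
    Part 2 compares the means of log non-zero values (normal model with a
    common variance). This is an illustrative stand-in, not the SDA estimator.
    Returns (statistic, p_value).
    """
    # Part 1: separate zero proportions versus one pooled proportion.
    zeros_a = sum(1 for x in group_a if x == 0)
    zeros_b = sum(1 for x in group_b if x == 0)
    n_a, n_b = len(group_a), len(group_b)
    lr_zero = 2 * (
        binom_loglik(zeros_a, n_a)
        + binom_loglik(zeros_b, n_b)
        - binom_loglik(zeros_a + zeros_b, n_a + n_b)
    )

    # Part 2: separate group means versus one pooled mean on the log scale.
    logs_a = [math.log(x) for x in group_a if x > 0]
    logs_b = [math.log(x) for x in group_b if x > 0]
    m = len(logs_a) + len(logs_b)
    rss_null = normal_rss(logs_a + logs_b)
    rss_alt = normal_rss(logs_a) + normal_rss(logs_b)
    # Guard: with no residual variation the normal statistic is undefined,
    # so this sketch simply contributes nothing from part 2 in that case.
    lr_pos = m * math.log(rss_null / rss_alt) if m > 0 and rss_alt > 0 else 0.0

    # Clamp tiny negative values caused by floating-point rounding.
    stat = max(0.0, lr_zero) + max(0.0, lr_pos)
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value
```

Calling `two_part_lrt` on two lists of non-negative abundances returns one combined statistic and p-value per feature. A group that is mostly zeros compared against a group with no zeros gives a large statistic from part 1 alone.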