Statistical method for modeling sequencing data from different technologies in longitudinal studies with application to Huntington disease

Advancement of gene expression measurements in longitudinal studies enables the identification of genes associated with disease severity over time. However, problems arise when the technology used to measure gene expression differs between time points. Observed differences between the results obtain...

Bibliographic Details
Main authors: Fuady, Angga M., van Roon‐Mom, Willeke M. C., Kiełbasa, Szymon M., Uh, Hae‐Won, Houwing‐Duistermaat, Jeanine J.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049011/
https://www.ncbi.nlm.nih.gov/pubmed/33350510
http://dx.doi.org/10.1002/bimj.201900235
author Fuady, Angga M.
van Roon‐Mom, Willeke M. C.
Kiełbasa, Szymon M.
Uh, Hae‐Won
Houwing‐Duistermaat, Jeanine J.
collection PubMed
description Advancement of gene expression measurements in longitudinal studies enables the identification of genes associated with disease severity over time. However, problems arise when the technology used to measure gene expression differs between time points. Observed differences between the results obtained at different time points can be caused by technical differences. Modeling the two measurements jointly over time might provide insight into the causes of these different results. Our work is motivated by a study of gene expression data of blood samples from Huntington disease patients, which were obtained using two different sequencing technologies. At time point 1, DeepSAGE technology was used to measure the gene expression, with a subsample also measured using RNA‐Seq technology. At time point 2, all samples were measured using RNA‐Seq technology. Significant associations between gene expression measured by DeepSAGE and disease severity using data from the first time point could not be replicated by the RNA‐Seq data from the second time point. We modeled the relationship between the two sequencing technologies using the data from the overlapping samples. We used linear mixed models with either DeepSAGE or RNA‐Seq measurements as the dependent variable and disease severity as the independent variable. In conclusion, (1) for one out of 14 genes, the initial significant result could be replicated with both technologies using data from both time points; (2) statistical efficiency is lost due to disagreement between the two technologies, measurement error when predicting gene expressions, and the need to include additional parameters to account for possible differences.
format Online
Article
Text
id pubmed-8049011
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-8049011 2021-04-20 Biom J Longitudinal and Time‐to‐event Analysis John Wiley and Sons Inc. 2020-12-22 2021-04 /pmc/articles/PMC8049011/ /pubmed/33350510 http://dx.doi.org/10.1002/bimj.201900235 Text en © 2020 The Authors. Biometrical Journal published by Wiley‐VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title Statistical method for modeling sequencing data from different technologies in longitudinal studies with application to Huntington disease
topic Longitudinal and Time‐to‐event Analysis