A comprehensive evaluation of collapsing methods using simulated and real data: excellent annotation of functionality and large sample sizes required

The advent of next generation sequencing (NGS) technologies enabled the investigation of the rare variant-common disease hypothesis in unrelated individuals, even on the genome-wide level. Analysis of this hypothesis requires tailored statistical methods as single marker tests fail on rare variants....

Bibliographic Details
Main Authors: Dering, Carmen; König, Inke R.; Ramsey, Laura B.; Relling, Mary V.; Yang, Wenjian; Ziegler, Andreas
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2014
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4164031/
https://www.ncbi.nlm.nih.gov/pubmed/25309579
http://dx.doi.org/10.3389/fgene.2014.00323
_version_ 1782334903215456256
author Dering, Carmen
König, Inke R.
Ramsey, Laura B.
Relling, Mary V.
Yang, Wenjian
Ziegler, Andreas
collection PubMed
description The advent of next generation sequencing (NGS) technologies enabled the investigation of the rare variant-common disease hypothesis in unrelated individuals, even on the genome-wide level. Analysis of this hypothesis requires tailored statistical methods as single marker tests fail on rare variants. An entire class of statistical methods collapses rare variants from a genomic region of interest (ROI), thereby aggregating rare variants. In an extensive simulation study using data from the Genetic Analysis Workshop 17 we compared the performance of 15 collapsing methods by means of a variety of pre-defined ROIs regarding minor allele frequency thresholds and functionality. Findings of the simulation study were additionally confirmed by a real data set investigating the association between methotrexate clearance and the SLCO1B1 gene in patients with acute lymphoblastic leukemia. Our analyses showed substantially inflated type I error levels for many of the proposed collapsing methods. Only four approaches yielded valid type I errors in all considered scenarios. None of the statistical tests was able to detect true associations over a substantial proportion of replicates in the simulated data. Detailed annotation of functionality of variants is crucial to detect true associations. These findings were confirmed in the analysis of the real data. Recent theoretical work showed that large power is achieved in gene-based analyses only if large sample sizes are available and a substantial proportion of causing rare variants is present in the gene-based analysis. Many of the investigated statistical approaches use permutation requiring high computational cost. There is a clear need for valid, powerful and fast to calculate test statistics for studies investigating rare variants.
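The collapsing and permutation ideas mentioned in the description can be sketched in a few lines of Python. This is a generic illustration only, not necessarily any of the 15 methods compared in the study: it assumes genotypes coded as 0/1/2 minor-allele counts, a binary case/control status, a carrier indicator as the collapsed score, and hypothetical function names (`collapse`, `carrier_difference`, `permutation_p_value`) and an arbitrary default MAF threshold chosen for the example.

```python
import random


def minor_allele_freq(genotypes, j):
    """Frequency of variant j; genotypes are assumed coded as 0/1/2 minor-allele counts."""
    n = len(genotypes)
    return sum(row[j] for row in genotypes) / (2 * n)


def collapse(genotypes, maf_threshold=0.01):
    """Collapse a region of interest into one 0/1 value per individual:
    1 if the individual carries at least one rare variant, else 0."""
    n_variants = len(genotypes[0])
    rare = [j for j in range(n_variants)
            if 0 < minor_allele_freq(genotypes, j) <= maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def carrier_difference(carrier, status):
    """Carrier proportion among cases (status 1) minus that among controls (status 0)."""
    cases = [c for c, s in zip(carrier, status) if s == 1]
    controls = [c for c, s in zip(carrier, status) if s == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(carrier, status, n_perm=1000, seed=0):
    """Two-sided permutation p-value obtained by shuffling the status labels."""
    rng = random.Random(seed)
    observed = abs(carrier_difference(carrier, status))
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(carrier_difference(carrier, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 4 individuals, 3 variants; the third variant is common.
    toy_genotypes = [[1, 0, 2], [0, 0, 1], [0, 1, 2], [0, 0, 0]]
    toy_status = [1, 0, 1, 0]
    toy_carrier = collapse(toy_genotypes, maf_threshold=0.2)
    print(toy_carrier, permutation_p_value(toy_carrier, toy_status))
```

The loop over `n_perm` shuffles is where the computational cost noted in the description comes from: every region tested repeats the statistic once per permutation.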
format Online
Article
Text
id pubmed-4164031
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4164031 2014-10-10 Front Genet Genetics Frontiers Media S.A. 2014-09-15 /pmc/articles/PMC4164031/ /pubmed/25309579 http://dx.doi.org/10.3389/fgene.2014.00323 Text en
Copyright © 2014 Dering, König, Ramsey, Relling, Yang and Ziegler. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A comprehensive evaluation of collapsing methods using simulated and real data: excellent annotation of functionality and large sample sizes required
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4164031/
https://www.ncbi.nlm.nih.gov/pubmed/25309579
http://dx.doi.org/10.3389/fgene.2014.00323