Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level

Bibliographic Details
Main authors: Jeng, Xinge Jessie; Daye, Zhongyin John; Lu, Wenbin; Tzeng, Jung-Ying
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4927097/
https://www.ncbi.nlm.nih.gov/pubmed/27355347
http://dx.doi.org/10.1371/journal.pcbi.1004993
author Jeng, Xinge Jessie
Daye, Zhongyin John
Lu, Wenbin
Tzeng, Jung-Ying
collection PubMed
description Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene level instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of the molecular mechanisms and functions of genetic factors in disease. Due to the extreme rarity of mutations and the high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR) procedures, are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure, which can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dismissed as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and the quality of significance rankings. Extensive simulation studies across a wide range of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR procedures are exceedingly over-conservative for rare variants association studies. In the analyses of the CoLaus dataset, the AFNC identified the individual variants most responsible for gene-level significance. Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information.
format Online
Article
Text
id pubmed-4927097
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4927097 2016-07-18 Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level Jeng, Xinge Jessie Daye, Zhongyin John Lu, Wenbin Tzeng, Jung-Ying PLoS Comput Biol Research Article
Public Library of Science 2016-06-29 /pmc/articles/PMC4927097/ /pubmed/27355347 http://dx.doi.org/10.1371/journal.pcbi.1004993 Text en © 2016 Jeng et al http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4927097/
https://www.ncbi.nlm.nih.gov/pubmed/27355347
http://dx.doi.org/10.1371/journal.pcbi.1004993