A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data
BACKGROUND AND OBJECTIVES: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States,...
Main authors: | Belloy, Michael E., Le Guen, Yann, Eger, Sarah J., Napolioni, Valerio, Greicius, Michael D., He, Zihuai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Wolters Kluwer 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372872/ https://www.ncbi.nlm.nih.gov/pubmed/35966919 http://dx.doi.org/10.1212/NXG.0000000000200012 |
_version_ | 1784767483068022784 |
---|---|
author | Belloy, Michael E. Le Guen, Yann Eger, Sarah J. Napolioni, Valerio Greicius, Michael D. He, Zihuai |
author_sort | Belloy, Michael E. |
collection | PubMed |
description | BACKGROUND AND OBJECTIVES: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, with the resultant data being made publicly available to researchers to generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adapted a study design where subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variable variant quality across sequencing centers and/or platforms. In this study, we sought to implement and evaluate filters that can be applied fast to robustly remove variant-level artifacts in the ADSP data. METHODS: We implemented a robust quality control procedure to handle ADSP data. We evaluated this procedure while performing exome-wide and genome-wide association analyses on AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5). RESULTS: We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants. DISCUSSION: We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs. |
format | Online Article Text |
id | pubmed-9372872 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Wolters Kluwer |
record_format | MEDLINE/PubMed |
spelling | pubmed-9372872 2022-08-12 A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data Belloy, Michael E. Le Guen, Yann Eger, Sarah J. Napolioni, Valerio Greicius, Michael D. He, Zihuai Neurol Genet Research Article BACKGROUND AND OBJECTIVES: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, with the resultant data being made publicly available to researchers to generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adapted a study design where subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variable variant quality across sequencing centers and/or platforms. In this study, we sought to implement and evaluate filters that can be applied fast to robustly remove variant-level artifacts in the ADSP data. METHODS: We implemented a robust quality control procedure to handle ADSP data. We evaluated this procedure while performing exome-wide and genome-wide association analyses on AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5). RESULTS: We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants. DISCUSSION: We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs. Wolters Kluwer 2022-08-11 /pmc/articles/PMC9372872/ /pubmed/35966919 http://dx.doi.org/10.1212/NXG.0000000000200012 Text en Copyright © 2022 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal. |
spellingShingle | Research Article Belloy, Michael E. Le Guen, Yann Eger, Sarah J. Napolioni, Valerio Greicius, Michael D. He, Zihuai A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data |
title | A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372872/ https://www.ncbi.nlm.nih.gov/pubmed/35966919 http://dx.doi.org/10.1212/NXG.0000000000200012 |