A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data

BACKGROUND AND OBJECTIVES: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States,...

Bibliographic Details
Main authors: Belloy, Michael E.; Le Guen, Yann; Eger, Sarah J.; Napolioni, Valerio; Greicius, Michael D.; He, Zihuai
Format: Online Article Text
Language: English
Published: Wolters Kluwer 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372872/
https://www.ncbi.nlm.nih.gov/pubmed/35966919
http://dx.doi.org/10.1212/NXG.0000000000200012
author Belloy, Michael E.
Le Guen, Yann
Eger, Sarah J.
Napolioni, Valerio
Greicius, Michael D.
He, Zihuai
collection PubMed
description BACKGROUND AND OBJECTIVES: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, with the resultant data being made publicly available to researchers to generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adapted a study design where subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variable variant quality across sequencing centers and/or platforms. In this study, we sought to implement and evaluate filters that can be applied fast to robustly remove variant-level artifacts in the ADSP data.
METHODS: We implemented a robust quality control procedure to handle ADSP data. We evaluated this procedure while performing exome-wide and genome-wide association analyses on AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5).
RESULTS: We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants.
DISCUSSION: We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs.
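The abstract does not spell out the filters themselves, so the following is only an illustrative sketch of the general idea it describes: flagging a variant whose allele frequency differs strongly across sequencing centers. It uses a plain Pearson chi-square statistic on alternate/reference allele counts per center, which is a stand-in technique and not necessarily what the authors implemented. The function names, the center labels, the example counts, and the threshold of 30.0 are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input shape: center name -> (alternate allele count, total allele count).
CenterCounts = Dict[str, Tuple[int, int]]


def allele_frequencies(counts: CenterCounts) -> Dict[str, float]:
    """Alternate allele frequency per center, skipping centers with no calls."""
    return {c: alt / total for c, (alt, total) in counts.items() if total > 0}


def chi_square_heterogeneity(counts: CenterCounts) -> float:
    """Pearson chi-square statistic for a 2 x K table (alt/ref alleles by center).

    A large value means the allele frequency is not homogeneous across centers.
    """
    usable = [(alt, total) for alt, total in counts.values() if total > 0]
    total_alt = sum(alt for alt, _ in usable)
    total_n = sum(total for _, total in usable)
    total_ref = total_n - total_alt
    if total_alt == 0 or total_ref == 0:
        return 0.0  # monomorphic (or empty) data carries no heterogeneity signal
    stat = 0.0
    for alt, n in usable:
        expected_alt = n * total_alt / total_n
        expected_ref = n * total_ref / total_n
        stat += (alt - expected_alt) ** 2 / expected_alt
        stat += ((n - alt) - expected_ref) ** 2 / expected_ref
    return stat


def is_flagged(counts: CenterCounts, threshold: float = 30.0) -> bool:
    """Flag a variant when the statistic exceeds an assumed, illustrative threshold."""
    return chi_square_heterogeneity(counts) > threshold


# Made-up counts for two variants, each observed at three hypothetical centers.
consistent = {"center_A": (10, 1000), "center_B": (12, 1000), "center_C": (9, 1000)}
artifact = {"center_A": (10, 1000), "center_B": (150, 1000), "center_C": (9, 1000)}

for name, counts in (("consistent", consistent), ("artifact", artifact)):
    print(name, allele_frequencies(counts), is_flagged(counts))
```

In a real analysis the threshold would need to be chosen with the number of centers and the number of tested variants in mind, and true frequency differences driven by ancestry or case/control composition would need to be separated from technical artifacts; this sketch does neither.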
format Online
Article
Text
id pubmed-9372872
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Wolters Kluwer
record_format MEDLINE/PubMed
spelling pubmed-9372872 2022-08-12 Neurol Genet Research Article Wolters Kluwer 2022-08-11 /pmc/articles/PMC9372872/ /pubmed/35966919 http://dx.doi.org/10.1212/NXG.0000000000200012 Text en Copyright © 2022 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
title A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data
topic Research Article