Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder

Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typi...

Bibliographic Details
Main Authors: Varma, Maya, Paskov, Kelley Marie, Jung, Jae-Yoon, Chrisman, Brianna Sierra, Stockham, Nate Tyler, Washington, Peter Yigitcan, Wall, Dennis Paul
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417813/
https://www.ncbi.nlm.nih.gov/pubmed/30864328
_version_ 1783403627009802240
author Varma, Maya
Paskov, Kelley Marie
Jung, Jae-Yoon
Chrisman, Brianna Sierra
Stockham, Nate Tyler
Washington, Peter Yigitcan
Wall, Dennis Paul
author_sort Varma, Maya
collection PubMed
description Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ₁-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders.
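The classification step named in the abstract (an ℓ₁-regularized logistic regression scored by AUC-ROC on held-out samples) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the binary presence/absence encoding of variants is an assumption, the solver (proximal gradient descent with soft-thresholding) is a stand-in chosen for being dependency-free, and every function name and parameter value below is hypothetical.

```python
import math
import random


def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_score(w, b, x):
    """Predicted probability of the positive (case) label for one sample."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_l1_logistic(X, y, lam=0.05, step=0.5, n_iter=200):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_score(w, b, xi) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= step * grad_b / n  # the intercept is not penalized
        # Gradient step on the smooth loss, then soft-threshold for the l1 term.
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def auc_roc(scores, labels):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n, d, seed=0):
    """Toy data (assumption): the first 3 of d binary features are enriched in cases."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate case (1) and control (0)
        row = []
        for j in range(d):
            if j < 3:
                p = 0.8 if label == 1 else 0.2
            else:
                p = 0.3
            row.append(1.0 if rng.random() < p else 0.0)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic(300, 20)
w, b = fit_l1_logistic(X[:200], y[:200])  # train on the first 200 samples
test_scores = [predict_score(w, b, x) for x in X[200:]]  # score the held-out 100
print("nonzero weights:", [j for j, wj in enumerate(w) if wj != 0.0])
print("held-out AUC-ROC:", round(auc_roc(test_scores, y[200:]), 3))
```

The ℓ₁ penalty drives the weights of uninformative features to exactly zero, which is why this family of models is convenient for pointing at a small set of candidate variants. In practice a library solver would normally replace the hand-written loop.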
format Online
Article
Text
id pubmed-6417813
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6417813 2019-03-14 Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder Varma, Maya Paskov, Kelley Marie Jung, Jae-Yoon Chrisman, Brianna Sierra Stockham, Nate Tyler Washington, Peter Yigitcan Wall, Dennis Paul Pac Symp Biocomput Article Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ₁-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders. 2019 /pmc/articles/PMC6417813/ /pubmed/30864328 Text en http://creativecommons.org/licenses/by-nc/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License
title Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder
title_sort outgroup machine learning approach identifies single nucleotide variants in noncoding dna associated with autism spectrum disorder
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417813/
https://www.ncbi.nlm.nih.gov/pubmed/30864328