A Framework for Automated Gene Selection in Genomic Applications

Bibliographic Details
Main Authors: Lazo de la Vega, L, Yu, W, Machini, K, Austin-Tse, CA, Hao, L, Blout Zawatsky, CL, Mason-Suares, H, Green, RC, Rehm, HL, Lebo, MS
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8487927/
https://www.ncbi.nlm.nih.gov/pubmed/34113001
http://dx.doi.org/10.1038/s41436-021-01213-x
author Lazo de la Vega, L
Yu, W
Machini, K
Austin-Tse, CA
Hao, L
Blout Zawatsky, CL
Mason-Suares, H
Green, RC
Rehm, HL
Lebo, MS
collection PubMed
description PURPOSE: An efficient framework to identify disease-associated genes is needed to evaluate genomic data for both individuals with an unknown disease etiology and those undergoing genomic screening. Here, we propose a framework for gene selection used in genomic analyses, including applications limited to genes with strong or established evidence levels and applications including genes with less or emerging evidence of disease association.
METHODS: We extracted genes with evidence for gene-disease association from the Human Gene Mutation Database, Online Mendelian Inheritance in Man, and ClinVar to build a comprehensive gene list of 6,145 genes. Next, we applied stringent filters in conjunction with computationally curated evidence (DisGeNET) to create a restrictive list limited to 3,929 genes with stronger disease associations.
RESULTS: When compared to manual gene curation efforts, including the Clinical Genome Resource, genes with strong or definitive disease associations are included in both gene lists at high percentages, while genes with limited evidence are largely removed. We further confirmed the utility of this approach in identifying pathogenic and likely pathogenic variants in 45 genomes.
CONCLUSION: Our approach efficiently creates highly sensitive gene lists for genomic applications, while remaining dynamic and updatable, enabling time savings in genomic applications.
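The two-list design described under METHODS can be illustrated with a small Python sketch. It is not the authors' pipeline: the abstract does not state the actual filter criteria, so the rule used here (a gene is kept if at least two sources report it or its score meets a cutoff), the function names, the thresholds, the placeholder gene names, and the score values are all assumptions made for illustration.

```python
from typing import Dict, Set


def build_comprehensive(sources: Dict[str, Set[str]]) -> Set[str]:
    """Comprehensive list: every gene reported by at least one source."""
    genes: Set[str] = set()
    for gene_set in sources.values():
        genes |= gene_set
    return genes


def build_restrictive(
    sources: Dict[str, Set[str]],
    evidence_score: Dict[str, float],
    min_sources: int = 2,    # hypothetical threshold, not from the paper
    min_score: float = 0.5,  # hypothetical threshold, not from the paper
) -> Set[str]:
    """Restrictive list: subset of the comprehensive list that passes
    an illustrative filter (multi-source support OR a high score)."""
    restrictive: Set[str] = set()
    for gene in build_comprehensive(sources):
        n_sources = sum(gene in gene_set for gene_set in sources.values())
        score = evidence_score.get(gene, 0.0)
        if n_sources >= min_sources or score >= min_score:
            restrictive.add(gene)
    return restrictive


# Toy input with placeholder gene names (not real database content).
sources = {
    "HGMD": {"GENE_A", "GENE_B", "GENE_C"},
    "OMIM": {"GENE_A", "GENE_D"},
    "ClinVar": {"GENE_A", "GENE_B", "GENE_E"},
}
# Stand-in for a computationally curated score such as one from DisGeNET.
scores = {"GENE_A": 0.9, "GENE_D": 0.7, "GENE_E": 0.1}

comprehensive = build_comprehensive(sources)
restrictive = build_restrictive(sources, scores)
print(sorted(comprehensive))
print(sorted(restrictive))
```

Because the restrictive list is derived from the comprehensive one, rerunning both functions on refreshed source exports would regenerate both lists, which matches the "dynamic and updatable" property the abstract claims.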
format Online
Article
Text
id pubmed-8487927
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-8487927 2021-12-10 A Framework for Automated Gene Selection in Genomic Applications Genet Med Article
2021-06-10 2021-10 /pmc/articles/PMC8487927/ /pubmed/34113001 http://dx.doi.org/10.1038/s41436-021-01213-x Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title A Framework for Automated Gene Selection in Genomic Applications
topic Article