
HGNChelper: identification and correction of invalid gene symbols for human and mouse

Bibliographic Details
Main authors: Oh, Sehyun; Abdelnabi, Jasmine; Al-Dulaimi, Ragheed; Aggarwal, Ayush; Ramos, Marcel; Davis, Sean; Riester, Markus; Waldron, Levi
Format: Online, Article, Text
Language: English
Published: F1000 Research Limited, 2022
Subjects: Software Tool Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7856679/
https://www.ncbi.nlm.nih.gov/pubmed/33564398
http://dx.doi.org/10.12688/f1000research.28033.2
Collection: PubMed
Record ID: pubmed-7856679
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Description: Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN.
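HGNChelper itself is an R package (its main entry point is the function `checkGeneSymbols()`). The Python sketch below is not that package's implementation; it only illustrates, under stated assumptions, the two kinds of correction the description mentions: reversing spreadsheet date conversion (e.g. `1-Sep` back to `SEPT1`) and mapping outdated symbols to current approved ones. The lookup tables are a tiny hand-picked sample (`SEPT1` to `SEPTIN1`, `MARCH1` to `MARCHF1`), not the HGNC or MGI databases, and the function name `correct_symbol` is hypothetical.

```python
"""Toy sketch of gene-symbol checking in the spirit of HGNChelper.

NOT HGNChelper's implementation (that package is written in R). The lookup
tables are a tiny hand-picked sample for illustration; a real checker would
load the current HGNC (human) or MGI (mouse) symbol map.
"""
import re

# Assumed sample of current approved human symbols.
APPROVED = {"SEPTIN1", "SEPTIN2", "MARCHF1", "TP53"}

# Assumed sample of outdated symbols -> current approved symbol.
OUTDATED = {"SEPT1": "SEPTIN1", "SEPT2": "SEPTIN2", "MARCH1": "MARCHF1"}

# Month abbreviations a spreadsheet produces -> gene-symbol prefix they replaced.
MONTH_PREFIX = {"Sep": "SEPT", "Mar": "MARCH"}

# Matches date-like strings such as "1-Sep" or "Sep-01".
DATE_RE = re.compile(r"^(?:(\d{1,2})-([A-Za-z]{3})|([A-Za-z]{3})-(\d{1,2}))$")


def correct_symbol(symbol):
    """Return (is_approved, suggestion) for one human gene symbol.

    suggestion is the approved symbol, or None if no correction is known.
    """
    s = symbol.strip()
    if s in APPROVED:
        return True, s
    # Undo spreadsheet date conversion, e.g. "1-Sep" or "Sep-01" -> "SEPT1".
    m = DATE_RE.match(s)
    if m:
        day = m.group(1) or m.group(4)
        month = (m.group(2) or m.group(3)).capitalize()
        prefix = MONTH_PREFIX.get(month)
        if prefix:
            s = f"{prefix}{int(day)}"
    # Human symbols are upper case; this simplification would mangle
    # mixed-case symbols such as C19orf71.
    s = s.upper()
    if s in APPROVED:
        return False, s
    if s in OUTDATED:
        return False, OUTDATED[s]
    return False, None


if __name__ == "__main__":
    for raw in ["TP53", "tp53", "1-Sep", "Mar-01", "SEPT2", "UNKNOWN1"]:
        print(raw, correct_symbol(raw))
```

Returning a pair keeps "already valid" distinct from "corrected", and a symbol with no known mapping comes back as `None` rather than a guess, in line with correcting symbols only "where possible".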
Source: F1000Res, Software Tool Article; F1000 Research Limited, 2022-06-09 (record pubmed-7856679, dated 2021-02-08)
Copyright: © 2022 Oh S et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.