
A formalized description of the standard human variant nomenclature in Extended Backus-Naur Form


Bibliographic Details
Main authors: Laros, Jeroen F J, Blavier, André, den Dunnen, Johan T, Taschner, Peter E M
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194197/
https://www.ncbi.nlm.nih.gov/pubmed/21992071
http://dx.doi.org/10.1186/1471-2105-12-S4-S5
collection PubMed
description BACKGROUND: The use of a standard human sequence variant nomenclature is advocated by the Human Genome Variation Society in order to unambiguously describe genetic variants in databases and literature. There is a clear need for tools that allow the mining of data about human sequence variants and their functional consequences from databases and literature. Existing text mining focuses on the recognition of protein variants and their effects. The recognition of variants at the DNA and RNA levels is essential for dissemination of variant data for diagnostic purposes. Development of new tools is hampered by the complexity of the current nomenclature, which requires processing at the character level to recognize the specific syntactic constructs used in variant descriptions.
RESULTS: We approached the gene variant nomenclature as a scientific sublanguage and created two formal descriptions of the syntax in Extended Backus-Naur Form: one at the DNA-RNA level and one at the protein level. To ensure compatibility with older versions of the human sequence variant nomenclature, previously recommended variant description formats have been included. The first grammar versions were designed to help build variant description handling in the Alamut mutation interpretation software. The DNA and RNA level descriptions were then updated and used to construct the context-free parser of the Mutalyzer 2 sequence variant nomenclature checker, which has already been used to check more than one million variant descriptions.
CONCLUSIONS: The Extended Backus-Naur Form provided an overview of the full complexity of the syntax of the sequence variant nomenclature, which remained hidden in the textual format and the division of the recommendations across the DNA, RNA and protein sections of the Human Genome Variation Society nomenclature website (http://www.hgvs.org/mutnomen/). This insight into the syntax of the nomenclature could be used to design detailed and clear rules for software development. The Mutalyzer 2 parser demonstrated that the formal grammar facilitated decomposition of complex variant descriptions into their individual parts. The Extended Backus-Naur Form or parts of it can be used or modified by adding rules, allowing the development of specific sequence variant text mining tools and other programs, which can generate or handle sequence variant descriptions.
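Illustrative note: the abstract describes turning an Extended Backus-Naur Form grammar into a parser that decomposes variant descriptions into their parts. The following is a minimal sketch of that idea in Python. The toy grammar in the comments, the `Variant` record, and the `parse_variant` function are all hypothetical names invented for this illustration; they cover only a tiny DNA-level subset and are not the grammar published in the article or the Mutalyzer 2 parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy EBNF (an assumption for illustration, NOT the published grammar):
#   variant   = ref_type, ".", position, ( subst | ranged ) ;
#   ref_type  = "c" | "g" ;
#   subst     = nt, ">", nt ;
#   ranged    = [ "_", position ], ( "del" | "dup" | "ins", nt, { nt } ) ;
#   position  = digit, { digit } ;
#   nt        = "A" | "C" | "G" | "T" ;


@dataclass
class Variant:
    ref_type: str            # "c" (coding DNA) or "g" (genomic)
    start: int               # first affected position
    end: Optional[int]       # last position of a range, if given
    kind: str                # "subst", "del", "dup" or "ins"
    ref: Optional[str] = None  # reference nucleotide (substitutions only)
    alt: Optional[str] = None  # new or inserted nucleotide(s)


class ParseError(ValueError):
    """Raised when the text does not match the toy grammar."""


class _Parser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def _match(self, pattern: str) -> Optional[str]:
        # Try to consume `pattern` at the current offset; return None on failure.
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group(0)

    def _expect(self, pattern: str, what: str) -> str:
        tok = self._match(pattern)
        if tok is None:
            raise ParseError(f"expected {what} at offset {self.pos} in {self.text!r}")
        return tok

    def parse(self) -> Variant:
        ref_type = self._expect(r"[cg]", "reference type")
        self._expect(r"\.", "'.'")
        start = int(self._expect(r"[0-9]+", "position"))
        ref = self._match(r"[ACGT]")
        if ref is not None:
            # subst = nt, ">", nt
            self._expect(r">", "'>'")
            alt = self._expect(r"[ACGT]", "nucleotide")
            return self._finish(Variant(ref_type, start, None, "subst", ref, alt))
        # ranged = [ "_", position ], ( "del" | "dup" | "ins", nt, { nt } )
        end = None
        if self._match(r"_") is not None:
            end = int(self._expect(r"[0-9]+", "position"))
        kind = self._expect(r"del|dup|ins", "'del', 'dup' or 'ins'")
        alt = self._expect(r"[ACGT]+", "inserted sequence") if kind == "ins" else None
        return self._finish(Variant(ref_type, start, end, kind, None, alt))

    def _finish(self, result: Variant) -> Variant:
        # Reject trailing characters that the grammar does not account for.
        if self.pos != len(self.text):
            raise ParseError(f"unexpected text at offset {self.pos} in {self.text!r}")
        return result


def parse_variant(text: str) -> Variant:
    """Decompose a description such as 'c.76A>T' into its parts."""
    return _Parser(text).parse()
```

Under these assumptions, `parse_variant("c.76_77insGT")` yields a record with separate fields for the reference type, the position range, the variant kind, and the inserted sequence, while malformed input such as `"c.76"` raises `ParseError`. Each parsing step mirrors one rule of the toy grammar, which is the property that makes a formal grammar convenient as a specification for software.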
format Online
Article
Text
id pubmed-3194197
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3194197 2011-10-17 BMC Bioinformatics Research BioMed Central 2011-07-05 Text en
Copyright ©2011 Laros et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research