
Identification of genomic features in the classification of loss- and gain-of-function mutation

BACKGROUND: Alterations of a genome can lead to changes in protein functions. Through these genetic mutations, a protein can lose its native function (loss-of-function, LoF), or it can confer a new function (gain-of-function, GoF). However, when a mutation occurs, it is difficult to determine whethe...


Bibliographic Details
Main Authors: Jung, Seunghwan; Lee, Sejoon; Kim, Sangwoo; Nam, Hojung
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460711/
https://www.ncbi.nlm.nih.gov/pubmed/26043747
http://dx.doi.org/10.1186/1472-6947-15-S1-S6
author Jung, Seunghwan
Lee, Sejoon
Kim, Sangwoo
Nam, Hojung
collection PubMed
description BACKGROUND: Alterations of a genome can lead to changes in protein functions. Through these genetic mutations, a protein can lose its native function (loss-of-function, LoF), or it can confer a new function (gain-of-function, GoF). However, when a mutation occurs, it is difficult to determine whether it will result in a LoF or a GoF. Therefore, in this paper, we propose a study that analyzes the genomic features of LoF and GoF instances to find features that can be used to classify LoF and GoF mutations. METHODS: In order to collect experimentally verified LoF and GoF mutational information, we obtained 816 LoF mutations and 474 GoF mutations from a literature text-mining process. Next, with data-preprocessing steps, 258 LoF and 129 GoF mutations remained for a further analysis. We analyzed the properties of these LoF and GoF mutations. Among the properties, we selected features which show different tendencies between the two groups and implemented classifications using support vector machine, random forest, and linear logistic regression methods to confirm whether or not these features can identify LoF and GoF mutations. RESULTS: We analyzed the properties of the LoF and GoF mutations and identified six features which have discriminative power between LoF and GoF conditions: the reference allele, the substituted allele, mutation type, mutation impact, subcellular location, and protein domain. When using the six selected features with the random forest, support vector machine, and linear logistic regression classifiers, the result showed accuracy levels of 72.23%, 71.28%, and 70.19%, respectively. CONCLUSIONS: We analyzed LoF and GoF mutations and selected several properties which were different between the two classes. By implementing classifications with the selected features, it is demonstrated that the selected features have good discriminative power.
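The accuracy figures in the abstract are easier to read against a simple reference point. The following Python sketch uses only the counts stated above to compute the share of collected mutations that survived preprocessing and the accuracy of a hypothetical majority-class baseline that always predicts LoF. It assumes the classifiers were evaluated on the 258 LoF and 129 GoF mutations at their natural 2:1 ratio; the abstract does not state the evaluation protocol, so the baseline is illustrative only. The names `retention_rate` and `majority_baseline` are introduced here and do not come from the article.

```python
# Counts and accuracies as reported in the abstract.
collected = {"LoF": 816, "GoF": 474}   # mutations from literature text mining
retained = {"LoF": 258, "GoF": 129}    # mutations left after preprocessing
reported_accuracy = {                  # percent, per classifier
    "random forest": 72.23,
    "support vector machine": 71.28,
    "linear logistic regression": 70.19,
}


def retention_rate(label: str) -> float:
    """Fraction of collected mutations of this class kept after preprocessing."""
    return retained[label] / collected[label]


def majority_baseline(counts: dict) -> float:
    """Accuracy (percent) of always predicting the most frequent class.

    ASSUMPTION: evaluation uses the class ratio in `counts`; the abstract
    does not say whether classes were rebalanced before evaluation.
    """
    return 100.0 * max(counts.values()) / sum(counts.values())


baseline = majority_baseline(retained)

# Percentage points by which each reported accuracy exceeds the baseline.
margins = {name: acc - baseline for name, acc in reported_accuracy.items()}

if __name__ == "__main__":
    for label in collected:
        print(f"{label}: {retention_rate(label):.1%} of collected mutations retained")
    print(f"majority-class baseline: {baseline:.2f}%")
    for name, margin in margins.items():
        print(f"{name}: {margin:+.2f} points over baseline")
```

Under that assumption the baseline is 258/387, about 66.7%, so the reported accuracies sit roughly 3.5 to 5.6 percentage points above it.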
format Online
Article
Text
id pubmed-4460711
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4460711 2015-06-29 Identification of genomic features in the classification of loss- and gain-of-function mutation Jung, Seunghwan Lee, Sejoon Kim, Sangwoo Nam, Hojung BMC Med Inform Decis Mak Research Article BioMed Central 2015-05-20 /pmc/articles/PMC4460711/ /pubmed/26043747 http://dx.doi.org/10.1186/1472-6947-15-S1-S6 Text en Copyright © 2015 Jung et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Identification of genomic features in the classification of loss- and gain-of-function mutation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460711/
https://www.ncbi.nlm.nih.gov/pubmed/26043747
http://dx.doi.org/10.1186/1472-6947-15-S1-S6