
Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related l...


Bibliographic Details
Main Authors: Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
Format: Online Article Text
Language: English
Published: Impact Journals LLC 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5668007/
https://www.ncbi.nlm.nih.gov/pubmed/29108274
http://dx.doi.org/10.18632/oncotarget.20481
author Peng, Hui
Lan, Chaowang
Liu, Yuansheng
Liu, Tao
Blumenstein, Michael
Li, Jinyan
collection PubMed
description Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
format Online
Article
Text
id pubmed-5668007
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Impact Journals LLC
record_format MEDLINE/PubMed
spelling pubmed-5668007 2017-11-04 Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan Oncotarget Research Paper Impact Journals LLC 2017-08-24 /pmc/articles/PMC5668007/ /pubmed/29108274 http://dx.doi.org/10.18632/oncotarget.20481 Text en Copyright: © 2017 Peng et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 (CC BY 3.0) (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes
topic Research Paper
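
The abstract describes the first sub-vector as information entropies of a disease's gene distribution over 45 chromosome substructures, but this record does not give the exact formula. The sketch below shows one plausible reading only: each element is the Shannon entropy term -p * log2(p), where p is the fraction of the disease's known genes located in that substructure. The function name `entropy_vector`, the substructure labels, and the toy gene list are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy_vector(gene_locations, substructures):
    """Return one entropy term per substructure.

    Assumption (not confirmed by the source record): element i is
    -p_i * log2(p_i), where p_i is the fraction of the disease's genes
    that fall in substructure i. Substructures with no genes get 0.0.
    """
    counts = Counter(gene_locations)
    total = sum(counts[s] for s in substructures)
    vector = []
    for s in substructures:
        p = counts[s] / total if total else 0.0
        vector.append(-p * math.log2(p) if p > 0 else 0.0)
    return vector


# Hypothetical subset of chromosome arms; the paper uses 45 substructures.
substructures = ["1p", "1q", "6p", "6q", "21p", "21q"]

# Toy data: chromosome arm of each known gene for one made-up disease.
genes = ["6p", "6p", "1q", "6q"]

vec = entropy_vector(genes, substructures)
print(vec)       # one entropy term per substructure
print(sum(vec))  # the terms sum to the Shannon entropy of the distribution
```

Under this reading, a disease whose genes concentrate on a single arm yields a vector of zeros, while a disease whose genes spread over many arms yields many non-zero terms, so the vector encodes how the disease's genes are distributed across substructures.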