Measures of co-expression for improved function prediction of long non-coding RNAs

Bibliographic Details
Main authors: Ehsani, Rezvan; Drabløs, Finn
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6300029/
https://www.ncbi.nlm.nih.gov/pubmed/30567492
http://dx.doi.org/10.1186/s12859-018-2546-y
Collection: PubMed
Abstract:
BACKGROUND: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression.
RESULTS: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes.
CONCLUSION: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2546-y) contains supplementary material, which is available to authorized users.
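The abstract compares statistical and geometrical measures of co-expression between expression profiles measured across tissues. As a rough illustration only, the Python sketch below computes Pearson and Spearman correlations and one common form of Fisher (Fisher-Rao) distance between two profiles. The function names and the exact Fisher formula (each profile normalised to a distribution over tissues, distance taken as 2·arccos of the Bhattacharyya coefficient) are assumptions, not the authors' software. The Sobolev metric is left out because its parameters are not given in this record.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant profile has no variance, so the correlation is undefined.
        raise ValueError("constant profile: correlation undefined")
    return cov / (sx * sy)


def ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def fisher_distance(x, y):
    """ASSUMED Fisher-Rao form: profiles must be non-negative with a positive
    total; each is scaled to sum to 1 and the geodesic distance on the
    probability simplex, 2 * arccos(sum(sqrt(p_i * q_i))), is returned."""
    tx, ty = sum(x), sum(y)
    bc = sum(math.sqrt((a / tx) * (b / ty)) for a, b in zip(x, y))
    # Clamp to guard against rounding pushing the coefficient above 1.
    return 2 * math.acos(min(1.0, bc))


if __name__ == "__main__":
    # Toy profiles over 5 tissues (hypothetical values, not data from the study).
    lnc = [1.0, 2.0, 3.0, 4.0, 5.0]
    genes = {"gene_a": [2.0, 4.0, 6.0, 8.0, 10.0], "gene_b": [5.0, 4.0, 3.0, 2.0, 1.0]}
    for name, prof in genes.items():
        print(name, round(pearson(lnc, prof), 3), round(spearman(lnc, prof), 3),
              round(fisher_distance(lnc, prof), 3))
```

In a function-prediction setting of the kind described, genes with high correlation or small distance to a lncRNA would form the co-expressed set whose annotation terms are then tested for enrichment. How the paper weights or combines the statistical and geometrical metrics is not reproduced here.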
ID: pubmed-6300029
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics, Methodology Article. BioMed Central, 2018-12-19 (record pubmed-6300029, 2018-12-20). © The Author(s).
2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Topic: Methodology Article