
Integration of multiple data sources to prioritize candidate genes using discounted rating system

BACKGROUND: Identifying disease genes from a list of candidate genes is an important task in bioinformatics. The main strategy is to prioritize candidate genes based on their similarity to known disease genes. Most existing gene prioritization methods access only one genomic data source, which is...


Bibliographic Details
Main authors: Li, Yongjin; Patra, Jagdish C
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009491/
https://www.ncbi.nlm.nih.gov/pubmed/20122192
http://dx.doi.org/10.1186/1471-2105-11-S1-S20
author Li, Yongjin
Patra, Jagdish C
collection PubMed
description BACKGROUND: Identifying disease genes from a list of candidate genes is an important task in bioinformatics. The main strategy is to prioritize candidate genes based on their similarity to known disease genes. Most existing gene prioritization methods access only one genomic data source, which is noisy and incomplete. Thus, there is a need to integrate multiple data sources that contain different information. RESULTS: In this paper, we propose a combination strategy called the discounted rating system (DRS). We performed leave-one-out cross-validation to compare it with the N-dimensional order statistics (NDOS) used in Endeavour. The AUC (Area Under the Curve) values achieved by DRS were comparable to those of NDOS on most of the disease families. However, DRS ran much faster than NDOS, especially as the number of data sources increased. With 100 candidate genes and 20 data sources, DRS ran more than 180 times faster than NDOS. Within the DRS framework, we assign different weights to different data sources. The weighted DRS achieved significantly higher AUC values than NDOS. CONCLUSION: The proposed DRS algorithm is a powerful and effective framework for candidate gene prioritization. If the weights of the different data sources are chosen properly, the DRS algorithm performs better.
format Text
id pubmed-3009491
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3009491 2010-12-23 Integration of multiple data sources to prioritize candidate genes using discounted rating system Li, Yongjin Patra, Jagdish C BMC Bioinformatics Research BioMed Central 2010-01-18 /pmc/articles/PMC3009491/ /pubmed/20122192 http://dx.doi.org/10.1186/1471-2105-11-S1-S20 Text en Copyright ©2010 Li and Patra; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Integration of multiple data sources to prioritize candidate genes using discounted rating system
topic Research