Speeding disease gene discovery by sequence based candidate prioritization

BACKGROUND: Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show th...

Bibliographic Details
Main authors: Adie, Euan A, Adams, Richard R, Evans, Kathryn L, Porteous, David J, Pickard, Ben S
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1274252/
https://www.ncbi.nlm.nih.gov/pubmed/15766383
http://dx.doi.org/10.1186/1471-2105-6-55
_version_ 1782125973409366016
author Adie, Euan A
Adams, Richard R
Evans, Kathryn L
Porteous, David J
Pickard, Ben S
author_facet Adie, Euan A
Adams, Richard R
Evans, Kathryn L
Porteous, David J
Pickard, Ben S
author_sort Adie, Euan A
collection PubMed
description BACKGROUND: Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. RESULTS: We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. CONCLUSION: PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.
format Text
id pubmed-1274252
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1274252 2005-10-29 Speeding disease gene discovery by sequence based candidate prioritization Adie, Euan A Adams, Richard R Evans, Kathryn L Porteous, David J Pickard, Ben S BMC Bioinformatics Research Article BACKGROUND: Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. RESULTS: We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. CONCLUSION: PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies. BioMed Central 2005-03-14 /pmc/articles/PMC1274252/ /pubmed/15766383 http://dx.doi.org/10.1186/1471-2105-6-55 Text en Copyright © 2005 Adie et al; licensee BioMed Central Ltd.
spellingShingle Research Article
Adie, Euan A
Adams, Richard R
Evans, Kathryn L
Porteous, David J
Pickard, Ben S
Speeding disease gene discovery by sequence based candidate prioritization
title Speeding disease gene discovery by sequence based candidate prioritization
title_full Speeding disease gene discovery by sequence based candidate prioritization
title_fullStr Speeding disease gene discovery by sequence based candidate prioritization
title_full_unstemmed Speeding disease gene discovery by sequence based candidate prioritization
title_short Speeding disease gene discovery by sequence based candidate prioritization
title_sort speeding disease gene discovery by sequence based candidate prioritization
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1274252/
https://www.ncbi.nlm.nih.gov/pubmed/15766383
http://dx.doi.org/10.1186/1471-2105-6-55
work_keys_str_mv AT adieeuana speedingdiseasegenediscoverybysequencebasedcandidateprioritization
AT adamsrichardr speedingdiseasegenediscoverybysequencebasedcandidateprioritization
AT evanskathrynl speedingdiseasegenediscoverybysequencebasedcandidateprioritization
AT porteousdavidj speedingdiseasegenediscoverybysequencebasedcandidateprioritization
AT pickardbens speedingdiseasegenediscoverybysequencebasedcandidateprioritization