Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system

Bibliographic Details
Main authors: Silva, Diego G; Schönbach, Christian; Brusic, Vladimir; Socha, Luis A; Nagashima, Takeshi; Petrovsky, Nikolai
Format: Text
Language: English
Published: BioMed Central, 2004
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC420239/
https://www.ncbi.nlm.nih.gov/pubmed/15115540
http://dx.doi.org/10.1186/1471-2164-5-28
collection PubMed
description BACKGROUND: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome, as revealed by the RIKEN FANTOM2 project, to identify novel human disease-related candidate genes. We define a new term, "patholog", to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than focusing only on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. RESULTS: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts, we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardiovascular (4%) or other (14%) disorders. CONCLUSIONS: Large-scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised, we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs within large-scale mouse transcript datasets.
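The overlap figures in the abstract (36 found by FACTS only, 49 by human experts only, 97 by both) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the Jaccard index and per-method "coverage" are summary measures chosen here, not statistics reported by the authors. Coverage is computed relative to the combined set of 182 candidates, so it is not a true recall against an independent gold standard.

```python
# Counts taken from the abstract; variable names are hypothetical.
facts_only = 36   # candidates found by the FACTS computational tools only
human_only = 49   # candidates found by human expert analysis only
both = 97         # candidates found by both methods

total = facts_only + human_only + both      # size of the union of both methods
found_by_facts = facts_only + both          # everything FACTS flagged
found_by_humans = human_only + both         # everything the curators flagged

# Jaccard index (overlap / union): an illustrative agreement measure,
# not a statistic reported in the paper.
jaccard = both / total

# Share of the combined candidate set each method recovered on its own.
facts_coverage = found_by_facts / total
human_coverage = found_by_humans / total

# Disease-category shares (percent) from the abstract should sum to 100.
category_shares = {
    "neoplastic": 53,
    "hereditary": 24,
    "immunological": 5,
    "cardiovascular": 4,
    "other": 14,
}
assert sum(category_shares.values()) == 100

print(total, found_by_facts, found_by_humans)                  # 182 133 146
print(f"{jaccard:.3f} {facts_coverage:.3f} {human_coverage:.3f}")  # 0.533 0.731 0.802
```

Read this way, and only under the assumptions above, roughly half of all candidates were found by both approaches, while the curators recovered about 80% and FACTS about 73% of the combined candidate set.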
id pubmed-420239
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-420239, 2004-06-06. BMC Genomics, Research Article. BioMed Central, 2004-04-29. Copyright © 2004 Silva et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
topic Research Article