Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system
BACKGROUND: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological path...
Main authors: | Silva, Diego G; Schönbach, Christian; Brusic, Vladimir; Socha, Luis A; Nagashima, Takeshi; Petrovsky, Nikolai |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2004 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC420239/ https://www.ncbi.nlm.nih.gov/pubmed/15115540 http://dx.doi.org/10.1186/1471-2164-5-28 |
_version_ | 1782121468929245184 |
---|---|
author | Silva, Diego G Schönbach, Christian Brusic, Vladimir Socha, Luis A Nagashima, Takeshi Petrovsky, Nikolai |
author_facet | Silva, Diego G Schönbach, Christian Brusic, Vladimir Socha, Luis A Nagashima, Takeshi Petrovsky, Nikolai |
author_sort | Silva, Diego G |
collection | PubMed |
description | BACKGROUND: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. RESULTS: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%), disorders. CONCLUSIONS: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets. |
format | Text |
id | pubmed-420239 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-420239 2004-06-06 Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system Silva, Diego G Schönbach, Christian Brusic, Vladimir Socha, Luis A Nagashima, Takeshi Petrovsky, Nikolai BMC Genomics Research Article BACKGROUND: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term "patholog" to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. RESULTS: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70–85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%), disorders. CONCLUSIONS: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets. BioMed Central 2004-04-29 /pmc/articles/PMC420239/ /pubmed/15115540 http://dx.doi.org/10.1186/1471-2164-5-28 Text en Copyright © 2004 Silva et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
spellingShingle | Research Article Silva, Diego G Schönbach, Christian Brusic, Vladimir Socha, Luis A Nagashima, Takeshi Petrovsky, Nikolai Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system |
title | Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system |
title_full | Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system |
title_fullStr | Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system |
title_full_unstemmed | Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system |
title_short | Identification of "pathologs" (disease-related genes) from the RIKEN mouse cDNA dataset using human curation plus FACTS, a new biological information extraction system |
title_sort | identification of "pathologs" (disease-related genes) from the riken mouse cdna dataset using human curation plus facts, a new biological information extraction system |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC420239/ https://www.ncbi.nlm.nih.gov/pubmed/15115540 http://dx.doi.org/10.1186/1471-2164-5-28 |
work_keys_str_mv | AT silvadiegog identificationofpathologsdiseaserelatedgenesfromtherikenmousecdnadatasetusinghumancurationplusfactsanewbiologicalinformationextractionsystem AT schonbachchristian identificationofpathologsdiseaserelatedgenesfromtherikenmousecdnadatasetusinghumancurationplusfactsanewbiologicalinformationextractionsystem AT brusicvladimir identificationofpathologsdiseaserelatedgenesfromtherikenmousecdnadatasetusinghumancurationplusfactsanewbiologicalinformationextractionsystem AT sochaluisa identificationofpathologsdiseaserelatedgenesfromtherikenmousecdnadatasetusinghumancurationplusfactsanewbiologicalinformationextractionsystem AT nagashimatakeshi identificationofpathologsdiseaserelatedgenesfromtherikenmousecdnadatasetusinghumancurationplusfactsanewbiologicalinformationextractionsystem AT petrovskynikolai identificationofpathologsdiseaserelatedgenesfromtherikenmousecdnadatasetusinghumancurationplusfactsanewbiologicalinformationextractionsystem |