A combined approach to data mining of textual and structured data to identify cancer-related targets

BACKGROUND: We present an effective, rapid, systematic data mining approach for identifying genes or proteins related to a particular interest. A selected combination of programs exploring PubMed abstracts, universal gene/protein databases (UniProt, InterPro, NCBI Entrez), and state-of-the-art pathw...

Bibliographic Details
Main authors: Pospisil, Pavel; Iyer, Lakshmanan K; Adelstein, S James; Kassis, Amin I
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1555615/
https://www.ncbi.nlm.nih.gov/pubmed/16857057
http://dx.doi.org/10.1186/1471-2105-7-354
author Pospisil, Pavel
Iyer, Lakshmanan K
Adelstein, S James
Kassis, Amin I
collection PubMed
description BACKGROUND: We present an effective, rapid, systematic data mining approach for identifying genes or proteins related to a particular interest. A selected combination of programs exploring PubMed abstracts, universal gene/protein databases (UniProt, InterPro, NCBI Entrez), and state-of-the-art pathway knowledge bases (LSGraph and Ingenuity Pathway Analysis) was assembled to distinguish enzymes with hydrolytic activities that are expressed in the extracellular space of cancer cells. Proteins were identified with respect to six types of cancer occurring in the prostate, breast, lung, colon, ovary, and pancreas. RESULTS: The data mining method identified previously undetected targets. Our combined strategy applied to each cancer type identified a minimum of 375 proteins expressed within the extracellular space and/or attached to the plasma membrane. The method led to the recognition of human cancer-related hydrolases (on average, ~35 per cancer type), among which were prostatic acid phosphatase, prostate-specific antigen, and sulfatase 1. CONCLUSION: The combined data mining of several databases overcame many of the limitations of querying a single database and enabled the facile identification of gene products. In the case of cancer-related targets, it produced a list of putative extracellular, hydrolytic enzymes that merit additional study as candidates for cancer radioimaging and radiotherapy. The proposed data mining strategy is of a general nature and can be applied to other biological databases for understanding biological functions and diseases.
format Text
id pubmed-1555615
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1555615 2006-08-26 BMC Bioinformatics Methodology Article BioMed Central 2006-07-20 /pmc/articles/PMC1555615/ /pubmed/16857057 http://dx.doi.org/10.1186/1471-2105-7-354 Text en Copyright © 2006 Pospisil et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A combined approach to data mining of textual and structured data to identify cancer-related targets
topic Methodology Article