Evaluation of linguistic features useful in extraction of interactions from PubMed; Application to annotating known, high-throughput and predicted interactions in I(2)D

Bibliographic Details
Main Authors: Niu, Yun; Otasek, David; Jurisica, Igor
Format: Text
Language: English
Published: Oxford University Press, 2010
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2796811/
https://www.ncbi.nlm.nih.gov/pubmed/19850753
http://dx.doi.org/10.1093/bioinformatics/btp602
Collection: PubMed
Description:
Motivation: Identification and characterization of protein–protein interactions (PPIs) is one of the key aims in biological research. While previous research in text mining has made substantial progress in automatic PPI detection from literature, the need to improve the precision and recall of the process remains. More accurate PPI detection will also improve the ability to extract experimental data related to PPIs and provide multiple evidence for each interaction.
Results: We developed an interaction detection method and explored the usefulness of various features in automatically identifying PPIs in text. The results show that our approach outperforms other systems using the AImed dataset. In the tests where our system achieves better precision with reduced recall, we discuss possible approaches for improvement. In addition to test datasets, we evaluated the performance on interactions from five human-curated databases—BIND, DIP, HPRD, IntAct and MINT—where our system consistently identified evidence for ∼60% of interactions when both proteins appear in at least one sentence in the PubMed abstract. We then applied the system to extract articles from PubMed to annotate known, high-throughput and interologous interactions in I(2)D.
Availability: The data and software are available at: http://www.cs.utoronto.ca/~juris/data/BI09/.
Contact: yniu@uhnres.utoronto.ca; juris@ai.utoronto.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
ID: pubmed-2796811
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Bioinformatics (Original Papers)
Dates: 2010-01-01 (issue); 2009-10-22 (online); record date 2009-12-23
Copyright: © The Author(s) 2009. Published by Oxford University Press.
License (http://creativecommons.org/licenses/by-nc/2.0/uk/): This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.