Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time and resource consuming. In this...
Main Authors: | Lung, Pei-Yau; He, Zhe; Zhao, Tingting; Yu, Disa; Zhang, Jinfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Original Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323317/ https://www.ncbi.nlm.nih.gov/pubmed/30624652 http://dx.doi.org/10.1093/database/bay138 |
_version_ | 1783385739533221888 |
---|---|
author | Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng |
author_facet | Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng |
author_sort | Lung, Pei-Yau |
collection | PubMed |
description | Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time and resource consuming. In this study, we propose a computational method to automatically extract chemical–protein interactions (CPIs) from a given text. Our method extracts CPI pairs and CPI triplets from sentences, where a CPI pair consists of a chemical compound and a protein name, and a CPI triplet consists of a CPI pair along with an interaction word describing their relationship. We extracted a diverse set of features from sentences that were used to build multiple machine learning models. Our models contain both simple features, which can be directly computed from sentences, and more sophisticated features derived using sentence structure analysis techniques. For example, one set of features was extracted based on the shortest paths between the CPI pairs or among the CPI triplets in the dependency graphs obtained from sentence parsing. We designed a three-stage approach to predict the multiple categories of CPIs. Our method performed the best among systems that use non-deep learning methods and outperformed several deep-learning-based systems in the track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods including deep learning. |
format | Online Article Text |
id | pubmed-6323317 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6323317 2019-01-10 Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng Database (Oxford) Original Article Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time and resource consuming. In this study, we propose a computational method to automatically extract chemical–protein interactions (CPIs) from a given text. Our method extracts CPI pairs and CPI triplets from sentences, where a CPI pair consists of a chemical compound and a protein name, and a CPI triplet consists of a CPI pair along with an interaction word describing their relationship. We extracted a diverse set of features from sentences that were used to build multiple machine learning models. Our models contain both simple features, which can be directly computed from sentences, and more sophisticated features derived using sentence structure analysis techniques. For example, one set of features was extracted based on the shortest paths between the CPI pairs or among the CPI triplets in the dependency graphs obtained from sentence parsing. We designed a three-stage approach to predict the multiple categories of CPIs. Our method performed the best among systems that use non-deep learning methods and outperformed several deep-learning-based systems in the track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods including deep learning. Oxford University Press 2019-01-08 /pmc/articles/PMC6323317/ /pubmed/30624652 http://dx.doi.org/10.1093/database/bay138 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering |
title | Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323317/ https://www.ncbi.nlm.nih.gov/pubmed/30624652 http://dx.doi.org/10.1093/database/bay138 |
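
The description field above mentions features based on the shortest paths between CPI pairs in dependency graphs. The following is a minimal sketch of that general idea, not the authors' implementation: the toy sentence, the hand-written dependency edges, the relation labels, and the names `build_graph`, `shortest_path`, and `path_features` are all assumptions made for illustration. It uses a plain breadth-first search over an undirected view of the dependency edges.

```python
from collections import deque

# Hypothetical dependency edges (head, dependent, relation) for the toy
# sentence "Aspirin inhibits the activity of COX-1". These are written by
# hand for illustration and are not the output of any particular parser.
EDGES = [
    ("inhibits", "Aspirin", "nsubj"),
    ("inhibits", "activity", "obj"),
    ("activity", "the", "det"),
    ("activity", "COX-1", "nmod"),
    ("COX-1", "of", "case"),
]


def build_graph(edges):
    """Build an undirected adjacency map: token -> [(neighbor, relation)]."""
    graph = {}
    for head, dependent, relation in edges:
        graph.setdefault(head, []).append((dependent, relation))
        graph.setdefault(dependent, []).append((head, relation))
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns (tokens, relations) or None."""
    if source not in graph or target not in graph:
        return None
    previous = {source: None}  # token -> (parent token, relation) or None
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor, relation in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = (node, relation)
                queue.append(neighbor)
    if target not in previous:
        return None
    # Walk back from the target to the source, then reverse.
    tokens, relations = [target], []
    node = target
    while previous[node] is not None:
        parent, relation = previous[node]
        tokens.append(parent)
        relations.append(relation)
        node = parent
    return tokens[::-1], relations[::-1]


def path_features(graph, chemical, protein):
    """Turn the shortest path between a CPI pair into simple features."""
    result = shortest_path(graph, chemical, protein)
    if result is None:
        return {"connected": False}
    tokens, relations = result
    return {
        "connected": True,
        "path_length": len(relations),
        "path_tokens": tokens,
        "path_relations": relations,
    }


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(path_features(graph, "Aspirin", "COX-1"))
```

In this toy graph the path between the chemical and the protein runs through the interaction word "inhibits", which is the kind of signal such path features are meant to expose to a classifier. A real pipeline would take the edges from a dependency parser and would need to handle multi-token entity names, which this sketch ignores.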