Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering

Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time- and resource-consuming. In this...

Bibliographic Details
Main authors: Lung, Pei-Yau; He, Zhe; Zhao, Tingting; Yu, Disa; Zhang, Jinfeng
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323317/
https://www.ncbi.nlm.nih.gov/pubmed/30624652
http://dx.doi.org/10.1093/database/bay138
_version_ 1783385739533221888
author Lung, Pei-Yau
He, Zhe
Zhao, Tingting
Yu, Disa
Zhang, Jinfeng
author_sort Lung, Pei-Yau
collection PubMed
description Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time- and resource-consuming. In this study, we propose a computational method to automatically extract chemical–protein interactions (CPIs) from a given text. Our method extracts CPI pairs and CPI triplets from sentences, where a CPI pair consists of a chemical compound and a protein name, and a CPI triplet consists of a CPI pair along with an interaction word describing their relationship. We extracted a diverse set of features from sentences that were used to build multiple machine learning models. Our models contain both simple features, which can be directly computed from sentences, and more sophisticated features derived using sentence structure analysis techniques. For example, one set of features was extracted based on the shortest paths between the CPI pairs or among the CPI triplets in the dependency graphs obtained from sentence parsing. We designed a three-stage approach to predict the multiple categories of CPIs. Our method performed best among systems that use non-deep-learning methods and outperformed several deep-learning-based systems in track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods, including deep learning.
format Online
Article
Text
id pubmed-6323317
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6323317 2019-01-10 Database (Oxford) Original Article Oxford University Press 2019-01-08 /pmc/articles/PMC6323317/ /pubmed/30624652 http://dx.doi.org/10.1093/database/bay138 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323317/
https://www.ncbi.nlm.nih.gov/pubmed/30624652
http://dx.doi.org/10.1093/database/bay138
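The description field in this record mentions features based on the shortest paths between CPI pairs in a dependency graph. The following is a minimal sketch of that general idea, not the authors' implementation. The token labels ("CHEMICAL", "PROTEIN", "INTERACTION"), the hand-written dependency edges, the function names and the two output features are all assumptions made for illustration; a real pipeline would obtain entities and edges from a named-entity recognizer and a dependency parser.

```python
from collections import deque
from itertools import product
from typing import NamedTuple


class Token(NamedTuple):
    index: int   # position of the token in the sentence
    text: str
    kind: str    # hypothetical labels: "CHEMICAL", "PROTEIN", "INTERACTION", "O"


def build_graph(edges):
    """Turn (head, dependent) index pairs into an undirected adjacency map."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of token indices, or None."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph.get(node, ())):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk back through the parent links to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def cpi_pairs(tokens):
    """Every (chemical, protein) combination in one sentence."""
    chemicals = [t for t in tokens if t.kind == "CHEMICAL"]
    proteins = [t for t in tokens if t.kind == "PROTEIN"]
    return list(product(chemicals, proteins))


def path_features(tokens, graph, chemical, protein):
    """Two illustrative features derived from the shortest dependency path."""
    path = shortest_path(graph, chemical.index, protein.index)
    if path is None:
        return {"path_length": -1, "interaction_on_path": False}
    return {
        "path_length": len(path) - 1,  # number of edges on the path
        "interaction_on_path": any(tokens[i].kind == "INTERACTION" for i in path),
    }


# Toy sentence "Aspirin inhibits COX-1" with hand-written dependency edges
# (head, dependent); these stand in for real parser output.
tokens = [
    Token(0, "Aspirin", "CHEMICAL"),
    Token(1, "inhibits", "INTERACTION"),
    Token(2, "COX-1", "PROTEIN"),
]
graph = build_graph([(1, 0), (1, 2)])
chemical, protein = cpi_pairs(tokens)[0]
features = path_features(tokens, graph, chemical, protein)
print(features)
```

In this toy sentence the path runs from the chemical through the interaction word to the protein, so the same path also covers the CPI triplet. Feature dictionaries of this kind could be passed to any conventional classifier.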