
Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering

Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time- and resource-consuming. In this...


Bibliographic Details
Main Authors:	Lung, Pei-Yau, He, Zhe, Zhao, Tingting, Yu, Disa, Zhang, Jinfeng
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323317/ https://www.ncbi.nlm.nih.gov/pubmed/30624652 http://dx.doi.org/10.1093/database/bay138

_version_	1783385739533221888
author	Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng
author_facet	Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng
author_sort	Lung, Pei-Yau
collection	PubMed
description	Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time- and resource-consuming. In this study, we propose a computational method to automatically extract chemical–protein interactions (CPIs) from a given text. Our method extracts CPI pairs and CPI triplets from sentences, where a CPI pair consists of a chemical compound and a protein name, and a CPI triplet consists of a CPI pair along with an interaction word describing their relationship. We extracted a diverse set of features from sentences that were used to build multiple machine learning models. Our models contain both simple features, which can be directly computed from sentences, and more sophisticated features derived using sentence structure analysis techniques. For example, one set of features was extracted based on the shortest paths between the CPI pairs or among the CPI triplets in the dependency graphs obtained from sentence parsing. We designed a three-stage approach to predict the multiple categories of CPIs. Our method performed the best among systems that use non-deep-learning methods and outperformed several deep-learning-based systems in track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods including deep learning.
format	Online Article Text
id	pubmed-6323317
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-63233172019-01-10 Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng Database (Oxford) Original Article Information about the interactions between chemical compounds and proteins is indispensable for understanding the regulation of biological processes and the development of therapeutic drugs. Manually extracting such information from biomedical literature is very time and resource consuming. In this study, we propose a computational method to automatically extract chemical–protein interactions (CPIs) from a given text. Our method extracts CPI pairs and CPI triplets from sentences, where a CPI pair consists of a chemical compound and a protein name, and a CPI triplet consists of a CPI pair along with an interaction word describing their relationship. We extracted a diverse set of features from sentences that were used to build multiple machine learning models. Our models contain both simple features, which can be directly computed from sentences, and more sophisticated features derived using sentence structure analysis techniques. For example, one set of features was extracted based on the shortest paths between the CPI pairs or among the CPI triplets in the dependency graphs obtained from sentence parsing. We designed a three-stage approach to predict the multiple categories of CPIs. Our method performed the best among systems that use non-deep learning methods and outperformed several deep-learning-based systems in the track 5 of the BioCreative VI challenge. The features we designed in this study are informative and can be applied to other machine learning methods including deep learning. Oxford University Press 2019-01-08 /pmc/articles/PMC6323317/ /pubmed/30624652 http://dx.doi.org/10.1093/database/bay138 Text en © The Author(s) 2019. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Article Lung, Pei-Yau He, Zhe Zhao, Tingting Yu, Disa Zhang, Jinfeng Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
title	Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
title_full	Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
title_fullStr	Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
title_full_unstemmed	Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
title_short	Extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
title_sort	extracting chemical–protein interactions from literature using sentence structure analysis and feature engineering
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323317/ https://www.ncbi.nlm.nih.gov/pubmed/30624652 http://dx.doi.org/10.1093/database/bay138
work_keys_str_mv	AT lungpeiyau extractingchemicalproteininteractionsfromliteratureusingsentencestructureanalysisandfeatureengineering AT hezhe extractingchemicalproteininteractionsfromliteratureusingsentencestructureanalysisandfeatureengineering AT zhaotingting extractingchemicalproteininteractionsfromliteratureusingsentencestructureanalysisandfeatureengineering AT yudisa extractingchemicalproteininteractionsfromliteratureusingsentencestructureanalysisandfeatureengineering AT zhangjinfeng extractingchemicalproteininteractionsfromliteratureusingsentencestructureanalysisandfeatureengineering
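
The abstract in this record mentions features based on the shortest path between a CPI pair in a sentence's dependency graph. The sketch below illustrates that general idea only; it is not the authors' implementation. The example sentence, its hand-written dependency edges, and the names `shortest_path` and `path_features` are all assumptions introduced for illustration, and a real pipeline would obtain the edges from a dependency parser.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over an undirected graph built from (head, dependent) edges.

    Returns the list of tokens on a shortest path, or None if no path exists.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    if source not in graph or target not in graph:
        return None

    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def path_features(edges, chemical, protein):
    """Hypothetical feature set: path length and the tokens between the two entities."""
    path = shortest_path(edges, chemical, protein)
    if path is None:
        return {"path_length": -1, "path_words": []}
    return {"path_length": len(path) - 1, "path_words": path[1:-1]}


# Hypothetical sentence: "Aspirin inhibits the COX-1 enzyme."
# Edges are hand-written (head, dependent) pairs, not parser output.
EDGES = [
    ("inhibits", "Aspirin"),
    ("inhibits", "enzyme"),
    ("enzyme", "the"),
    ("enzyme", "COX-1"),
]

features = path_features(EDGES, "Aspirin", "COX-1")
print(features)
```

In this toy graph the path between the chemical and the protein passes through the interaction word "inhibits", which is the kind of signal such path features are meant to capture.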
