
An end-to-end deep learning architecture for extracting protein–protein interactions affected by genetic mutations

The BioCreative VI Track IV (mining protein interactions and mutations for precision medicine) challenge was organized in 2017 with the goal of applying biomedical text mining methods to support advancements in precision medicine approaches. As part of the challenge, a new dataset was introduced for...


Bibliographic Details
Main authors:	Tran, Tung; Kavuluru, Ramakanth
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146129/ https://www.ncbi.nlm.nih.gov/pubmed/30239680 http://dx.doi.org/10.1093/database/bay092

author	Tran, Tung; Kavuluru, Ramakanth
collection	PubMed
description	The BioCreative VI Track IV (mining protein interactions and mutations for precision medicine) challenge was organized in 2017 with the goal of applying biomedical text mining methods to support advancements in precision medicine approaches. As part of the challenge, a new dataset was introduced for the purpose of building a supervised relation extraction model capable of taking a test article and returning a list of interacting protein pairs identified by their Entrez Gene IDs. Specifically, such pairs represent proteins participating in a binary protein–protein interaction relation where the interaction is additionally affected by a genetic mutation, referred to as a PPIm relation. In this study, we explore an end-to-end approach for PPIm relation extraction by deploying a three-component pipeline involving deep learning-based named-entity recognition and relation classification models along with a knowledge-based approach for gene normalization. We propose several recall-focused improvements to our original challenge entry that placed second when matching on Entrez Gene ID (exact matching) and on HomoloGene ID. On exact matching, the improved system achieved new competitive test results of 37.78% micro-F1 with a precision of 38.22% and recall of 37.34%, which corresponds to an improvement of approximately three micro-F1 points over the prior best system. When matching on HomoloGene IDs, we report similarly competitive test results at 46.17% micro-F1 with a precision and recall of 46.67% and 45.59%, respectively, corresponding to an improvement of more than eight micro-F1 points over the prior best result. The code for our deep learning system is made publicly available at https://github.com/bionlproc/biocppi_extraction.
id	pubmed-6146129
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6146129 2018-09-25. Tran, Tung; Kavuluru, Ramakanth. Database (Oxford). Original Article. Oxford University Press 2018-09-18. http://dx.doi.org/10.1093/database/bay092. Text, English. © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

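The record's description reports results when matching on Entrez Gene ID (exact matching) and on HomoloGene ID. A minimal sketch, assuming document-level scoring over unordered gene-ID pairs (an illustration only, not the official BioCreative VI Track IV evaluation script), shows how pooled micro-averaged precision, recall, and F1 can be computed under both matching modes. The functions `normalize` and `micro_prf`, the gene IDs, and the homology table are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

# One scored relation: (document ID, unordered pair of gene IDs).
Relation = Tuple[str, FrozenSet[str]]


def normalize(
    triples: Iterable[Tuple[str, str, str]],
    id_map: Optional[Dict[str, str]] = None,
) -> Set[Relation]:
    """Convert (doc_id, gene_a, gene_b) triples into a set of unordered pairs.

    If id_map is given (for example a hypothetical Entrez-to-HomoloGene
    table), each gene ID is replaced by its mapped group ID first; IDs
    missing from the map are kept unchanged. Duplicates collapse because
    the result is a set.
    """
    relations: Set[Relation] = set()
    for doc_id, gene_a, gene_b in triples:
        if id_map is not None:
            gene_a = id_map.get(gene_a, gene_a)
            gene_b = id_map.get(gene_b, gene_b)
        # frozenset makes (A, B) and (B, A) the same relation.
        relations.add((doc_id, frozenset((gene_a, gene_b))))
    return relations


def micro_prf(gold: Set[Relation], pred: Set[Relation]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1, pooling pairs over all documents."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data with hypothetical gene IDs "A".."F" and "X".
    gold = [("d1", "A", "B"), ("d1", "C", "D"), ("d2", "E", "F")]
    pred = [("d1", "B", "A"), ("d1", "C", "X"), ("d2", "E", "F"), ("d2", "E", "C")]
    # Hypothetical homology table: D and X belong to the same group H1.
    homology = {"D": "H1", "X": "H1"}
    print("exact:   ", micro_prf(normalize(gold), normalize(pred)))
    print("homology:", micro_prf(normalize(gold, homology), normalize(pred, homology)))
```

On this toy data, exact matching gives precision 0.5, recall 2/3 and F1 4/7, while the homology mapping turns the (C, X) prediction into a hit, giving precision 0.75, recall 1.0 and F1 6/7. Mapping IDs before building the sets means that two predicted pairs which collapse to the same homology groups are counted only once.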

Similar Items