LitGen: Genetic Literature Recommendation Guided by Human Explanations

Bibliographic Details
Main Authors: Nie, Allen; Pineda, Arturo L.; Wright, Matt W.; Wand, Hannah; Wulf, Bryan; Costa, Helio; Patel, Ronak; Bustamante, Carlos D.; Zou, James
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7478937/
https://www.ncbi.nlm.nih.gov/pubmed/31797587
author Nie, Allen
Pineda, Arturo L.
Wright, Matt W.
Wand, Hannah
Wulf, Bryan
Costa, Helio
Patel, Ronak
Bustamante, Carlos D.
Zou, James
collection PubMed
description As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetic data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different approaches and types of evidence, e.g. biochemical assays or case-control analysis. In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by the specific evidence types curators use to assess pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain a 7.9% to 12.6% relative performance improvement over models trained only on the annotated papers. It is a useful framework for improving clinical variant curation.
format Online
Article
Text
id pubmed-7478937
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7478937 2020-09-09 LitGen: Genetic Literature Recommendation Guided by Human Explanations Nie, Allen Pineda, Arturo L. Wright, Matt W. Wand, Hannah Wulf, Bryan Costa, Helio Patel, Ronak Bustamante, Carlos D. Zou, James Pac Symp Biocomput Article As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetic data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different approaches and types of evidence, e.g. biochemical assays or case-control analysis. In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by the specific evidence types curators use to assess pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain a 7.9% to 12.6% relative performance improvement over models trained only on the annotated papers. It is a useful framework for improving clinical variant curation. 2020 /pmc/articles/PMC7478937/ /pubmed/31797587 Text en http://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
title LitGen: Genetic Literature Recommendation Guided by Human Explanations
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7478937/
https://www.ncbi.nlm.nih.gov/pubmed/31797587