LitGen: Genetic Literature Recommendation Guided by Human Explanations
As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curatio...
Main authors: | Nie, Allen; Pineda, Arturo L.; Wright, Matt W.; Wand, Hannah; Wulf, Bryan; Costa, Helio; Patel, Ronak; Bustamante, Carlos D.; Zou, James |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7478937/ https://www.ncbi.nlm.nih.gov/pubmed/31797587 |
_version_ | 1783580161504968704 |
---|---|
author | Nie, Allen; Pineda, Arturo L.; Wright, Matt W.; Wand, Hannah; Wulf, Bryan; Costa, Helio; Patel, Ronak; Bustamante, Carlos D.; Zou, James |
author_sort | Nie, Allen |
collection | PubMed |
description | As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different approaches and types of evidence, e.g. biochemical assays or case-control analysis. In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by the specific evidence types that curators use to assess pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain a 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation. |
format | Online Article Text |
id | pubmed-7478937 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7478937 2020-09-09 Pac Symp Biocomput Article 2020 /pmc/articles/PMC7478937/ /pubmed/31797587 Text en http://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License. |
title | LitGen: Genetic Literature Recommendation Guided by Human Explanations |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7478937/ https://www.ncbi.nlm.nih.gov/pubmed/31797587 |