AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature

PURPOSE: Both monogenic pathogenic variant cataloging, and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval by an automatic appro...

Bibliographic Details
Main authors: Birgmeier, Johannes, Deisseroth, Cole A., Hayward, Laura E., Galhardo, Luisa M. T., Tierno, Andrew P., Jagadeesh, Karthik A., Stenson, Peter D., Cooper, David N., Bernstein, Jonathan A., Haeussler, Maximilian, Bejerano, Gill
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7301356/
https://www.ncbi.nlm.nih.gov/pubmed/31467448
http://dx.doi.org/10.1038/s41436-019-0643-6
_version_ 1783547673329008640
author Birgmeier, Johannes
Deisseroth, Cole A.
Hayward, Laura E.
Galhardo, Luisa M. T.
Tierno, Andrew P.
Jagadeesh, Karthik A.
Stenson, Peter D.
Cooper, David N.
Bernstein, Jonathan A.
Haeussler, Maximilian
Bejerano, Gill
author_facet Birgmeier, Johannes
Deisseroth, Cole A.
Hayward, Laura E.
Galhardo, Luisa M. T.
Tierno, Andrew P.
Jagadeesh, Karthik A.
Stenson, Peter D.
Cooper, David N.
Bernstein, Jonathan A.
Haeussler, Maximilian
Bejerano, Gill
author_sort Birgmeier, Johannes
collection PubMed
description PURPOSE: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval, followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval with an automatic approach. METHODS: AVADA (Automatic Variant evidence DAtabase) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert the variants to genomic coordinates. RESULTS: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in HGMD, a 4.4-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation on 245 diagnosed patients, AVADA provides valuable evidence for an additional 18 diagnostic variants on top of ClinVar’s 21, vs. only 2 using the best current automated approach. CONCLUSION: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Although far from perfect, AVADA is much faster than PubMed/Google Scholar search, and careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.
format Online
Article
Text
id pubmed-7301356
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-73013562020-06-18 AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature Birgmeier, Johannes Deisseroth, Cole A. Hayward, Laura E. Galhardo, Luisa M. T. Tierno, Andrew P. Jagadeesh, Karthik A. Stenson, Peter D. Cooper, David N. Bernstein, Jonathan A. Haeussler, Maximilian Bejerano, Gill Genet Med Article PURPOSE: Both monogenic pathogenic variant cataloging, and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval by an automatic approach. METHODS: AVADA (Automatic Variant evidence DAtabase) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full text primary literature about monogenic disease and convert them to genomic coordinates. RESULTS: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in HGMD, a 4.4x-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD, but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation, on 245 diagnosed patients, AVADA provides valuable evidence for an additional 18 diagnostic variants, on top of ClinVar’s 21, vs. only 2 using the best current automated approach. CONCLUSION: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Far from perfect, but much faster than PubMed/Google Scholar search, careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis. 
2019-08-30 2020-02 /pmc/articles/PMC7301356/ /pubmed/31467448 http://dx.doi.org/10.1038/s41436-019-0643-6 Text en http://www.nature.com/authors/editorial_policies/license.html#terms Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use:http://www.nature.com/authors/editorial_policies/license.html#terms
spellingShingle Article
Birgmeier, Johannes
Deisseroth, Cole A.
Hayward, Laura E.
Galhardo, Luisa M. T.
Tierno, Andrew P.
Jagadeesh, Karthik A.
Stenson, Peter D.
Cooper, David N.
Bernstein, Jonathan A.
Haeussler, Maximilian
Bejerano, Gill
AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature
title AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature
title_full AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature
title_fullStr AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature
title_full_unstemmed AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature
title_short AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature
title_sort avada: towards automated pathogenic variant evidence retrieval directly from the full text literature
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7301356/
https://www.ncbi.nlm.nih.gov/pubmed/31467448
http://dx.doi.org/10.1038/s41436-019-0643-6
work_keys_str_mv AT birgmeierjohannes avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT deisserothcolea avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT haywardlaurae avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT galhardoluisamt avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT tiernoandrewp avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT jagadeeshkarthika avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT stensonpeterd avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT cooperdavidn avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT bernsteinjonathana avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT haeusslermaximilian avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature
AT bejeranogill avadatowardsautomatedpathogenicvariantevidenceretrievaldirectlyfromthefulltextliterature