Accelerated variant curation from scientific literature using biomedical text mining
Biological databases collect and standardize data through biocuration. Even though major model organism databases have adopted some automation of curation methods, a large portion of biocuration is still performed manually. To speed up the extraction of the genomic positions of variants, we have dev...
Main Authors: | Mallick, Rishab; Arnaboldi, Valerio; Davis, Paul; Diamantakis, Stavros; Zarowiecki, Magdalena; Howe, Kevin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Caltech Library 2022 |
Subjects: | New Methods |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160977/ https://www.ncbi.nlm.nih.gov/pubmed/35663412 http://dx.doi.org/10.17912/micropub.biology.000578 |
_version_ | 1784719386265780224 |
---|---|
author | Mallick, Rishab Arnaboldi, Valerio Davis, Paul Diamantakis, Stavros Zarowiecki, Magdalena Howe, Kevin |
author_facet | Mallick, Rishab Arnaboldi, Valerio Davis, Paul Diamantakis, Stavros Zarowiecki, Magdalena Howe, Kevin |
author_sort | Mallick, Rishab |
collection | PubMed |
description | Biological databases collect and standardize data through biocuration. Even though major model organism databases have adopted some automation of curation methods, a large portion of biocuration is still performed manually. To speed up the extraction of the genomic positions of variants, we have developed a hybrid approach that combines regular expressions, Named Entity Recognition based on BERT (Bidirectional Encoder Representations from Transformers) and bag-of-words to extract variant genomic locations from C. elegans papers for WormBase. Our model has a precision of 82.59% for the gene-mutation matches tested on extracted text from 100 papers, and even recovers some data not discovered during manual curation. Code at: https://github.com/WormBase/genomic-info-from-papers |
format | Online Article Text |
id | pubmed-9160977 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Caltech Library |
record_format | MEDLINE/PubMed |
spelling | pubmed-91609772022-06-03 Accelerated variant curation from scientific literature using biomedical text mining Mallick, Rishab Arnaboldi, Valerio Davis, Paul Diamantakis, Stavros Zarowiecki, Magdalena Howe, Kevin MicroPubl Biol New Methods Biological databases collect and standardize data through biocuration. Even though major model organism databases have adopted some automation of curation methods, a large portion of biocuration is still performed manually. To speed up the extraction of the genomic positions of variants, we have developed a hybrid approach that combines regular expressions, Named Entity Recognition based on BERT (Bidirectional Encoder Representations from Transformers) and bag-of-words to extract variant genomic locations from C. elegans papers for WormBase. Our model has a precision of 82.59% for the gene-mutation matches tested on extracted text from 100 papers, and even recovers some data not discovered during manual curation. Code at: https://github.com/WormBase/genomic-info-from-papers Caltech Library 2022-06-01 /pmc/articles/PMC9160977/ /pubmed/35663412 http://dx.doi.org/10.17912/micropub.biology.000578 Text en Copyright: © 2022 by the authors https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | New Methods Mallick, Rishab Arnaboldi, Valerio Davis, Paul Diamantakis, Stavros Zarowiecki, Magdalena Howe, Kevin Accelerated variant curation from scientific literature using biomedical text mining |
title | Accelerated variant curation from scientific literature using biomedical text mining |
topic | New Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160977/ https://www.ncbi.nlm.nih.gov/pubmed/35663412 http://dx.doi.org/10.17912/micropub.biology.000578 |
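
The description field above names regular expressions as one component of the hybrid extraction approach and reports precision as the evaluation metric. As a rough illustration only, the sketch below shows what a regex pass over paper text and a precision calculation could look like. The pattern, the function names, and the example sentence are hypothetical and are not taken from the WormBase/genomic-info-from-papers repository; the BERT-based Named Entity Recognition and bag-of-words steps are not shown.

```python
import re

# Hypothetical pattern (an assumption, not the authors' regex): a one-letter
# amino acid code, a position, then a one-letter code or '*' for a stop,
# e.g. "G356E" or "R100*".
AA = "ACDEFGHIKLMNPQRSTVWY"
PROTEIN_SUB = re.compile(rf"\b([{AA}])(\d+)([{AA}*])(?![A-Za-z0-9])")


def find_substitutions(text):
    """Return (reference, position, alternate) tuples for each match in text."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in PROTEIN_SUB.finditer(text)
    ]


def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP); returns 0.0 when nothing was predicted."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


if __name__ == "__main__":
    # Made-up sentence for illustration; not drawn from any curated paper.
    sentence = "The allele carries a G356E change and an R100* stop."
    print(find_substitutions(sentence))
```

A pattern this simple would over-match in real text (strain names and reagent codes can have the same shape), which is presumably one reason a pure regex pass is combined with other components in the approach described above.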