ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts
Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statem...
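Main Authors: | Hobbs, Elizabeth T.; Goralski, Stephen M.; Mitchell, Ashley; Simpson, Andrew; Leka, Dorjan; Kotey, Emmanuel; Sekira, Matt; Munro, James B.; Nadendla, Suvarna; Jackson, Rebecca; Gonzalez-Aguirre, Aitor; Krallinger, Martin; Giglio, Michelle; Erill, Ivan |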
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Research Metrics and Analytics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8313968/ https://www.ncbi.nlm.nih.gov/pubmed/34327299 http://dx.doi.org/10.3389/frma.2021.674205 |
_version_ | 1783729452408111104 |
---|---|
author | Hobbs, Elizabeth T. Goralski, Stephen M. Mitchell, Ashley Simpson, Andrew Leka, Dorjan Kotey, Emmanuel Sekira, Matt Munro, James B. Nadendla, Suvarna Jackson, Rebecca Gonzalez-Aguirre, Aitor Krallinger, Martin Giglio, Michelle Erill, Ivan |
author_facet | Hobbs, Elizabeth T. Goralski, Stephen M. Mitchell, Ashley Simpson, Andrew Leka, Dorjan Kotey, Emmanuel Sekira, Matt Munro, James B. Nadendla, Suvarna Jackson, Rebecca Gonzalez-Aguirre, Aitor Krallinger, Martin Giglio, Michelle Erill, Ivan |
author_sort | Hobbs, Elizabeth T. |
collection | PubMed |
description | Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology. |
format | Online Article Text |
id | pubmed-8313968 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-83139682021-07-28 ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts Hobbs, Elizabeth T. Goralski, Stephen M. Mitchell, Ashley Simpson, Andrew Leka, Dorjan Kotey, Emmanuel Sekira, Matt Munro, James B. Nadendla, Suvarna Jackson, Rebecca Gonzalez-Aguirre, Aitor Krallinger, Martin Giglio, Michelle Erill, Ivan Front Res Metr Anal Research Metrics and Analytics Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology. Frontiers Media S.A. 2021-07-13 /pmc/articles/PMC8313968/ /pubmed/34327299 http://dx.doi.org/10.3389/frma.2021.674205 Text en Copyright © 2021 Hobbs, Goralski, Mitchell, Simpson, Leka, Kotey, Sekira, Munro, Nadendla, Jackson, Gonzalez-Aguirre, Krallinger, Giglio and Erill. 
https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Research Metrics and Analytics Hobbs, Elizabeth T. Goralski, Stephen M. Mitchell, Ashley Simpson, Andrew Leka, Dorjan Kotey, Emmanuel Sekira, Matt Munro, James B. Nadendla, Suvarna Jackson, Rebecca Gonzalez-Aguirre, Aitor Krallinger, Martin Giglio, Michelle Erill, Ivan ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title | ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_full | ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_fullStr | ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_full_unstemmed | ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_short | ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts |
title_sort | eco-collectf: a corpus of annotated evidence-based assertions in biomedical manuscripts |
topic | Research Metrics and Analytics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8313968/ https://www.ncbi.nlm.nih.gov/pubmed/34327299 http://dx.doi.org/10.3389/frma.2021.674205 |
work_keys_str_mv | AT hobbselizabetht ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT goralskistephenm ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT mitchellashley ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT simpsonandrew ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT lekadorjan ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT koteyemmanuel ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT sekiramatt ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT munrojamesb ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT nadendlasuvarna ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT jacksonrebecca ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT gonzalezaguirreaitor ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT krallingermartin ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT gigliomichelle ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts AT erillivan ecocollectfacorpusofannotatedevidencebasedassertionsinbiomedicalmanuscripts |