
Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experience...

Bibliographic Details
Main authors: Neves, Mariana, Damaschun, Alexander, Mah, Nancy, Lekschas, Fritz, Seltmann, Stefanie, Stachelscheid, Harald, Fontaine, Jean-Fred, Kurtz, Andreas, Leser, Ulf
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3629873/
https://www.ncbi.nlm.nih.gov/pubmed/23599415
http://dx.doi.org/10.1093/database/bat020
author Neves, Mariana
Damaschun, Alexander
Mah, Nancy
Lekschas, Fritz
Seltmann, Stefanie
Stachelscheid, Harald
Fontaine, Jean-Fred
Kurtz, Andreas
Leser, Ulf
author_sort Neves, Mariana
collection PubMed
description Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ∼50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/
format Online
Article
Text
id pubmed-3629873
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3629873 2013-04-18 Database (Oxford) Original Article Oxford University Press 2013-04-18 /pmc/articles/PMC3629873/ /pubmed/23599415 http://dx.doi.org/10.1093/database/bat020 Text en © The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts
topic Original Article