Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics

This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three exam...

Bibliographic Details
Main authors: Cunningham, Hamish, Tablan, Valentin, Roberts, Angus, Bontcheva, Kalina
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3567135/
https://www.ncbi.nlm.nih.gov/pubmed/23408875
http://dx.doi.org/10.1371/journal.pcbi.1002854
_version_ 1782258670652882944
author Cunningham, Hamish
Tablan, Valentin
Roberts, Angus
Bontcheva, Kalina
author_facet Cunningham, Hamish
Tablan, Valentin
Roberts, Angus
Bontcheva, Kalina
author_sort Cunningham, Hamish
collection PubMed
description This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
format Online
Article
Text
id pubmed-3567135
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-35671352013-02-13 Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics Cunningham, Hamish Tablan, Valentin Roberts, Angus Bontcheva, Kalina PLoS Comput Biol Research Article This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. 
GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis. Public Library of Science 2013-02-07 /pmc/articles/PMC3567135/ /pubmed/23408875 http://dx.doi.org/10.1371/journal.pcbi.1002854 Text en © 2013 Cunningham et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Cunningham, Hamish
Tablan, Valentin
Roberts, Angus
Bontcheva, Kalina
Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
title Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
title_full Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
title_fullStr Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
title_full_unstemmed Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
title_short Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
title_sort getting more out of biomedical documents with gate's full lifecycle open source text analytics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3567135/
https://www.ncbi.nlm.nih.gov/pubmed/23408875
http://dx.doi.org/10.1371/journal.pcbi.1002854