A distributable German clinical corpus containing cardiovascular clinical routine doctor’s letters
We present CARDIO:DE, the first freely available and distributable large German clinical corpus from the cardiovascular domain. CARDIO:DE encompasses 500 clinical routine German doctor’s letters from Heidelberg University Hospital, which were manually annotated. Our prospective study design complies...
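Main authors: | Richter-Pechanski, Phillip; Wiesenbach, Philipp; Schwab, Dominic M.; Kiriakou, Christina; He, Mingyang; Allers, Michael M.; Tiefenbacher, Anna S.; Kunz, Nicola; Martynova, Anna; Spiller, Noemie; Mierisch, Julian; Borchert, Florian; Schwind, Charlotte; Frey, Norbert; Dieterich, Christoph; Geis, Nicolas A. |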
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10104831/ https://www.ncbi.nlm.nih.gov/pubmed/37059736 http://dx.doi.org/10.1038/s41597-023-02128-9 |
Field | Value |
---|---|
author | Richter-Pechanski, Phillip; Wiesenbach, Philipp; Schwab, Dominic M.; Kiriakou, Christina; He, Mingyang; Allers, Michael M.; Tiefenbacher, Anna S.; Kunz, Nicola; Martynova, Anna; Spiller, Noemie; Mierisch, Julian; Borchert, Florian; Schwind, Charlotte; Frey, Norbert; Dieterich, Christoph; Geis, Nicolas A. |
collection | PubMed |
description | We present CARDIO:DE, the first freely available and distributable large German clinical corpus from the cardiovascular domain. CARDIO:DE encompasses 500 manually annotated clinical routine German doctor’s letters from Heidelberg University Hospital. Our prospective study design complies with current data protection regulations and allows us to keep the original structure of the clinical documents intact. To ease access to our corpus, we manually de-identified all letters. To enable various information extraction tasks, the temporal information in the documents was preserved. We added two high-quality manual annotation layers to CARDIO:DE: (1) medication information and (2) section classes compliant with the HL7 Clinical Document Architecture (CDA). To the best of our knowledge, CARDIO:DE is the first freely available and distributable German clinical corpus in the cardiovascular domain. In summary, our corpus offers unique opportunities for collaborative and reproducible research on natural language processing models for German clinical texts. |
format | Online Article Text |
id | pubmed-10104831 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
journal | Sci Data |
published online | 2023-04-14 |
rights | © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. |
title | A distributable German clinical corpus containing cardiovascular clinical routine doctor’s letters |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10104831/ https://www.ncbi.nlm.nih.gov/pubmed/37059736 http://dx.doi.org/10.1038/s41597-023-02128-9 |