
A distributable German clinical corpus containing cardiovascular clinical routine doctor’s letters

We present CARDIO:DE, the first freely available and distributable large German clinical corpus from the cardiovascular domain. CARDIO:DE encompasses 500 clinical routine German doctor’s letters from Heidelberg University Hospital, which were manually annotated. Our prospective study design complies...


Bibliographic Details
Main Authors: Richter-Pechanski, Phillip, Wiesenbach, Philipp, Schwab, Dominic M., Kiriakou, Christina, He, Mingyang, Allers, Michael M., Tiefenbacher, Anna S., Kunz, Nicola, Martynova, Anna, Spiller, Noemie, Mierisch, Julian, Borchert, Florian, Schwind, Charlotte, Frey, Norbert, Dieterich, Christoph, Geis, Nicolas A.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10104831/
https://www.ncbi.nlm.nih.gov/pubmed/37059736
http://dx.doi.org/10.1038/s41597-023-02128-9
_version_ 1785026121033580544
author Richter-Pechanski, Phillip
Wiesenbach, Philipp
Schwab, Dominic M.
Kiriakou, Christina
He, Mingyang
Allers, Michael M.
Tiefenbacher, Anna S.
Kunz, Nicola
Martynova, Anna
Spiller, Noemie
Mierisch, Julian
Borchert, Florian
Schwind, Charlotte
Frey, Norbert
Dieterich, Christoph
Geis, Nicolas A.
author_sort Richter-Pechanski, Phillip
collection PubMed
description We present CARDIO:DE, the first freely available and distributable large German clinical corpus from the cardiovascular domain. CARDIO:DE encompasses 500 clinical routine German doctor’s letters from Heidelberg University Hospital, which were manually annotated. Our prospective study design complies well with current data protection regulations and allows us to keep the original structure of clinical documents consistent. In order to ease access to our corpus, we manually de-identified all letters. To enable various information extraction tasks the temporal information in the documents was preserved. We added two high-quality manual annotation layers to CARDIO:DE, (1) medication information and (2) CDA-compliant section classes. To the best of our knowledge, CARDIO:DE is the first freely available and distributable German clinical corpus in the cardiovascular domain. In summary, our corpus offers unique opportunities for collaborative and reproducible research on natural language processing models for German clinical texts.
format Online
Article
Text
id pubmed-10104831
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10104831 2023-04-16 A distributable German clinical corpus containing cardiovascular clinical routine doctor’s letters Sci Data Data Descriptor
Nature Publishing Group UK 2023-04-14 /pmc/articles/PMC10104831/ /pubmed/37059736 http://dx.doi.org/10.1038/s41597-023-02128-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A distributable German clinical corpus containing cardiovascular clinical routine doctor’s letters
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10104831/
https://www.ncbi.nlm.nih.gov/pubmed/37059736
http://dx.doi.org/10.1038/s41597-023-02128-9