Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus

ANETTA, Krištof. Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, p. 13-22. ISBN 978-80-263-1600-8.


Basic information
Original name	Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus
Authors	ANETTA, Krištof (703 Slovakia, guarantor, belonging to the institution).
Edition	Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 13-22, 10 pp. 2020.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
WWW	PDF in the proceedings; Workshop homepage
RIV identification code	RIV/00216224:14330/20:00117842
Organization unit	Faculty of Informatics
ISBN	978-80-263-1600-8
ISSN	2336-4289
UT WoS	000655471300002
Keywords in English	EHR; electronic health records; named entity recognition; text data mining; NLP; natural language processing; Slavic languages; Polish
Tags	named entity recognition, natural language processing, NLP, polish, Slavic languages, text data mining
Tags	International impact
Changed by	RNDr. Pavel Šmerk, Ph.D., university ID (učo) 3880. Changed: 13/5/2024 17:46.

Abstract

This paper deals with data mining from free-form text electronic health records, both from a global perspective and with specific application to Slavic languages. It introduces the reader to the promises and challenges of this enterprise and provides a short overview of the global state of the art and of the general absence of this kind of research in Central European Slavic languages. It describes pl_ehr_cardio, a new corpus of Polish health records with 18 years’ worth of medical text. This paper marks the beginning of a pioneering research project in medical text data mining in Central European Slavic languages.

Links
LM2018101, research and development project	Name: Digital Research Infrastructure for Language Technologies, Arts and Humanities (Acronym: LINDAT/CLARIAH-CZ)
LM2018101, research and development project	Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/1411/2019, internal MU code	Name: Applied research: software architectures of critical infrastructures, computer system security, natural language processing and language engineering, big data visualization and augmented reality.
MUNI/A/1411/2019, internal MU code	Investor: Masaryk University, Category A
