Data Mining from Free-Text Health Records : State of the Art,
New Polish Corpus

D 2020

ANETTA, Krištof

Basic information

Original name

Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus

Authors

ANETTA, Krištof (703 Slovakia, guarantor, belonging to the institution)

Edition

Brno, Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020, p. 13-22, 10 pp. 2020

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

PDF in the proceedings; workshop homepage

RIV identification code

RIV/00216224:14330/20:00117842

Organization unit

Faculty of Informatics

ISBN

978-80-263-1600-8

ISSN

UT WoS

000655471300002

Keywords in English

EHR; electronic health records; named entity recognition; text data mining; NLP; natural language processing; Slavic languages; Polish

Abstract

In the original

This paper deals with data mining from free-form text electronic health records, both from a global perspective and with specific application to Slavic languages. It introduces the reader to the promises and challenges of this enterprise and provides a short overview of the global state of the art and of the general absence of this kind of research in Central European Slavic languages. It describes pl_ehr_cardio, a new corpus of Polish health records with 18 years’ worth of medical text. This paper marks the beginning of a pioneering research project in medical text data mining in Central European Slavic languages.

Links

LM2018101, research and development project

Name: Digital Research Infrastructure for Language Technologies, Arts and Humanities (Acronym: LINDAT/CLARIAH-CZ)

Investor: Ministry of Education, Youth and Sports of the CR

MUNI/A/1411/2019, internal MU code

Name: Applied research: software architectures of critical infrastructures, computer systems security, natural language processing and language engineering, big data visualization, and augmented reality.

Investor: Masaryk University, Category A

Cite

ANETTA, Krištof. Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, p. 13-22. ISBN 978-80-263-1600-8.

@inproceedings{1729516,
   author = {Anetta, Krištof},
   address = {Brno},
   booktitle = {Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020},
   editor = {Aleš Horák},
   keywords = {EHR; electronic health records; named entity recognition; text data mining; NLP; natural language processing; Slavic languages; Polish},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Brno},
   isbn = {978-80-263-1600-8},
   pages = {13-22},
   publisher = {Tribun EU},
   title = {Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus},
   url = {https://nlp.fi.muni.cz/raslan/raslan20.pdf#page=21},
   year = {2020}
}

TY  - JOUR
ID  - 1729516
AU  - Anetta, Krištof
PY  - 2020
TI  - Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus
PB  - Tribun EU
CY  - Brno
SN  - 9788026316008
KW  - EHR
KW  - electronic health records
KW  - named entity recognition
KW  - text data mining
KW  - NLP
KW  - natural language processing
KW  - Slavic languages
KW  - Polish
UR  - https://nlp.fi.muni.cz/raslan/raslan20.pdf#page=21
N2  - This paper deals with data mining from free-form text electronic health records, both from a global perspective and with specific application to Slavic languages. It introduces the reader to the promises and challenges of this enterprise and provides a short overview of the global state of the art and of the general absence of this kind of research in Central European Slavic languages. It describes pl_ehr_cardio, a new corpus of Polish health records with 18 years’ worth of medical text. This paper marks the beginning of a pioneering research project in medical text data mining in Central European Slavic languages.
ER  -

ANETTA, Krištof. Data Mining from Free-Text Health Records : State of the Art, New Polish Corpus. In Aleš Horák. \textit{Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020}. Brno: Tribun EU, 2020, p.~13-22. ISBN~978-80-263-1600-8.
