Transferability of General Polish NER to Electronic Health Records

ANETTA, Krištof and Mahmut ARSLAN. Transferability of General Polish NER to Electronic Health Records. In Horák, Rychlý, Rambousek. Recent Advances in Slavonic Natural Language Processing (RASLAN 2021). Brno: Tribun EU, 2021, pp. 151-159. ISBN 978-80-263-1670-1.

Basic information
Original title	Transferability of General Polish NER to Electronic Health Records
Authors	ANETTA, Krištof (203 Czech Republic, guarantor, belonging to the institution) and Mahmut ARSLAN (792 Turkey).
Edition	Brno, Recent Advances in Slavonic Natural Language Processing (RASLAN 2021), pp. 151-159, 9 pp., 2021.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10200 1.2 Computer and information sciences
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
WWW	Workshop homepage; Full text PDF
RIV identification code	RIV/00216224:14330/21:00123253
Organization unit	Faculty of Informatics
ISBN	978-80-263-1670-1
ISSN	2336-4289
Keywords in English	EHR; Electronic health records; Healthcare texts; NER; Named entity recognition; NLP; Natural language processing; Slavic languages; Polish; PolDeepNer2; spaCy; Spark NLP
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 15. 5. 2024 10:23.

Abstract

This paper investigates the transferability of general Polish named entity recognition tools to the analysis of Polish health records. The tools, namely PolDeepNer2, spaCy’s pl_core_news_lg pipeline and Spark NLP’s entity_recognizer_md pipeline for Polish, were run on the pl_ehr_cardio corpus and their results were analyzed, paying special attention to their performance when processing these highly specific texts and to the applicability of the results in the healthcare domain. Even though the precision of PolDeepNer2 proved to be superior to both spaCy and Spark NLP, the paper concludes that without additional training, general named entity recognition tools for Polish have very limited use in the medical analysis of electronic health records. However, they could be helpful in partial tasks ranging from de-identification to entity disambiguation and discovery of mistyped entities or candidate entities that are not present in medical dictionaries.

Links
LM2018101, research and development project	Name: Digital Research Infrastructure for Language Technologies, Arts and Humanities (Acronym: LINDAT/CLARIAH-CZ)
LM2018101, research and development project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, LINDAT/CLARIAH-CZ - Digital Research Infrastructure for Language Technologies, Arts and Humanities
MUNI/IGA/1505/2020, internal MU code	Name: Electronic Health Record Analysis using Deep Learning (Acronym: Health Record Analysis with Deep Learning)
MUNI/IGA/1505/2020, internal MU code	Investor: Masaryk University, Electronic Health Record Analysis using Deep Learning
