2024
From Examples to Patterns: LLM-Generated Regular Expressions for Entity Extraction in Czech Clinical Texts
ZELINA, Petr
Basic information
Original title
From Examples to Patterns: LLM-Generated Regular Expressions for Entity Extraction in Czech Clinical Texts
Authors
Edition
Brno, Proceedings of the Eighteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2024, pp. 3-16, 14 pp., 2024
Publisher
Tribun EU
Other information
Language
English
Type of outcome
Proceedings paper
Field of study
10200 1.2 Computer and information sciences
Country of publisher
Czech Republic
Confidentiality
is not subject to a state or trade secret
Publication form
printed version "print"
Links
Marked to be transferred to RIV
Yes
RIV identification code
RIV/00216224:14330/24:00138709
Organization unit
Faculty of Informatics
ISBN
978-80-263-1835-4
ISSN
Keywords in English
NLP; LLM; regex; text mining; clinical notes
Tags
Reviewed
Changed: 30. 1. 2025 15:53, RNDr. Petr Zelina
Abstract
In the original language
Entity extraction in clinical texts is essential for converting unstructured data in clinical notes into structured formats, facilitating large-scale analysis and clinical decision support. Traditional methods often rely on handcrafted regular expressions (regexes), which, while effective, demand significant time and specialized knowledge to create, resources that healthcare professionals may lack. We introduce a novel approach leveraging large language models (LLMs) to automate regex generation for clinical entity extraction. Our method involves prompting LLMs to generate regex patterns from examples, followed by iterative refinement using a feedback loop. Despite the limitations of regexes, this approach is practical for extracting frequently patterned information common in clinical texts, such as dates or specific data about medical procedures, and for event detection. Our experiments on Czech clinical notes show that this method outperforms current state-of-the-art genetic-programming-based methods for generating regular expression patterns from examples, especially when only a few examples are available.
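The abstract does not spell out how the feedback loop works, so the following is only an illustrative sketch of the general idea (generate a pattern from examples, test it, report the failures back, repeat), not the authors' implementation. The names `evaluate`, `refine`, and `fake_llm` are hypothetical, the example sentences are invented, and `fake_llm` is a canned stand-in for a real LLM call.

```python
import re
from typing import Callable, List, Optional, Tuple

# (text, expected extraction); None means nothing should be extracted.
Example = Tuple[str, Optional[str]]


def evaluate(pattern: str, examples: List[Example]) -> List[str]:
    """Return one error message per example the pattern gets wrong."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        return [f"pattern does not compile: {exc}"]
    errors = []
    for text, expected in examples:
        match = compiled.search(text)
        got = match.group(0) if match else None
        if got != expected:
            errors.append(f"text={text!r}: expected {expected!r}, got {got!r}")
    return errors


def refine(
    generate: Callable[[List[Example], List[str]], str],
    examples: List[Example],
    max_rounds: int = 5,
) -> Tuple[str, List[str]]:
    """Ask the generator for a pattern, feeding back errors until none remain."""
    pattern, feedback = "", []
    for _ in range(max_rounds):
        pattern = generate(examples, feedback)
        feedback = evaluate(pattern, examples)
        if not feedback:
            break
    return pattern, feedback


def fake_llm(examples: List[Example], feedback: List[str]) -> str:
    # Hypothetical stand-in for an LLM: a too-loose first guess, then a fix
    # once it has received error feedback.
    return r"\d+" if not feedback else r"\d{1,2}\.\s?\d{1,2}\.\s?\d{4}"


# Invented examples in the style of Czech clinical notes (assumption).
examples = [
    ("Operace dne 12. 3. 2021 bez komplikací.", "12. 3. 2021"),
    ("Kontrola 5.11.2020, stav stabilní.", "5.11.2020"),
    ("Pacient bez obtíží.", None),
]
pattern, errors = refine(fake_llm, examples)
```

In a real setting, `generate` would build a prompt from the examples and the error messages and send it to a model; capping the number of rounds keeps the loop bounded when no pattern satisfies every example.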
Related projects
MUNI/A/1590/2023, internal MU code