Extraction, labeling, clustering, and semantic mapping of segments from clinical notes

J 2023

ZELINA, Petr, Jana HALÁMKOVÁ and Vít NOVÁČEK

Basic information

Original name

Extraction, labeling, clustering, and semantic mapping of segments from clinical notes

Authors

ZELINA, Petr (203 Czech Republic, guarantor, belonging to the institution), Jana HALÁMKOVÁ (203 Czech Republic, belonging to the institution) and Vít NOVÁČEK (203 Czech Republic, belonging to the institution)

Edition

IEEE TRANSACTIONS ON NANOBIOSCIENCE, UNITED STATES, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2023, ISSN 1536-1241

Other information

Language

English

Type of outcome

Article in a specialist periodical

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

Impact factor

Impact factor: 3.900 in 2022

RIV identification code

RIV/00216224:14330/23:00131334

Organization unit

Faculty of Informatics

DOI

http://dx.doi.org/10.1109/TNB.2023.3275195

UT WoS

001082250700011

Keywords in English

NLP; EHR; Clinical Notes; Information Extraction; Text Classification

Abstract

In the original

This work is motivated by the scarcity of tools for accurate, unsupervised information extraction from unstructured clinical notes in computationally underrepresented languages, such as Czech. We introduce a stepping stone to a broad array of downstream tasks such as summarisation or integration of individual patient records, extraction of structured information for national cancer registry reporting or building of semi-structured semantic patient representations that can be used for computing patient embeddings. More specifically, we present a method for unsupervised extraction of semantically-labelled textual segments from clinical notes and test it out on a dataset of Czech breast cancer patients, provided by Masaryk Memorial Cancer Institute (the largest Czech hospital specialising exclusively in oncology). Our goal was to extract, classify (i.e. label) and cluster segments of the free-text notes that correspond to specific clinical features (e.g., family background, comorbidities or toxicities). Finally, we propose a tool for computer-assisted semantic mapping of segment types to pre-defined ontologies and validate it on a downstream task of category-specific patient similarity. The presented results demonstrate the practical relevance of the proposed approach for building more sophisticated extraction and analytical pipelines deployed on Czech clinical notes.

Links

MUNI/A/1339/2022, internal MU code

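Name: Development of data processing techniques to support search, analysis and visualization of large datasets using artificial intelligence (original Czech name: Rozvoj technik pro zpracování dat pro podporu vyhledávání, analýz a vizualizací rozsáhlých datových souborů s využitím umělé inteligence)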

Investor: Masaryk University, Development of data processing techniques to support search, analysis and visualization of large datasets using artificial intelligence

MUNI/G/1763/2020, internal MU code

Name: AIcope - AI support for Clinical Oncology and Patient Empowerment (Acronym: AIcope)

Investor: Masaryk University, INTERDISCIPLINARY - Interdisciplinary research projects
