Detailed Information on Publication Record
2019
Structured Information Extraction from Pharmaceutical Records
BAMBUROVÁ, Michaela and Zuzana NEVĚŘILOVÁ
Basic information
Original name
Structured Information Extraction from Pharmaceutical Records
Authors
BAMBUROVÁ, Michaela (703 Slovakia, belonging to the institution) and Zuzana NEVĚŘILOVÁ (203 Czech Republic, belonging to the institution)
Edition
Brno, Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 55-62, 8 pp. 2019
Publisher
Tribun EU
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10200 1.2 Computer and information sciences
Country of publisher
Czech Republic
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
References:
RIV identification code
RIV/00216224:14330/19:00111627
Organization unit
Faculty of Informatics
ISBN
978-80-263-1530-8
ISSN
UT WoS
000604899800007
Keywords in English
structured information extraction; table understanding; entity recognition
Changed: 16/5/2022 15:23, Mgr. Michal Petr
Abstract
In the original
The paper presents an iterative approach to understanding semi-structured or unstructured tabular data with pharmaceutical records. The task is to split records with entities such as drug name, dosage strength, dosage form, and package size into the appropriate columns. The data is provided by many suppliers, and so it is very diverse in terms of structure. Some of the records are easy to parse using regular expressions; others are difficult and need advanced methods. We used regular expressions for the easy-to-parse data and conditional random fields for the more complex records. We iteratively extend the training data set using the above methods together with manual corrections. Currently, the F1 score for correct classification into 5 classes is 95%.
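As an illustration of the regular-expression step described in the abstract, the following is a minimal sketch only, not the authors' implementation. The record layout ("name, strength, form, package size"), the unit and package abbreviations, the sample record, and the names `RECORD_RE` and `parse_record` are all assumptions made for this example. Records that do not match return `None`, marking them as candidates for a statistical model such as conditional random fields, which is not shown here.

```python
import re

# Hypothetical layout: "<drug name> <strength> <form> <package size>",
# e.g. "Ibuprofen 400 mg tbl 30x". Real supplier data is far more varied.
RECORD_RE = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z \-]*?)\s+"                  # drug name (letters, spaces, hyphens)
    r"(?P<strength>\d+(?:[.,]\d+)?\s?(?:mg|g|ml|mcg))\s+"   # dosage strength with an assumed unit list
    r"(?P<form>[A-Za-z]+\.?)\s+"                            # dosage form abbreviation
    r"(?P<package>\d+\s?(?:x|ks|pcs))$",                    # package size with an assumed suffix list
    re.IGNORECASE,
)


def parse_record(text):
    """Split an easy-to-parse record into columns.

    Returns a dict of columns, or None when the record does not fit the
    assumed layout and should be handled by a more advanced method.
    """
    match = RECORD_RE.match(text.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    print(parse_record("Ibuprofen 400 mg tbl 30x"))
    print(parse_record("tbl Ibuprofen 30 400mg"))  # does not fit the layout
```

Returning `None` instead of a partial guess keeps the two stages separate: only records the pattern fully explains are accepted automatically, and the rest can be routed to the trained model or to manual correction.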
Links
EF16_013/0001781, research and development project
LM2015071, research and development project