2019
Approximate String Matching for Detecting Keywords in Scanned Business Documents
HA, Hien Thi
Basic information
Original name
Approximate String Matching for Detecting Keywords in Scanned Business Documents
Authors
HA, Hien Thi (704 Viet Nam, guarantor, belonging to the institution)
Edition
Brno, Czech Republic, Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 49-54, 6 pp. 2019
Other information
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Czech Republic
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
RIV identification code
RIV/00216224:14330/19:00113733
Organization unit
Faculty of Informatics
Keywords in English
approximate string matching; Levenshtein distance; weighted edit distance; OCR; invoice
Tags
International impact, Reviewed
In the original
Optical Character Recognition (OCR) is achieving higher accuracy. However, to decrease error rate down to zero is still a human desire. This paper presents an approximate string matching method using weighted edit distance for searching keywords in OCR-ed business documents. The evaluation on a Czech invoice dataset shows that the method can detect a significant part of erroneous keywords.
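To illustrate the general idea of weighted edit distance mentioned in the abstract, here is a minimal Python sketch. It is not the paper's implementation: the confusion pairs in `CONFUSABLE`, the reduced substitution cost of 0.5, the `max_cost` threshold, and the function names are all assumptions chosen for illustration. The sketch charges less for substitutions between characters that OCR engines commonly confuse (for example `O` and `0`), so that a misread keyword stays close to the intended one.

```python
# Hypothetical OCR confusion pairs. The weights used in the paper are not
# given in this record, so these values are illustrative assumptions only.
CONFUSABLE = {
    frozenset("0O"), frozenset("1l"), frozenset("1I"),
    frozenset("lI"), frozenset("5S"), frozenset("8B"),
}


def sub_cost(a, b, cheap=0.5):
    """Substitution cost: 0 for a match, `cheap` for a confusable pair, else 1."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in CONFUSABLE:
        return cheap
    return 1.0


def weighted_edit_distance(s, t):
    """Levenshtein-style dynamic programme with weighted substitutions.

    Insertions and deletions cost 1.0 each; substitutions use sub_cost.
    Only two rows of the table are kept in memory.
    """
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                 # delete a from s
                curr[j - 1] + 1.0,             # insert b into s
                prev[j - 1] + sub_cost(a, b),  # substitute a with b (or match)
            ))
        prev = curr
    return prev[-1]


def find_keyword(keyword, tokens, max_cost=1.0):
    """Return the tokens whose weighted distance to `keyword` is within max_cost."""
    return [tok for tok in tokens
            if weighted_edit_distance(keyword, tok) <= max_cost]


# Example: the Czech invoice keyword "IČO" with two plausible OCR misreadings.
matches = find_keyword("IČO", ["Faktura", "IČ0", "DIČ", "lČ0"])
```

With these assumed weights, `matches` contains `IČ0` and `lČ0` but not `DIČ`, which is a different keyword at distance 2.0. Under plain Levenshtein distance `lČ0` would also cost 2.0 and be indistinguishable from `DIČ` by distance alone; the weighting is what separates them.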