HA, Hien Thi. Approximate String Matching for Detecting Keywords in Scanned Business Documents. Online. In Ales Horak, Pavel Rychly, Adam Rambousek. Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019. Brno, Czech Republic: NLP Consulting, 2019, p. 49-54. ISBN 978-80-263-1530-8.
Basic information
Original name Approximate String Matching for Detecting Keywords in Scanned Business Documents
Authors HA, Hien Thi (704 Viet Nam, guarantor, belonging to the institution).
Edition Brno, Czech Republic, Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 49-54, 6 pp. 2019.
Publisher NLP Consulting
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
RIV identification code RIV/00216224:14330/19:00113733
Organization unit Faculty of Informatics
ISBN 978-80-263-1530-8
ISSN 2336-4289
UT WoS 000604899800006
Keywords in English approximate string matching; Levenshtein distance; weighted edit distance; OCR; invoice
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 15/5/2024 01:32.
Abstract
Optical Character Recognition (OCR) is achieving higher accuracy. However, to decrease error rate down to zero is still a human desire. This paper presents an approximate string matching method using weighted edit distance for searching keywords in OCR-ed business documents. The evaluation on a Czech invoice dataset shows that the method can detect a significant part of erroneous keywords.
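Illustration (not taken from the paper): the abstract names weighted edit distance as the matching technique but gives no details, so the following is only a minimal sketch of the general idea. It is a Levenshtein distance in which substitutions between visually similar characters cost less than other edits. The function names, the confusion table, its weights, and the threshold are all hypothetical and are not the values or the implementation used by the author.

```python
def weighted_edit_distance(a, b, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Levenshtein distance with per-pair substitution weights.

    sub_cost maps (char, char) pairs to a substitution cost; pairs are
    looked up in both orders, and unlisted pairs cost 1.0.
    """
    sub_cost = sub_cost or {}
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]  # cost of deleting the first i chars of a
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                s = 0.0
            else:
                s = sub_cost.get((ca, cb), sub_cost.get((cb, ca), 1.0))
            curr.append(min(prev[j] + del_cost,      # delete ca
                            curr[j - 1] + ins_cost,  # insert cb
                            prev[j - 1] + s))        # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical OCR confusion weights, chosen only for illustration.
OCR_CONFUSIONS = {("l", "1"): 0.2, ("O", "0"): 0.2, ("I", "l"): 0.2}


def find_keyword(tokens, keyword, max_dist=1.0):
    """Return the tokens whose weighted distance to keyword is within max_dist."""
    return [t for t in tokens
            if weighted_edit_distance(t, keyword, OCR_CONFUSIONS) <= max_dist]


if __name__ == "__main__":
    # "Tota1" and "lnvoice" imitate typical OCR misreadings.
    print(find_keyword(["Tota1", "Date", "Total"], "Total"))
    print(find_keyword(["lnvoice", "No."], "Invoice"))
```

With these assumed weights, a token such as "Tota1" stays close to the keyword "Total" because the l/1 substitution is cheap, while an unrelated token such as "Date" remains far above the threshold.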