Approximate String Matching for Detecting Keywords in Scanned Business Documents

HA, Hien Thi. Approximate String Matching for Detecting Keywords in Scanned Business Documents. Online. In Ales Horak, Pavel Rychly, Adam Rambousek. Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019. Brno, Czech Republic: NLP Consulting, 2019, p. 49-54. ISBN 978-80-263-1530-8.


Basic information
Original name	Approximate String Matching for Detecting Keywords in Scanned Business Documents
Authors	HA, Hien Thi (704 Viet Nam, guarantor, belonging to the institution).
Edition	Brno, Czech Republic, Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 49-54, 6 pp. 2019.
Publisher	NLP Consulting

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality degree	not subject to state or trade secrecy
Publication form	electronic version available online
RIV identification code	RIV/00216224:14330/19:00113733
Organization unit	Faculty of Informatics
ISBN	978-80-263-1530-8
ISSN	2336-4289
UT WoS	000604899800006
Keywords in English	approximate string matching; Levenshtein distance; weighted edit distance; OCR; invoice
Tags	International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 15/5/2024 01:32.

Abstract
Optical Character Recognition (OCR) is achieving ever higher accuracy. However, reducing the error rate to zero remains an open goal. This paper presents an approximate string matching method that uses weighted edit distance to search for keywords in OCR-processed business documents. An evaluation on a Czech invoice dataset shows that the method can detect a significant portion of the erroneously recognized keywords.
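To illustrate the general idea behind the abstract, here is a minimal sketch, not the paper's implementation. It assumes a weighted Levenshtein distance in which substitutions between visually similar characters (for example `1`/`I` or `0`/`O`) cost less than arbitrary substitutions, combined with Sellers' approximate substring search so that a keyword can match anywhere inside an OCR-ed line. The cost table `CONFUSION_COST`, the case-difference cost of 0.1, and the names `sub_cost`, `weighted_edit_distance` and `find_keyword` are hypothetical and chosen only for illustration; the paper's actual weights and thresholds are not given here.

```python
INDEL_COST = 1.0  # cost of inserting or deleting one character

# Hypothetical OCR confusion costs: swapping visually similar characters is
# cheaper than an arbitrary substitution. Values are illustrative only.
CONFUSION_COST = {
    frozenset("0O"): 0.3,
    frozenset("1I"): 0.3,
    frozenset("1l"): 0.3,
    frozenset("5S"): 0.4,
    frozenset("8B"): 0.4,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting character a by b."""
    if a == b:
        return 0.0
    if a.lower() == b.lower():
        return 0.1  # case-only difference (assumed cheap)
    return CONFUSION_COST.get(frozenset((a, b)), 1.0)


def weighted_edit_distance(s: str, t: str) -> float:
    """Levenshtein distance with character-dependent substitution costs."""
    prev = [j * INDEL_COST for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * INDEL_COST]
        for j in range(1, len(t) + 1):
            cur.append(min(
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),  # match / substitute
                prev[j] + INDEL_COST,                        # delete s[i-1]
                cur[j - 1] + INDEL_COST,                     # insert t[j-1]
            ))
        prev = cur
    return prev[-1]


def find_keyword(keyword: str, text: str, max_cost: float):
    """Best approximate occurrence of keyword anywhere in text (Sellers' algorithm).

    Returns (cost, start, end) such that text[start:end] is the matched
    substring, or None if the best cost exceeds max_cost.
    """
    n = len(text)
    # Each cell holds (cost, start position of the match in text).
    # Row 0 is all zeros: a match may begin at any text position for free.
    prev = [(0.0, j) for j in range(n + 1)]
    for i in range(1, len(keyword) + 1):
        cur = [(i * INDEL_COST, 0)]
        for j in range(1, n + 1):
            diag_cost, diag_start = prev[j - 1]
            up_cost, up_start = prev[j]
            left_cost, left_start = cur[j - 1]
            cur.append(min(
                (diag_cost + sub_cost(keyword[i - 1], text[j - 1]), diag_start),
                (up_cost + INDEL_COST, up_start),      # keyword char missing in text
                (left_cost + INDEL_COST, left_start),  # extra char in text
            ))
        prev = cur
    # The match may end anywhere: take the cheapest cell in the last row.
    cost, start, end = min((prev[j][0], prev[j][1], j) for j in range(n + 1))
    if cost > max_cost:
        return None
    return cost, start, end


if __name__ == "__main__":
    line = "Dodavatel: 1ČO: 12345678"  # OCR misread "IČO" as "1ČO"
    print(find_keyword("IČO", line, max_cost=1.0))
```

Running the script should report a cost of about 0.3 for the substring `1ČO` starting at index 11, because the only error is the cheap `I`/`1` confusion. Because the cost is not normalized by keyword length, a fixed `max_cost` is stricter for short keywords than for long ones; a real system would likely tune the threshold per keyword or normalize it.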


