D 2019

Approximate String Matching for Detecting Keywords in Scanned Business Documents

HA, Hien Thi

Basic information

Original name

Approximate String Matching for Detecting Keywords in Scanned Business Documents

Authors

HA, Hien Thi (704 Viet Nam, guarantor, belonging to the institution)

Edition

Brno, Czech Republic, Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 49-54, 6 pp. 2019

Publisher

NLP Consulting

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

RIV identification code

RIV/00216224:14330/19:00113733

Organization unit

Faculty of Informatics

ISBN

978-80-263-1530-8

ISSN

UT WoS

000604899800006

Keywords in English

approximate string matching; Levenshtein distance; weighted edit distance; OCR; invoice

Tags

International impact, Reviewed
Changed: 15/5/2024 01:32, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Optical Character Recognition (OCR) is achieving ever higher accuracy. However, reducing the error rate to zero remains an unmet goal. This paper presents an approximate string matching method that uses weighted edit distance to search for keywords in OCR-ed business documents. An evaluation on a Czech invoice dataset shows that the method can detect a significant part of erroneous keywords.
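The idea of weighted edit distance named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the confusion table `OCR_SUB_COST`, its weights, the threshold `max_cost`, and the helper names are all hypothetical, chosen only to show how cheaper costs for typical OCR confusions let a misrecognized keyword still match.

```python
from typing import Dict, List, Tuple

# Hypothetical confusion weights (not taken from the paper): character pairs
# that OCR engines plausibly confuse are cheaper to substitute than others.
OCR_SUB_COST: Dict[Tuple[str, str], float] = {
    ("l", "1"): 0.2, ("1", "l"): 0.2,
    ("O", "0"): 0.2, ("0", "O"): 0.2,
    ("í", "i"): 0.3, ("i", "í"): 0.3,
}


def sub_cost(a: str, b: str) -> float:
    """Cost of replacing character a with b; 0 if equal, 1 if not in the table."""
    if a == b:
        return 0.0
    return OCR_SUB_COST.get((a, b), 1.0)


def weighted_edit_distance(source: str, target: str,
                           ins_cost: float = 1.0,
                           del_cost: float = 1.0) -> float:
    """Levenshtein-style dynamic programme with per-pair substitution costs."""
    # prev[j] holds the cost of turning the processed prefix of source
    # into the first j characters of target.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        curr = [i * del_cost]
        for j, t in enumerate(target, 1):
            curr.append(min(
                prev[j] + del_cost,           # delete s
                curr[j - 1] + ins_cost,       # insert t
                prev[j - 1] + sub_cost(s, t)  # substitute s with t (or match)
            ))
        prev = curr
    return prev[-1]


def find_keyword(keyword: str, tokens: List[str],
                 max_cost: float = 0.5) -> List[Tuple[str, float]]:
    """Return the tokens whose weighted distance to keyword is within max_cost."""
    matches = []
    for tok in tokens:
        d = weighted_edit_distance(keyword, tok)
        if d <= max_cost:
            matches.append((tok, d))
    return matches
```

Under these assumed weights, a token such as `ce1kem` stays close to the keyword `celkem` because the `l`/`1` confusion is cheap, whereas an unrelated token such as `datum` falls well outside the threshold.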