Approximate String Matching for Detecting Keywords in Scanned
Business Documents

HA, Hien Thi. Approximate String Matching for Detecting Keywords in Scanned Business Documents. Online. In Ales Horak, Pavel Rychly, Adam Rambousek. Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019. Brno, Czech Republic: NLP Consulting, 2019, pp. 49-54. ISBN 978-80-263-1530-8.


Basic information
Original title	Approximate String Matching for Detecting Keywords in Scanned Business Documents
Authors	HA, Hien Thi (704 Vietnam, guarantor, internal).
Edition	Brno, Czech Republic, Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, pp. 49-54, 6 pp. 2019.
Publisher	NLP Consulting

Additional information
Original language	English
Type of outcome	Proceedings paper
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality	not subject to state or trade secrecy
Publication form	electronic version available online
RIV code	RIV/00216224:14330/19:00113733
Organization unit	Faculty of Informatics
ISBN	978-80-263-1530-8
ISSN	2336-4289
UT WoS	000604899800006
Keywords in English	approximate string matching; Levenshtein distance; weighted edit distance; OCR; invoice
Tags	International impact, Reviewed
Changed by	Mgr. Michal Petr, učo 65024. Changed: 16. 5. 2022 15:21.

Abstract
Optical Character Recognition (OCR) is achieving ever higher accuracy. However, reducing the error rate to zero remains an unmet goal. This paper presents an approximate string matching method that uses weighted edit distance to search for keywords in OCR-ed business documents. An evaluation on a Czech invoice dataset shows that the method can detect a significant portion of the erroneous keywords.
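
To make the idea of weighted edit distance concrete, the following is a minimal sketch and not the paper's implementation. It assumes that substitutions between characters OCR engines commonly confuse should cost less than arbitrary substitutions. The confusion pairs, their costs, the function names, and the normalized threshold are all hypothetical values chosen for illustration; the record does not give the weights actually used.

```python
# Illustrative sketch only. The confusion pairs and costs below are
# assumptions made for this example, not values taken from the paper.
CONFUSION_COST = {
    frozenset("il"): 0.2,  # 'i' and 'l' look alike in many fonts
    frozenset("o0"): 0.2,  # letter 'o' vs digit zero
    frozenset("s5"): 0.3,
    frozenset("cč"): 0.3,  # a lost or spurious diacritic
    frozenset("aá"): 0.3,
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing character a with b (symmetric)."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get(frozenset((a, b)), 1.0)


def weighted_edit_distance(source: str, target: str,
                           insert_cost: float = 1.0,
                           delete_cost: float = 1.0) -> float:
    """Levenshtein-style dynamic program with per-pair substitution costs."""
    # prev[j] holds the cost of turning the processed prefix of source
    # into the first j characters of target.
    prev = [j * insert_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        curr = [i * delete_cost]
        for j, t_char in enumerate(target, start=1):
            curr.append(min(
                prev[j] + delete_cost,        # drop s_char
                curr[j - 1] + insert_cost,    # insert t_char
                prev[j - 1] + substitution_cost(s_char, t_char),
            ))
        prev = curr
    return prev[-1]


def find_keyword(tokens, keyword, max_normalized_cost=0.25):
    """Return (token, cost) pairs whose cost per keyword character
    does not exceed the (hypothetical) threshold."""
    hits = []
    for token in tokens:
        cost = weighted_edit_distance(token.lower(), keyword.lower())
        if cost / max(len(keyword), 1) <= max_normalized_cost:
            hits.append((token, cost))
    return hits
```

With these assumed weights, an OCR token such as "lnvoice" stays close to the keyword "invoice" because the only difference is a cheap i/l confusion, while an unrelated token such as "Total" is rejected. In practice the weights would have to be estimated from the OCR errors observed in the target documents.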
