@inproceedings{1810617,
  author = {Ha, Hien Thi and Horák, Aleš},
  address = {St. Petersburg, Russia},
  booktitle = {SPECOM 2021: 23rd International Conference on Speech and Computer},
  doi = {10.1007/978-3-030-87802-3_23},
  editor = {Karpov, A. and Potapova, R.},
  keywords = {OCR; Invoice; Block type classification; Seller; Buyer; Delivery address},
  howpublished = {electronic version (online)},
  language = {eng},
  location = {St. Petersburg, Russia},
  isbn = {978-3-030-87801-6},
  pages = {250--261},
  publisher = {Springer},
  title = {Who is Selling to Whom -- Feature Evaluation for Multi-block Classification in Invoice Information Extraction},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-87802-3_23},
  year = {2021}
}
TY  - CPAPER
ID  - 1810617
AU  - Ha, Hien Thi
AU  - Horák, Aleš
PY  - 2021
TI  - Who is Selling to Whom – Feature Evaluation for Multi-block Classification in Invoice Information Extraction
T2  - SPECOM 2021: 23rd International Conference on Speech and Computer
ED  - Karpov, A.
ED  - Potapova, R.
SP  - 250
EP  - 261
PB  - Springer
CY  - St. Petersburg, Russia
SN  - 9783030878016
DO  - 10.1007/978-3-030-87802-3_23
KW  - OCR
KW  - Invoice
KW  - Block type classification
KW  - Seller
KW  - Buyer
KW  - Delivery address
UR  - https://link.springer.com/chapter/10.1007/978-3-030-87802-3_23
N2  - The invoice information extraction task aims at unifying the automated processing of invoices, whether they arrive in structured form or as scanned images. Recognizing information where a specific value is identified by a keyword (such as the invoice date) is a relatively well-handled task. On the other hand, identifying multi-block information on the invoice, such as distinguishing the seller, the buyer, and the delivery address, is much more challenging due to the variety of invoice layouts. In this work, we present a new technique of feature extraction and classification to recognize the seller, buyer, and delivery address text blocks in scanned invoices based on a combination of complex layout and annotated text features. The method considers not only the positional features of blocks but also the relations between blocks and the block contents at a higher level. The technique is implemented as a module of the OCRMiner system. We provide a detailed evaluation and error analysis on a dataset of more than five hundred Czech invoices, reaching an overall macro-averaged F1-score of 94%.
ER  - 
HA, Hien Thi and Aleš HORÁK. Who is Selling to Whom -- Feature Evaluation for Multi-block Classification in Invoice Information Extraction. Online. In: KARPOV, A. and R. POTAPOVA (eds.). \textit{SPECOM 2021: 23rd International Conference on Speech and Computer}. St. Petersburg, Russia: Springer, 2021, p.~250--261. ISBN~978-3-030-87801-6. Available from: https://dx.doi.org/10.1007/978-3-030-87802-3\_{}23.