D 2022

Information Extraction from Business Documents

GELETKA, Martin, Mikuláš BANKOVIČ, Dávid MELUŠ, Šárka ŠČAVNICKÁ, Michal ŠTEFÁNIK et al.

Basic information

Original name

Information Extraction from Business Documents

Authors

GELETKA, Martin (703 Slovakia, guarantor, belonging to the institution), Mikuláš BANKOVIČ (703 Slovakia, belonging to the institution), Dávid MELUŠ (703 Slovakia, belonging to the institution), Šárka ŠČAVNICKÁ (703 Slovakia, belonging to the institution), Michal ŠTEFÁNIK (703 Slovakia, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution)

Edition

Brno, Recent Advances in Slavonic Natural Language Processing (RASLAN 2022), p. 35-46, 12 pp. 2022

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

RIV identification code

RIV/00216224:14330/22:00127213

Organization unit

Faculty of Informatics

ISBN

978-80-263-1752-4

ISSN

Keywords (in Czech)

OCR; multimodální učení; extrakce informací; transformery; strukturované dokumenty

Keywords in English

OCR; Multi-modal learning; Information extraction; Transformers; Structured Documents

Tags

International impact
Changed: 15/5/2024 09:51, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Document AI is a relatively new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Nowadays, many companies extract data from business documents through manual efforts that are time-consuming and expensive, requiring manual customization or configuration. This paper describes techniques that address these problems, applies them to real-world data, and implements them in an end-to-end solution for automatic information extraction from business documents.

Links

CZ.01.1.02/0.0/0.0/21_374/0026711, MU internal code
Name: Inteligentní back office
Investor: Ministry of Industry and Trade of the CR
EG21_374/0026711, research and development project
Name: Inteligentní back office
MUNI/A/1195/2021, MU internal code
Name: Aplikovaný výzkum v oblastech vyhledávání, analýz a vizualizací rozsáhlých dat, zpracování přirozeného jazyka a aplikované umělé inteligence
Investor: Masaryk University