GELETKA, Martin, Mikuláš BANKOVIČ, Dávid MELUŠ, Šárka ŠČAVNICKÁ, Michal ŠTEFÁNIK and Petr SOJKA. Information Extraction from Business Documents. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Recent Advances in Slavonic Natural Language Processing (RASLAN 2022). Brno: Tribun EU, 2022, p. 35-46. ISBN 978-80-263-1752-4.
Basic information
Original name Information Extraction from Business Documents
Authors GELETKA, Martin (703 Slovakia, guarantor, belonging to the institution), Mikuláš BANKOVIČ (703 Slovakia, belonging to the institution), Dávid MELUŠ (703 Slovakia, belonging to the institution), Šárka ŠČAVNICKÁ (703 Slovakia, belonging to the institution), Michal ŠTEFÁNIK (703 Slovakia, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution).
Edition Brno, Recent Advances in Slavonic Natural Language Processing (RASLAN 2022), p. 35-46, 12 pp. 2022.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14330/22:00127213
Organization unit Faculty of Informatics
ISBN 978-80-263-1752-4
ISSN 2336-4289
Keywords (in Czech) OCR; multimodální učení; extrakce informací; transformery; strukturované dokumenty
Keywords in English OCR; Multi-modal learning; Information extraction; Transformers; Structured Documents
Tags International impact
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 15/5/2024 09:51.
Abstract
Document AI is a relatively new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Nowadays, many companies extract data from business documents through manual efforts that are time-consuming and expensive, requiring manual customization or configuration. This paper describes techniques that address these problems, applies them to real-world data, and implements them in an end-to-end solution for automatic information extraction from business documents.
Links
CZ.01.1.02/0.0/0.0/21_374/0026711, MU internal code. Name: Inteligentní back office (Intelligent back office)
Investor: Ministry of Industry and Trade of the CR
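EG21_374/0026711, research and development project. Name: Inteligentní back office (Intelligent back office)
MUNI/A/1195/2021, MU internal code. Name: Aplikovaný výzkum v oblastech vyhledávání, analýz a vizualizací rozsáhlých dat, zpracování přirozeného jazyka a aplikované umělé inteligence (Applied research in the areas of search, analysis, and visualization of large-scale data, natural language processing, and applied artificial intelligence)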
Investor: Masaryk University