Laboratory of Electronic and Multimedia Applications (Research Section)

[Martin Geletka]: Visual Document Understanding 16. 11. 2023

Abstract

We will present individual tasks in the area of Visual Document Understanding and illustrate them on the practical needs of an Intelligent Back Office. We will also describe interesting approaches that combine information from images and text.

Presentation
Visual Document Understanding
Slides presented at the seminar on April 14, 2022
PDF for download

2022-04-14-geletka.mp4
Recording of Martin Geletka's lecture, 14. 4. 2022

Readings

  1. Xu, Y. (2020). LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding (https://arxiv.org/abs/2012.14740)
  2. Kim, G. (2022). Donut: Document Understanding Transformer without OCR (https://arxiv.org/abs/2111.15664)
  3. Li, M. (2021). TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (https://arxiv.org/abs/2109.10282)