Interactive syllabus
[Michal Štefánik] Attention sparsification: Look into the future and the past (behind the context window) 8. 10. 2020
Abstract
You might have heard about how cool Transformers are, so it's easy to forget their notorious flaws. One of the main ones is the quadratic complexity with respect to the size of the "attended" text window. This prevents them from efficiently handling more complex tasks, such as document classification or document question answering.
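To make the quadratic part concrete, here is a minimal sketch (plain NumPy, written for this outline rather than taken from any of the papers) of vanilla single-head self-attention; the (n, n) score matrix is exactly what blows up with the window size:

```python
import numpy as np

def full_self_attention(x):
    """x: (n, d) token embeddings; returns (n, d) attended outputs."""
    n, d = x.shape
    q, k, v = x, x, x                      # single head, no projections, for brevity
    scores = q @ k.T / np.sqrt(d)          # (n, n): memory and compute grow as O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

x = np.random.randn(4096, 64)              # a "long document": 4096 tokens
out = full_self_attention(x)               # the intermediate (4096, 4096) matrix is the bottleneck
print(out.shape)                           # (4096, 64)
```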
In recent months, there has been a surge of research addressing the inefficient nature of the attention mechanism with some quite interesting tricks: selecting the relevant subsections, propagating information through globally-visible tokens, or traversing the text randomly in a graph-like manner, with a theory grounded in random graphs.
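Roughly speaking, these tricks replace the dense (n, n) attention pattern with a sparse mask. The sketch below is illustrative only: the window size, number of global tokens, and number of random links are made-up values for this outline, not any paper's actual configuration.

```python
import numpy as np

def sparse_attention_mask(n, window=3, n_global=2, n_random=2, seed=0):
    """Boolean (n, n) mask: True where token i may attend to token j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True                    # local sliding window
        mask[i, rng.choice(n, n_random)] = True  # random long-range links
    mask[:n_global, :] = True                    # global tokens attend everywhere...
    mask[:, :n_global] = True                    # ...and everything attends to them
    return mask

mask = sparse_attention_mask(16)
print(mask.sum(), "of", mask.size, "entries kept")  # O(n) nonzeros instead of O(n^2)
```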
The results of those improvements already seem to help, but there's still a long way to go...
We'll take a look at the methods, from Sparse Attention and Longformer to Big Bird. Feel free to look at the literature in advance, to get even more out of the discussion.
Presentation slides in Google Docs (including animations)
Readings and Literature
Based on your interest, pick one piece to read in advance (or more, if you like, of course):
If you are not familiar with the famous Transformer architecture, see the visuals and possibly the original paper:
- Attention Is All You Need: Illustrated blog: http://jalammar.github.io/illustrated-transformer/
- Attention Is All You Need: original paper: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Here are (pretty much all) the papers concerning Attention sparsification:
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context: https://arxiv.org/pdf/1901.02860.pdf
- Generating Long Sequences with Sparse Transformers: https://arxiv.org/pdf/1904.10509.pdf
- Longformer: The Long-Document Transformer: https://arxiv.org/pdf/2004.05150.pdf
- Big Bird: Transformers for Longer Sequences: https://arxiv.org/pdf/2007.14062v1.pdf
2. Sparse Transformers shows how attention can be used instead of convolution on image processing tasks, using 2D attention.
3. Longformer provides a nice overview of the sparsification approaches (a minimal usage sketch follows below).
4. Big Bird combines all previously introduced approaches and shows experiments applying attention to genomics data.
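If you want to play with one of these models before the talk, here is a minimal sketch assuming the Hugging Face transformers library and its public allenai/longformer-base-4096 checkpoint (both are assumptions made for this outline, not part of the talk materials):

```python
import torch
from transformers import LongformerModel, LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

text = "A long document ... " * 500
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Mark the first token as globally attended; the rest use sliding-window attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```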