Interactive syllabus
[Michal Štefánik and Vítek Novotný]: Poster session for the ALPS NLP winter school 14. 1. 2021
Poster sessions at ALPS 2021 will take place in two two-hour sessions on the gather.town virtual platform.
At this final seminar of fall 2020, Vítek and Michal will wander around gather.town and present their ALPS 2021 posters.
Michal Štefánik: Abstract
Development of current state-of-the-art language models is heavily focused on the Transformer architecture, which can be pre-trained on vast corpora and then relatively quickly fine-tuned to a specific task. In the pre-training stage, these models are usually trained with the Masked Language Modeling objective, while in the fine-tuning stage they are trained to minimise cross-entropy on either token-level or sequence-level tasks.
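For concreteness, below is a minimal sketch of the two training stages contrasted above: Masked Language Modeling during pre-training and cross-entropy on a sequence-level task during fine-tuning. It assumes the Hugging Face transformers library and an illustrative BERT checkpoint; the sentences, the masked position, and the classification labels are placeholders, not the authors' actual setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Pre-training stage: Masked Language Modeling.
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
inputs = tokenizer("Poster sessions take place on the gather.town platform.", return_tensors="pt")
labels = inputs["input_ids"].clone()
masked_position = 3                                      # mask one arbitrary token (illustrative)
inputs["input_ids"][0, masked_position] = tokenizer.mask_token_id
labels[0, :masked_position] = -100                       # -100 = ignored by the loss
labels[0, masked_position + 1:] = -100
mlm_loss = mlm_model(**inputs, labels=labels).loss       # cross-entropy over the masked position

# Fine-tuning stage: cross-entropy on a sequence-level task (here binary classification).
clf_model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
clf_inputs = tokenizer("A thoroughly enjoyable winter school.", return_tensors="pt")
clf_loss = clf_model(**clf_inputs, labels=torch.tensor([1])).loss

# Both losses are then minimised with a gradient-based optimiser such as AdamW.
```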
While these fine-tuning objectives can be optimised successfully, this often comes at the price of losing the network's generality. Importantly, such a loss of generality often goes unnoticed during training, as both the loss and the measured performance are tightly bound to the optimised objective. As a consequence, the network becomes prone to what Kahneman, in humans, calls the availability heuristic: the system seizes on any shortcut heuristic that allows it to cut down the loss further.
Our experiments demonstrate this problem with neural translators, where the model overfits the specific lengths of the parallel corpus pairs, a flaw that cannot be observed in the BLEU reported on its own validation data set, and with summarization, where the model learns to write syntactically coherent output but fails to frame and propagate the key points of the input. Here again, correctly matching declension and other morphology has a beneficial, yet misleading, impact on validation ROUGE.
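The translation finding can be illustrated with a hypothetical evaluation helper (a sketch, not the authors' actual code): a single corpus-level BLEU may look healthy while BLEU broken down by source-sentence length exposes the degradation on lengths that are rare in the parallel corpus. The sacrebleu package is assumed; the bucket size and all data variables are placeholders.

```python
from collections import defaultdict
import sacrebleu

def bleu_by_length(sources, hypotheses, references, bucket_size=10):
    """Report overall BLEU and BLEU per source-length bucket."""
    buckets = defaultdict(lambda: ([], []))
    for src, hyp, ref in zip(sources, hypotheses, references):
        bucket = len(src.split()) // bucket_size          # bucket by source length in tokens
        buckets[bucket][0].append(hyp)
        buckets[bucket][1].append(ref)
    overall = sacrebleu.corpus_bleu(hypotheses, [references]).score
    per_bucket = {
        f"{b * bucket_size}-{(b + 1) * bucket_size - 1} tokens":
            sacrebleu.corpus_bleu(hyps, [refs]).score
        for b, (hyps, refs) in sorted(buckets.items())
    }
    return overall, per_bucket
```

A large spread between the overall score and the scores of the under-represented length buckets would hint at exactly the kind of hidden overfitting discussed above.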
This poster aims to familiarise the audience with this common flaw, analyse its causes, and outline and promote possible research directions that could help our language systems on their way towards higher levels of generality.
Vítek Novotný: Abstract
Since the seminal work of Mikolov and colleagues, word vectors of log-bilinear models have found their way into many NLP applications. Later, Mikolov and colleagues equipped their log-bilinear model with positional weighting, which allowed them to reach state-of-the-art performance on the word analogy task.
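As a rough illustration of positional weighting, here is a minimal NumPy sketch under the standard formulation: each context word's input vector is reweighted element-wise by a vector learned for its relative position before the context vectors are averaged. The array names and dimensions are illustrative, not taken from the poster.

```python
import numpy as np

vocab_size, dim, window = 10_000, 300, 5

rng = np.random.default_rng(0)
input_vectors = rng.normal(size=(vocab_size, dim))       # one input vector per word
positional_vectors = rng.normal(size=(2 * window, dim))  # one weight vector per relative position

def context_vector(context_word_ids):
    """Positionally weighted context representation of one CBOW training window."""
    weighted = [
        input_vectors[word_id] * positional_vectors[position]  # element-wise (Hadamard) product
        for position, word_id in enumerate(context_word_ids)
    ]
    return np.mean(weighted, axis=0)

# A plain (non-positional) CBOW model would instead use
# np.mean(input_vectors[context_word_ids], axis=0).
```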
Although the positional model improves accuracy on the intrinsic word analogy task, prior work has neglected qualitative evaluation of its linguistic properties as well as quantitative evaluation on extrinsic end tasks. We open-source the positional model and evaluate it on both qualitative and quantitative tasks.
We show that the positional model captures information about parts of speech and self-information. We also show that the positional model consistently outperforms non-positional models on text classification and language modeling.