👷 Readings in Digital Typography, Scientific Visualization, Information Retrieval and Machine Learning

[Michal Štefánik] Unsupervised Data Augmentation: Thinking Outside the Single-Objective Box 10. 12. 2020



Join us via Zoom on December 10 at 10 AM (CET).

A hunger for huge amounts of training data is one of the major issues of SOTA estimators, preventing them from reaching any useful level of quality on tasks where data is scarce or too expensive to obtain.

Currently, this problem is addressed by data augmentation strategies in supervised settings, or mainly by auto-regressive objectives in unsupervised settings. Conventional data augmentation strategies, however, can only introduce a limited amount of noise into the original data if the samples are to remain valid, and hence can hardly substitute for orders of magnitude of missing samples.

Unsupervised Data Augmentation (UDA) addresses this situation in an original way, with a surprisingly simple presumption:

Given in-domain samples with no labels, each nonetheless belonging to some category, we can augment those samples and expect the system to predict the same output for an augmented sample as for the original one (see the sketch below).
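To make the idea concrete, here is a minimal PyTorch-style sketch of the combined objective as we read it from the paper: an ordinary supervised loss on the few labeled samples, plus a KL-divergence consistency term between predictions on unlabeled samples and their augmented copies. The `augment` function and the weighting factor `lam` are placeholder names for illustration, not identifiers from the paper, and details such as confidence masking and prediction sharpening are omitted.

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, augment, lam=1.0):
    """Sketch of one UDA training objective (placeholder helper names)."""
    # Supervised part: ordinary cross-entropy on the labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Prediction on the original unlabeled sample, treated as a fixed target.
    with torch.no_grad():
        p_orig = F.softmax(model(x_unlabeled), dim=-1)

    # Prediction on an augmented copy of the same unlabeled sample.
    log_p_aug = F.log_softmax(model(augment(x_unlabeled)), dim=-1)

    # Consistency term: KL(p_orig || p_aug), pushing the two predictions together.
    consistency = F.kl_div(log_p_aug, p_orig, reduction="batchmean")

    return sup_loss + lam * consistency
```

The unsupervised term never looks at labels; it only asks the model to be consistent under augmentation, which is exactly where the extra, cheap unlabeled data enters the training signal.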

In this session, we'll describe the maths behind UDA and analyze the conditions and circumstances under which this semi-supervised approach can be used. We'll also give some thought to the implications this work has for research in data-scarce fields, and for industry, where data acquisition is a bottleneck of countless applications.

Unsupervised Data Augmentation: Thinking Outside the Single-Objective Box
video recording of the 2020-12-10 presentation by Michal Štefánik

Readings

  1. Unsupervised Data Augmentation for Consistency Training: https://arxiv.org/abs/1904.12848