Natural Language Modelling
PA154 Jazykové modelování (Language Modelling) (13)
Pavel Rychlý
pary@fi.muni.cz
May 25, 2021

Big models
► bigger is better
► many layers
► need big machines
► using advanced hardware: GPU, TPU

BERT
► Google
► pre-training on raw text
► masking tokens, is-next-sentence
► big pre-trained models available
► domain (task) adaptation

Input: The man went to the [MASK]1. He bought a [MASK]2 of milk.
Labels: [MASK]1 = store; [MASK]2 = gallon

Sentence A = The man went to the store.
Sentence B = He bought a gallon of milk.
Label = IsNextSentence

Sentence A = The man went to the store.
Sentence B = Penguins are flightless.
Label = NotNextSentence

GPT
► OpenAI
► GPT-2: 1.5 billion parameters
► GPT-3: 175 billion parameters
► very good text generation → potentially harmful applications
► Misuse of Language Models
► bias: generates stereotyped or prejudiced content (gender, race, religion)
► Sep 2020: Microsoft has "exclusive" use of GPT-3

T5: Text-To-Text Transfer Transformer
► Google AI
► transfer learning
► C4: Colossal Clean Crawled Corpus

[Figure: example text-to-text inputs with task prefixes]
"translate English to German: …"
"cola sentence: The … is jumping well."
"stsb sentence1: The rhino … on the grass. sentence2: A rhino is grazing in a field."
"summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…"

Pretrained models
► huge training data
► long training time
► small model fine-tuning on target task
► multi-language models
► universal tokenization: subword units
  ► Byte-Pair Encoding (BPE)
  ► WordPiece
  ► SentencePiece

ALBERT
► A Lite BERT
► factorized embedding parameters
► cross-layer parameter sharing
► inter-sentence coherence loss: Next Sentence Prediction → Sentence-Order Prediction
► much smaller: number of parameters 108M → 12M (base)

Intrinsic evaluation
► direct evaluation of word embeddings
► semantic similarity (WordSim-353, SimLex-999, ...)
► word analogy (Google Analogy, BATS (Bigger Analogy Test Set))
► concept categorization (ESSLLI-2008)

Extrinsic evaluation
► using the model in a downstream NLP task
► Part-of-Speech Tagging, Noun Phrase Chunking, Named Entity Recognition, Shallow Syntax Parsing, Semantic Role Labeling, Sentiment Analysis, Text Classification, Paraphrase Detection, Textual Entailment Detection

Multi-task benchmarks
► GLUE (https://gluebenchmark.com)
  ► nine sentence- or sentence-pair language understanding tasks
► SuperGLUE (https://super.gluebenchmark.com)
  ► more difficult language understanding tasks
► XTREME: Cross-Lingual Transfer Evaluation of Multilingual Encoders (https://sites.research.google/xtreme)
  ► 40 typologically diverse languages, 9 tasks

Libraries and Frameworks
► Dive into Deep Learning: online book
  https://d2l.ai
► Hugging Face Transformers: many ready-to-use models
  https://huggingface.co/transformers
► jiant: library, many tasks for evaluation
  https://jiant.info
► GluonNLP: reproduction of latest research results
  https://nlp.gluon.ai
► low-level libraries: NumPy, PyTorch, TensorFlow, MXNet
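
The "masking tokens" step from the BERT slide can be illustrated with a small sketch. This is a simplified assumption, not BERT's actual procedure: it only replaces randomly chosen tokens with [MASK], whereas BERT masks about 15% of tokens and sometimes substitutes a random token or keeps the original. The function name `mask_tokens` and its parameters are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the masked token list and a dict {position: original token},
    which plays the role of the training labels.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels[i] = tok      # the model must predict this token
        else:
            masked.append(tok)
    return masked, labels

if __name__ == "__main__":
    sentence = "the man went to the store . he bought a gallon of milk .".split()
    masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
    print(" ".join(masked))
    print(labels)
```

The labels are only defined at masked positions, so the loss during pre-training would be computed on those positions alone.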
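
Byte-Pair Encoding, listed under universal tokenization, can be sketched as follows. This is a minimal toy version under stated assumptions: words are split into characters plus an end-of-word marker `</w>`, the most frequent adjacent symbol pair is merged repeatedly, and ties are broken by taking the lexicographically largest pair (an arbitrary choice made here for determinism). The function names are hypothetical and do not correspond to any library API.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Merge every occurrence of the adjacent pair in a symbol tuple."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to num_merges merge operations from a list of words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the lexicographically largest pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

if __name__ == "__main__":
    corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    merges = learn_bpe(corpus, num_merges=3)
    print(merges)                       # [('t', '</w>'), ('s', 't</w>'), ('e', 'st</w>')]
    print(apply_bpe("lowest", merges))  # ['l', 'o', 'w', 'est</w>']
```

The unseen word "lowest" is still segmented into known units, which is why subword tokenization suits multi-language models with open vocabularies.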
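
Intrinsic evaluation by semantic similarity is commonly done by comparing the model's cosine similarities for word pairs with human ratings, using a rank correlation. The sketch below shows that idea in plain Python. The embeddings and ratings are made-up toy values, not taken from WordSim-353 or SimLex-999, and the Spearman formula used assumes there are no tied values.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """Rank positions (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation, valid when x and y contain no ties."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

if __name__ == "__main__":
    # Toy embeddings and human ratings (hypothetical values for illustration).
    emb = {
        "cat": [1.0, 0.9, 0.1],
        "dog": [0.9, 1.0, 0.2],
        "car": [0.1, 0.2, 1.0],
        "bus": [0.2, 0.1, 0.9],
    }
    pairs = [("cat", "dog"), ("car", "bus"), ("cat", "car"), ("dog", "bus")]
    human = [9.0, 8.5, 1.5, 2.0]
    model = [cosine(emb[a], emb[b]) for a, b in pairs]
    print(spearman(model, human))
```

A correlation close to 1 would mean the embedding space orders the word pairs the same way human annotators do.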