MUNI FI
PA154 Language Modeling (12.2)
Large Language Models (LLM): From LM to Chat
Pavel Rychlý, pary@fi.muni.cz
May 14, 2024

From LM to Chat
■ an LM generates a continuation of the prompt
■ prompt engineering
■ fine-tuning on chat texts
■ some Q&A/chat texts are already in the training data

Prompt engineering
■ simple prompt (Q&A):
  Q: {question}
  A:
■ general knowledge:
  Generate some knowledge about the concepts in the input.
  Input: {question}
  Knowledge:
■ task specific:
  If {premise} is true, is it also true that {hypothesis}? ||| {entailed}.

Chain-of-thought
■ a direct answer is not correct for complex questions
■ solve the problem in steps
■ prompt:
  Q: {question}
  A: Let's think step by step.

Generated training data
■ there are never enough texts with the right Q&As
■ generate data from a pattern, especially for chain-of-thought
■ variation of variables (numbers in math, ...)
■ generated using an LLM

RLHF: Reinforcement Learning from Human Feedback
■ Prompt Data: training pairs of prompts and responses -> Supervised Model
■ Ranking Data: human comparisons converted to scores using the Elo algorithm -> Reward Model
■ Aligned Model: the learned RL policy after applying RLHF

Alignment
■ data annotated by humans
■ training using RLHF
■ eliminates toxicity and bias

LoRA: Low-Rank Adaptation of Large Language Models
■ fine-tune only a small fraction of the parameters
■ usually only the attention matrices

Foundation models
■ an LLM without fine-tuning
■ can be adapted to a new domain/language
■ fine-tuned for a specific task:
  ■ chat
  ■ question answering
  ■ summarization
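The prompt-engineering patterns above (simple Q&A, knowledge generation, chain-of-thought) can be sketched as plain string templates. This is a minimal illustration; the template texts follow the slides, while the helper name `build_prompt` is an assumption, not part of any library.

```python
# Prompt templates from the slides, as Python format strings.
SIMPLE_QA = "Q: {question}\nA:"

KNOWLEDGE = (
    "Generate some knowledge about the concepts in the input.\n"
    "Input: {question}\n"
    "Knowledge:"
)

COT = "Q: {question}\nA: Let's think step by step."


def build_prompt(template: str, **fields: str) -> str:
    """Fill a prompt template with the given fields (hypothetical helper)."""
    return template.format(**fields)


prompt = build_prompt(COT, question="What is 12 * 7 + 3?")
print(prompt)
```

The same template can be reused for any question; only the chain-of-thought suffix "Let's think step by step." changes how the model continues the text.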
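The RLHF slide mentions converting human pairwise comparisons into scores with the Elo algorithm before training the reward model. A minimal sketch of that conversion, assuming the standard Elo update rule (the response names and K-factor are illustrative):

```python
# Turn human pairwise preferences over model responses into scalar
# ratings via standard Elo updates (a sketch, not a full RLHF pipeline).

def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """One Elo update after 'winner beats loser'."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta


# All responses start from a common baseline rating.
ratings = {"resp_a": 1000.0, "resp_b": 1000.0, "resp_c": 1000.0}

# Human comparisons as (preferred, rejected) pairs.
comparisons = [("resp_a", "resp_b"), ("resp_a", "resp_c"), ("resp_b", "resp_c")]

for winner, loser in comparisons:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

print(ratings)  # resp_a, preferred twice, ends with the highest rating
```

The resulting ratings give each response a scalar quality score, which is the form of supervision a reward model is trained on.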
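The LoRA idea above (fine-tune only a small fraction of the parameters, typically the attention matrices) can be sketched with NumPy: the pretrained weight W stays frozen, and only a low-rank update B·A is trained. The sizes below are illustrative.

```python
# Minimal LoRA sketch: freeze W, train only the low-rank factors A and B,
# cutting trainable parameters from d*d down to 2*r*d.
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                            # hidden size, LoRA rank (r << d)

W = rng.standard_normal((d, d))          # frozen pretrained attention matrix
A = rng.standard_normal((r, d)) * 0.01   # trainable, r x d
B = np.zeros((d, r))                     # trainable, zero-initialised


def lora_forward(x: np.ndarray) -> np.ndarray:
    """Adapted layer: W x + B(A x); with B = 0 the output equals W x."""
    return W @ x + B @ (A @ x)


x = rng.standard_normal(d)
assert np.allclose(lora_forward(x), W @ x)  # identical before any training

trainable = A.size + B.size
print(f"trainable params: {trainable} of {W.size}")
```

Because B starts at zero, the adapted model is exactly the pretrained model at initialisation; fine-tuning then moves only the 2·r·d adapter parameters, here 8192 of 262144 (about 3%).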