Introduction
PA154 Jazykové modelování (Language Modeling) (1.1)
Pavel Rychlý
pary@fi.muni.cz
March 2, 2021

PA154 - Technical Information
■ Slides and recorded videos in IS: https://is.muni.cz/auth/el/fi/jaro2021/PA154/index.qwarp
■ Final written exam (online): 50 points, 25 points for E
■ Optional individual projects: up to 25 points

Individual projects
■ A presentation on new research in language modeling
■ A small project as part of a larger collaborative project
► neural machine translation
► lexical acquisition

Language models - what are they good for?
■ assigning scores to sequences of words
■ predicting words
■ generating text
■ statistical machine translation
■ automatic speech recognition
■ optical character recognition

Predicting words
■ Do you speak ...
■ Would you be so ...
■ Statistical machine ...
■ Faculty of Informatics, Masaryk ...
■ WWII has ended in ...
■ In the town where I was ...
■ Lord of the ...

Generating text
[Figure: example image captions; the one surviving panel label reads "Somewhat related to the image".]
■ A person riding a motorcycle on a dirt road.
■ Two dogs play in the grass.
■ A group of young people playing a game of frisbee.
■ Two hockey players are fighting over the puck.
■ A herd of elephants walking across a dry grass field.
■ A close up of a cat laying on a couch.
■ A skateboarder does a trick on a ramp.
■ A little girl in a pink hat is blowing bubbles.
■ A red motorcycle parked on the side of the road.
■ A dog is jumping to catch a frisbee.
■ A refrigerator filled with lots of food and drinks.
■ A yellow school bus parked in a parking lot.

MT + OCR

Language models - probability of a sentence
A language model (LM) is a probability distribution over all possible word sequences.
What is the probability of an utterance s?
$P_{LM}(\text{Catalonia President urges protests})$
$P_{LM}(\text{President Catalonia urges protests})$
$P_{LM}(\text{urges Catalonia protests President})$

Ideally the probability should strongly correlate with fluency and intelligibility of a word sequence.
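
To make the comparison of the three word orders concrete, here is a minimal sketch of one possible language model: a bigram model with add-one smoothing. The slides do not say which model is meant, so the model choice, the toy corpus, and all names (`CORPUS`, `train_bigram`, `bigram_prob`, `log_prob`, `predict_next`) are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for this illustration.
CORPUS = [
    "Catalonia President urges protests",
    "President urges calm",
    "Catalonia President urges talks",
    "protests continue in Catalonia",
]

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram(sentences):
    """Count histories and bigrams; return (history_counts, bigram_counts, vocab)."""
    history_counts, bigram_counts = Counter(), Counter()
    vocab = {EOS}
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        vocab.update(tokens[1:])
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, word, history_counts, bigram_counts, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))


def log_prob(sentence, history_counts, bigram_counts, vocab):
    """Log-probability of a whole sentence under the bigram model."""
    tokens = [BOS] + sentence.split() + [EOS]
    return sum(
        math.log(bigram_prob(prev, word, history_counts, bigram_counts, vocab))
        for prev, word in zip(tokens[:-1], tokens[1:])
    )


def predict_next(prev, bigram_counts, vocab):
    """Most frequent continuation of `prev` (ties broken alphabetically)."""
    return max(sorted(vocab), key=lambda w: bigram_counts[(prev, w)])


if __name__ == "__main__":
    h, b, v = train_bigram(CORPUS)
    for s in (
        "Catalonia President urges protests",
        "President Catalonia urges protests",
        "urges Catalonia protests President",
    ):
        print(f"{log_prob(s, h, b, v):8.3f}  {s}")
    print("President ->", predict_next("President", b, v))
```

On this toy corpus the fluent word order receives the highest score and the scrambled orders receive lower ones, which is the behaviour the slide asks for. A model trained on realistic data would be needed before drawing any conclusions about real text.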