PA153 Natural Language Processing
02 - Semantics I (lexical meaning and its representation)
Karel Pala, Zuzana Nevěřilová
NLP Centre, FI MUNI, Brno
Sep 25, 2019

• Lexical Meaning
• Meaning in Context
• Lexical Meanings in NLP
• Semantic Classes
• Conclusion, Take Home

Lexical Meaning
Lexical meaning (cs: lexikální význam): the meaning of a word in isolation [Oxford Dictionaries, 2013]
• regardless of the meaning of the sentence the word is part of
• regardless of grammatical categories
Other types of meaning: grammatical meaning, word meaning, sentence meaning
• buy - bought: the same lexical meaning, different grammatical meaning
• image - picture: different words with the same lexical (and grammatical) meaning
• "The old professor runs to catch the bus." and "The cheetah runs to catch the prey.": to run has the same meaning in both, although the activity differs

Lexical Form and Lexical Unit
Lexical unit (cs: lexikální jednotka, LU) [Ziková, 2003]:
• is represented by a lexical form
• is associated with a particular lexical meaning
• has grammatical properties (e.g. transitive verb)
• can have pragmatic properties (e.g. "I" refers to a different person each time)
• LUs with the same meaning but a different form are synonyms (e.g. beautiful, lovely)
• LUs with the same form but a different meaning are homonyms (e.g. bark)

Where Is Information about Lexical Meaning?
Dictionary/lexicon/lexical database - storage of lexical units
Dictionaries:
• general, Language for General Purpose (LGP), also called defining/explanatory (with definitions)
• bilingual (contains translations of LUs)
• etymological
• encyclopedic
• reverse
• rhyming
• single-field (contains domain terminology)
• historical
• ...
For NLP, machine-readable dictionaries are used.

Anatomy of a Dictionary Entry
[Figure: annotated Macmillan Dictionary entry for the verb "determine"; the numbers mark the parts listed below]
1: headword
2: definition
3: grammatical category
4-5: pronunciation (sound & IPA)
6: inflection
7: frequency
8: numbered senses
Senses shown in the entry:
• [transitive] [often passive] to control what something will be: "Our prices are determined by the market."; genetically/culturally/biologically determined: "She claims that much human behaviour is socially determined."; synonyms and related words (to limit or control something or someone): draw a line in the sand, limit, control
• [intransitive/transitive] to officially decide something; determine whether/why/who: "It is for the court to determine whether she is guilty."; synonyms and related words (to make a decision): decide, determine, arrive at...
Source: https://www.macmillandictionary.com/learn/dictionary-entry.html

Remark on Czech dictionaries (SSC, SSJC): SSJC has no derived form, but SSC does.
For other words, SSC lists many more derived forms: květ, květen, květena, květák, květenství, květina, květináč, květinářka, květinářství

Collocation as a Dictionary Entry
"A language user has available ... a large number of semi-preconstructed phrases that constitute single choices." (Sinclair 1991: 110)
Examples: New York, ad hoc, foreign language, second language, to save time
Special collocation dictionaries: Oxford Collocations Dictionary, Macmillan Collocations Dictionary
In NLP, the term multiword expression (MWE) is used.
The notion of MWE is useful when:
• dealing with phrasemes/idioms (imagine translating them)
• the meaning of the components is different/unclear (MWEs as words with spaces)

Dictionary Definitions
• extensional (denotative) definitions
  ► demonstrative definitions (by pointing)
  ► definition by enumeration (e.g. Baltic states are: Estonia, Latvia, and Lithuania)
  ► definition by subclass (e.g. "flower" means rose, lily, daisy, and the like)
• intensional (connotative) definitions
  ► synonymous definition (e.g. "physician" means "doctor")
  ► etymological definition (e.g. the word "capital" comes from the Latin word "caput" meaning "head")
  ► operational definition (e.g. "brain activity" happens iff an electroencephalograph shows oscillations)
  ► definition by genus and differentia

Definition by Genus and Differentia
• a triangle: a plane figure that has 3 straight bounding sides
• a quadrilateral: a plane figure that has 4 straight bounding sides
Here the genus is "plane figure" and the differentia is the number of straight bounding sides; the genus is a hyperonym of the defined term (hyperonymy).
Example: check a Wikipedia definition
Dictionary entries assume at least some knowledge of the language (e.g. English GPL is around 2000 words).
For NLP, dictionaries written for humans are not fully suitable.

Meaning in Context
Lexical meaning is not always enough (in fact, it is not enough most of the time) => we need to know the context
Word Sense Disambiguation (cs: lexikální desambiguace)
function: $(w, c) \to s$
• $w \in W$ - set of words
• $c \in C$ - set of contexts
• $s \in S$ - set of meanings

Word Sense Disambiguation
• All algorithms rely on a lexical database with discrete meanings.
• Different dictionaries have different granularity of meanings.
• Meanings are not fully discrete.
[Figure: word graph with nodes interaction, valium, addict, relapse, abuse, addiction, rehab, store, locator, dealer, treatment, therapy, cartel, center, trafficking, testing]

Word Sense Disambiguation is dead, long live the ... Word Sense Discrimination
[Figure: word graph from [Véronis, 2004]; legible labels include rivière, irrigation, match]

Componential Analysis
Componential analysis (cs: komponentová analýza) = description of meaning using a (small) set of semantic features that are either present, or not present, or irrelevant.
• man = +HUMAN +ADULT +MALE
• woman = +HUMAN +ADULT -MALE
• boy = +HUMAN -ADULT +MALE
• toddler = +HUMAN -ADULT ±MALE
[Katz and Fodor, 1963] and [Bierwisch, 1971]

Componential Analysis II
Later related to semantic primes and natural semantic metalanguage:
• Substantives: I, YOU, SOMEONE, PEOPLE, SOMETHING/THING, BODY
• Relational Substantives: KIND, PART
• Determiners: THIS, THE SAME, OTHER/ELSE/ANOTHER
• Quantifiers: ONE, TWO, SOME, ALL, MUCH/MANY, LITTLE/FEW
• Evaluators: GOOD, BAD
• Descriptors: BIG, SMALL
• Mental predicates: THINK, KNOW, WANT, DON'T WANT, FEEL, SEE, HEAR
• Speech: SAY, WORDS, TRUE
• Actions, Events, Movement: DO, HAPPEN, MOVE
• Existence, Possession: BE (SOMEWHERE), THERE IS, BE (SOMEONE/SOMETHING)
• Life and Death: LIVE, DIE
• Time: WHEN/TIME, NOW, BEFORE, AFTER, A LONG TIME, FOR SOME TIME, MOMENT
• Space: WHERE/PLACE, HERE, ABOVE, BELOW, FAR, NEAR, TOUCH (CONTACT)
• Logical Concepts: NOT, MAYBE, CAN, BECAUSE, IF
• Intensifier, Augmentor: VERY, MORE
• Similarity: LIKE/AS/WAY

Semantic Classes
= groups of words that share a semantic feature
van - truck - motor vehicle - self-propelled vehicle - wheeled vehicle - vehicle - transport - instrumentation - artifact - whole - object - physical entity - entity
taxonomy, hierarchy, tree structure

The Tree of Porphyry
• Supreme genus: Substance
• Differentiae: material / immaterial
• Subordinate genera: Body / Spirit
• Differentiae: animate / inanimate
• Subordinate genera: Living / Mineral
• Differentiae: sensitive / insensitive
• Proximate genera: Animal / Plant
• Differentiae: rational / irrational
• Species: Human / Beast
• Individuals: Socrates, Plato, Aristotle, etc.
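The hypernym chain from the Semantic Classes slide can be sketched as a toy taxonomy. This is only an illustration under stated assumptions: the dictionary `HYPERNYM` and the functions `hypernym_chain` and `is_a` are hypothetical names, not part of WordNet or any library, and each term is given exactly one parent, which real lexical networks do not guarantee.

```python
# Toy taxonomy: each term maps to its direct hypernym (parent).
# The chain is copied from the slide; the data layout and the function
# names are assumptions made for this illustration only.
HYPERNYM = {
    "van": "truck",
    "truck": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
    "vehicle": "transport",
    "transport": "instrumentation",
    "instrumentation": "artifact",
    "artifact": "whole",
    "whole": "object",
    "object": "physical entity",
    "physical entity": "entity",
}


def hypernym_chain(term):
    """Return the terms from `term` up to the root of the toy taxonomy."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(term, semantic_class):
    """True if `semantic_class` is `term` itself or one of its hypernyms."""
    return semantic_class in hypernym_chain(term)


if __name__ == "__main__":
    print(" - ".join(hypernym_chain("van")))
    print(is_a("van", "vehicle"))
    print(is_a("vehicle", "van"))
```

Membership in a semantic class then reduces to walking up the tree: a van is a vehicle because "vehicle" lies on its path to the root, while the reverse test fails.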
Semantic Networks, Inference

Semantic Networks
WordNet (Princeton WordNet, PWN) - lexical network
• originally a psychology project (G. A. Miller, since 1985)
• usable by humans and computers (NLP) [Fellbaum, 1998]
• basic unit: synonym set (synset), cs: synonymická řada
• relations between synsets:
  ► hyperonymy/hyponymy: truck - van
  ► holonymy/meronymy (part of, member of): car - brake
  ► troponymy: whisper - speak
  ► near-antonym: day - night
  ► derivation: wide - width
• POS: nouns, adjectives, verbs, adverbs
• each POS is organized in a different way

WordNet
Size: PWN (117K synsets)
Follow-up projects:
• EuroWordNet (en, nl, it, es, de, fr, cs, et)
  ► ILI - InterLingual Index
  ► Top Ontology (63 categories)
  ► Base Concepts
• BalkaNet: bg, cs, ro, gr, sr, tr
• Global WordNet Association (GWA)
• Czech WordNet: 28K synsets

Ontologies
Lexical network - lexical knowledge
Ontology - knowledge
Ontology = explicit specification of a shared conceptualization
• domain ontologies
• general ontologies: SUMO/MILO (Suggested Upper Merged Ontology, Mid-Level Ontology)
• common-sense ontologies: ConceptNet

Conclusion, Take Home
Lexical meaning description:
• human friendly: dictionaries
• NLP friendly: semantic primes, semantic networks, ontologies
Sense disambiguation:
• human friendly: numbered senses in dictionaries
• NLP friendly: measures, vectors ...

References
Bierwisch, M. (1971). On classifying semantic features. In Bierwisch, M. and Heidolph, K. E., editors, Progress in Linguistics, pages 27-50. Mouton.
Fellbaum, C. (1998). WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press.
Katz, J. and Fodor, J. (1963). The structure of a semantic theory. Language, 39:170-210.
Oxford Dictionaries (2013). lexical meaning. Oxford Dictionaries. Online. http://oxforddictionaries.com/definition/english/lexical-meaning (accessed October 03, 2013).
Véronis, J. (2004). Hyperlex: Lexical cartography for information retrieval. In Computer Speech and Language: Special Issue on Word Sense Disambiguation, page 23.
Ziková, M. (2003). Současný český jazyk: Tvoření slov. Online. http://www.phil.muni.cz/cest/lide/zikova/CJA009_l.rtf (accessed October 03, 2013).