Syntactic Formalisms for Parsing Natural Languages Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang) ia161@nlp.fi.muni.cz Autumn 2013 IA161 Syntactic Formalisms for Parsing Natural Languages 1 / 476 CZ.1.07/2.2.00/28.0041 Centrum interaktivních a multimediálních studijních opor pro inovaci výuky a efektivní učení (Centre of interactive and multimedia study materials for teaching innovation and effective learning) IA161 Syntactic Formalisms for Parsing Natural Languages 2 / 476 Introducing Course objective Introducing the theoretical background of parsing: parsing methods focused on syntax, practical implementation methods, possible applications and evaluations IA161 Syntactic Formalisms for Parsing Natural Languages 3 / 476 Introducing Course syllabus PART I: Theoretical background Historical overview State-of-the-art parsing methods and trends Advanced syntactic formalisms PART II: Practical applications Applications & Use Cases Practical Implementations Parsing Evaluation IA161 Syntactic Formalisms for Parsing Natural Languages 4 / 476 Introducing Course format Weekly lectures (2 hours) Final written exam Two homework assignments Grading Final exam: 60 points Each homework: 20 points For each homework, the top-scoring 10 % of individuals receive 5 bonus points Points required for colloquium: 60 points IA161 Syntactic Formalisms for Parsing Natural Languages 5 / 476 Lecture 1 Introductory and Historical Overview of Natural Language Parsing IA161 Syntactic Formalisms for Parsing Natural Languages 6 / 476 Lecture 1 Main points Introduction to Natural Language Processing Issues in Syntax What is parsing? Overview of parsing methods and trends IA161 Syntactic Formalisms for Parsing Natural Languages 7 / 476 Lecture 1 Why natural language processing? Huge amounts of data from the Internet and intranets Applications for processing large amounts of text need NLP expertise Classify text into categories Index and search large texts Automatic translation Speech recognition Information extraction Automatic summarization Question answering Knowledge acquisition Text generation/dialogues IA161 Syntactic Formalisms for Parsing Natural Languages 8 / 476 Lecture 1 History of Natural Language Processing 1948 – the 1st NLP application? a dictionary look-up system by Andrew Booth, developed for machine translation purposes at Birkbeck College, London University IA161 Syntactic Formalisms for Parsing Natural Languages 9 / 476 • So far, it turns out, they have not considered at all the problem of multiple meaning (!), and have been concerned only with the mechanics of looking up words in a dictionary. First, you sense the first letter of a word, and then have the machine see whether or not the memory contains precisely the word in question. If so, the machine simply produces the translation (…) of this word. If this exact word is not contained in the memory, then the machine discards the last letter of the word, and tries over. If this fails, it discards another letter, and tries again. After it has found the largest initial combination of letters which is in the dictionary, it “looks up” the whole discarded portion in a special “grammatical annex” of the dictionary. Thus confronted by “running,” it might find “run” and then find out what the ending (n)ing does to “run.” (Warren Weaver on Booth’s machine) • This first application shows how closely NLP stands to the origins of computer science. • Booth was formerly (during WWII) doing research on X-ray crystallography of explosives.
This involved a lot of arithmetic, hence after WWII he tried to develop electronic computers; the first was an Automatic Relay Calculator (ARC) – 1946. • In the same year he was funded by the Rockefeller Foundation (RF) to visit US researchers, and reported that only von Neumann gave him any time. He fell in love with (and later married) von Neumann’s research assistant Kathleen and redesigned the ARC according to von Neumann’s architecture. • In 1947 he visited RF’s Natural Sciences Division Director Warren Weaver, who refused to fund a computer for mathematical calculations, but suggested funding a computer for machine translation of natural languages (!). • Booth developed techniques for parsing text and also for building dictionaries. On November 11, 1955 Booth gave an early public demonstration of natural language machine translation (shown in a figure on the slide). • Later on, Booth was very successful in building computers; his wife Kathleen programmed them and wrote one of the first books on programming. • In 1958 Kathleen did research on simulating neural networks to investigate ways in which animals recognise patterns, and in 1959 built a neural network for character recognition. Lecture 1 History of Natural Language Processing IA161 Syntactic Formalisms for Parsing Natural Languages 10 / 476 Lecture 1 History of Natural Language Processing 1949 – Warren Weaver Natural Sciences Division Director at the Rockefeller Foundation Mathematician, science advocate WWII code breaker He viewed Russian as English in code – the “Translation” memorandum: Also knowing nothing official about, but having guessed and inferred considerable about powerful new mechanized methods in cryptography – methods which I believe succeed even when one does not know what language has been coded – one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.” IA161 Syntactic Formalisms for Parsing Natural Languages 11 / 476 • Weaver was one of the Machine Translation pioneers and one of the most important science managers of the time. All the time he was meeting scientists, putting them together, organizing funding, and investigating potential research areas, while being a top scientist – in 1949 he co-authored The Mathematical Theory of Communication with Claude Shannon. Lecture 1 History of Natural Language Processing 1966 – Over-promised, under-delivered Machine Translation worked only word by word NLP provoked the first hostility of research funding agencies NLP gave AI a bad name before AI had a name. All funding of NLP came to a grinding halt due to the infamous ALPAC report. The public had spent 20 million dollars with very limited outcomes. 1966–1976 – “A lost decade” Revival in the 1980s Martin Kay: The Proper Place of Men and Machines in Language Translation IA161 Syntactic Formalisms for Parsing Natural Languages 12 / 476 • ALPAC (Automatic Language Processing Advisory Committee) was a committee of seven scientists led by John R. Pierce, established in 1964 by the U. S. Government in order to evaluate the progress in computational linguistics in general and machine translation in particular. Its report, issued in 1966, gained notoriety for being very skeptical of research done in machine translation so far, and for emphasizing the need for basic research in computational linguistics; this eventually caused the U. S.
Government to reduce its funding of the topic dramatically. • ALPAC’s final recommendations were that research should be supported on: 1. practical methods for evaluation of translations; 2. means for speeding up the human translation process; 3. evaluation of quality and cost of various sources of translations; 4. investigation of the utilization of translations, to guard against production of translations that are never read; 5. study of delays in the over-all translation process, and means for eliminating them, both in journals and in individual items; 6. evaluation of the relative speed and cost of various sorts of machine-aided translation; 7. adaptation of existing mechanized editing and production processes in translation; 8. the over-all translation process; and 9. production of adequate reference works for the translator, including the adaptation of glossaries that now exist primarily for automatic dictionary look-up in machine translation. • Kay’s counterargument: “The goal of MT should not be the fully automatic high quality translation (FAHQT) that can replace human translators. Instead, MT should adopt less ambitious goals, e.g. more cost-effective human-machine interaction and aim at enhancement of human translation productivity.” Lecture 1 NLP looked to Linguistics Linguistics is language described, not prescribed. Linguistics had few applicable theories for Machine Translation 1957 – Noam Chomsky’s Syntactic Structures revolutionized Linguistics as it applies to Machine Translation. A rule-based system of syntactic structures. Believed there are features common to all languages that enable people to speak creatively and freely. Hypothesized that all children go through the same stages of language development regardless of the language they are learning – the concept of an innate Universal Grammar (never proven) One of the most prominent figures of NLP in the 20th century, though very controversial. IA161 Syntactic Formalisms for Parsing Natural Languages 13 / 476 • Avram Noam Chomsky (born December 7, 1928) is an American linguist, philosopher, cognitive scientist, logician, and political commentator and activist. Working for most of his life at the Massachusetts Institute of Technology (MIT), where he is currently Professor Emeritus, he has authored over 100 books on various subjects. • He is credited as the creator or co-creator of the Chomsky hierarchy, the universal grammar theory, and the Chomsky–Schützenberger theorem. Chomsky is also well known as a political activist, and a leading critic of U.S. foreign policy, state capitalism, and the mainstream news media. Ideologically, he aligns himself with anarcho-syndicalism and libertarian socialism. • Highly influential: between 1980 and 1992, Chomsky was cited within the field of Arts and Humanities more often than any other living scholar, and eighth overall within the Arts and Humanities Citation Index during the same period. He has been described as a prominent cultural figure, and was voted the “world’s top public intellectual” in a 2005 poll. Lecture 1 NLP looked to Linguistics 1958 – Bar-Hillel report Concluded that Fully-Automatic High-Quality Translation (FAHQT) could not be accomplished without human knowledge. 1968 – Case Grammar (Fillmore) “The case for case” paper Later evolved into Frame Semantics 1970 – Augmented Transition Networks (Woods) Procedural Semantics – a theory of the “meaning” of a sentence.
Augmented Transition Network (ATN) parser IA161 Syntactic Formalisms for Parsing Natural Languages 14 / 476 • Yehoshua Bar-Hillel was charged by the US Office of Naval Research to make a critical assessment of current MT activity in the United States and Great Britain. Agencies of the US government were funding research on a large and increasing scale. • Bar-Hillel had been the first person appointed full-time to work on MT at the Massachusetts Institute of Technology in 1951. • His basic contention was that MT research – now a “multimillion dollar affair”, as he pointed out – was, with few exceptions, set on a mistaken and unattainable goal, namely, fully automatic translation of a quality equal to that of a good human translator. This he held to be utterly unrealistic, and in his view resources were being wasted which could more fruitfully be devoted to the development of less ambitious and more practical computer aids for translators. • Woods: 2010 ACL Lifetime Achievement Award • all within the LUNAR project of the NASA Apollo 11 Program • The ATN parser was the first properly formalized hand-built parser that could produce Chomsky-style deep structure analyses for a large, fluent subset of natural English. Lecture 1 Successful NLP Systems 1970 – SHRDLU (Terry Winograd) Human: Pick up a big red block. Computer: OK. Human: Grasp the pyramid. Computer: I don’t understand which pyramid you mean. Human (changing their mind): Find a block which is taller than the one you are holding and put it into the box. Computer: By “it”, I assume you mean the block which is taller than the one I am holding. Computer: OK. Human: What does the box contain? Computer: The blue pyramid and the blue block. Human: What is the pyramid supported by? Computer: The box. Human: How many blocks are not in the box? Computer: Four of them. Human: Is at least one of them narrower than the one which I told you to pick up? Computer: Yes, the red cube. IA161 Syntactic Formalisms for Parsing Natural Languages 15 / 476 • SHRDLU was an early natural language understanding computer program, developed by Terry Winograd at MIT in 1968–1970. In it, the user carries on a conversation with the computer, moving objects, naming collections and querying the state of a simplified “blocks world”, essentially a virtual box filled with different blocks. SHRDLU was written in the Micro Planner and Lisp programming languages on the DEC PDP-6 computer and a DEC graphics terminal. Later additions were made at the computer graphics labs at the University of Utah, adding a full 3D rendering of SHRDLU’s “world”. • The name SHRDLU was derived from ETAOIN SHRDLU, the arrangement of the alpha keys on a Linotype machine, arranged in descending order of usage frequency in English. Lecture 1 Successful NLP Systems II 1973 – LUNAR question answering system (Woods) WHAT IS THE AVERAGE CONCENTRATION OF ALUMINUM IN HIGH ALKALI ROCKS? WHAT SAMPLES CONTAIN P200? GIVE ME THE MODAL ANALYSES OF P200 IN THOSE SAMPLES GIVE ME EU DETERMINATIONS IN SAMPLES WHICH CONTAIN ILM IA161 Syntactic Formalisms for Parsing Natural Languages 16 / 476 • LUNAR is an experimental natural-language information retrieval system. It was designed to help geologists access, compare, and evaluate chemical-analysis data on moon rock and soil composition obtained from the Apollo-11 mission.
The primary goal of the designers was research on the problems involved in building a man-machine interface that would allow communication in ordinary English. A “real world” application was chosen for two reasons: first, it tends to focus effort on the problems really in need of solution (sometimes this is implicitly avoided in “toy” problems), and second, the possibility of producing a system capable of performing a worthwhile task. • The LUNAR system operates by translating a question entered in English into an expression in a formal query language (Codd, 1974). The translation is done with an augmented transition network (ATN) parser coupled with a rule-driven semantic interpretation procedure, which guides the analysis of the question. • The “query” that results from this analysis is then applied to the database to produce the answer to the request. The query language is a generalization of the predicate calculus. Its central feature is a quantifier function that is able to express, in a simple manner, the restrictions placed on a database-retrieval request by the user. The function is used in concert with special enumeration functions for classes of database objects, freeing the quantifier function from explicit dependence on the structure of the database. LUNAR also served as a foundation for the early work on speech understanding at BBN. • The formal query language used by the LUNAR system contains three types of objects: designators, which name classes of objects in the database (including functionally defined objects); propositions, which are formed from predicates with designators as arguments; and commands, which initiate actions. • Request: (DO MY SAMPLES HAVE GREATER THAN 13 PERCENT ALUMINIUM) Query language translation (after parsing): (TEST (FOR SOME X1 / (SEQ SAMPLES) : T ; (CONTAIN X1 (NPR* X2 / ’AL203) (GREATERTHAN 13 PCT)))) Response: YES • LUNAR processes these requests in the following three steps: 1. Syntactic analysis using an augmented transition network parser and heuristic information (including semantics) to produce the most likely derivation tree for the request; 2. Semantic interpretation to produce a representation of the meaning of the request in a formal query language; 3. Execution of the query language expression on the database to produce the answer to the request. • LUNAR’s language processor contains an ATN grammar for a large subset of English, the semantic rules for interpreting database requests, and a dictionary of approximately 3,500 words. As an indication of the capabilities of the processor, it is able to deal with tense and modality, some anaphoric references and comparatives, restrictive relative clauses, certain adjective modifiers and embedded complement constructions. Lecture 1 Successful NLP Systems III 1976 – TAUM-METEO (University of Montreal) prototype MT system for translating weather forecasts between English and French 1985 – METEO (John Chandioux) successor of TAUM-METEO in operational use at Environnement Canada, translating forecasts until 30 September 2001 1970 – SYSTRAN provided translations for the US Air Force’s Foreign Technology Division adopted by XEROX (1978) still developed, present in a wide range of systems Google language tools Microsoft spell check IA161 Syntactic Formalisms for Parsing Natural Languages 17 / 476 • The METEO System is a Very High Quality Machine Translation system for weather bulletins that was in operational use at Environnement Canada from 1982 to 2001.
It stems from a prototype developed in 1975–76 by the TAUM Group, known as TAUM-METEO. As many authors confuse the prototype with the actual system, a bit of history is in order. • The initial motivation to develop that prototype was that a junior translator came to TAUM to ask for help with the extremely boring (and at the same time difficult) job of translating weather bulletins at Environment Canada that he had to do at the time. • Indeed, since all official communications emanating from the Canadian government must be available in French and English because of the official bilingual services act of 1968, and weather bulletins represent a large amount of translation in real time, junior translators had to spend several months of purgatory producing first-draft translations, which were then revised by seniors. That was in fact a quite difficult job, because of the specificities of the English and French sublanguages used, and not very motivating, as the lifetime of a bulletin is only 4 hours. Lecture 1 Major Issues in NLP Ambiguity in Language: Syntactic (structural) Semantic (word sense) Referential IA161 Syntactic Formalisms for Parsing Natural Languages 18 / 476 Lecture 1 Ambiguity Makes NLP Difficult Structural/syntactic ambiguity I saw the Grand Canyon flying to New York. I saw the sheep grazing in the field. Word sense ambiguity The man went to the bank to get some cash. The man went to the bank and jumped in the river. Referential ambiguity Steve hated Paul. He hit him. He = Steve? or He = Paul? IA161 Syntactic Formalisms for Parsing Natural Languages 19 / 476 Lecture 1 Linguistic levels of analysis Speech Written language Phonetics Phonology Morphology Syntax Semantics Beyond: pragmatics, cognition, logic … Each level has an input and output representation; the output from one level is the input to the next; sometimes levels might be skipped (merged) or split. IA161 Syntactic Formalisms for Parsing Natural Languages 20 / 476 Lecture 1 Issues in syntax Propagation of errors from lower levels – mainly morphology; the need to correctly identify the part of speech (POS) “The man did his homework” Who did what? man=noun; did=verb; his=genitive; homework=noun Identify collocations mother-in-law, hot dog, … IA161 Syntactic Formalisms for Parsing Natural Languages 21 / 476 Lecture 1 More issues in syntax Anaphora resolution “The son of my professor entered my class. He scared me.” Preposition attachment “I saw the man in the park with a telescope.” IA161 Syntactic Formalisms for Parsing Natural Languages 22 / 476 Lecture 1 Syntax input and output Input: a sequence of pairs (lemma, (morphological) tag) Output: sentence structure (tree) with annotated nodes (lemmas, morpho-syntactic tags, functions) of various forms Deals with: the relation between lemmas & morphological categories and the sentence structure, using syntactic categories such as subject, verb, object, … IA161 Syntactic Formalisms for Parsing Natural Languages 23 / 476 Lecture 1 Syntactic representation Tree structure Two main ideas for the tree Phrase structure (derivation tree) Using bracketed grouping Brackets annotated by phrase type Heads (often) explicitly marked Dependency structure Basic relation: head (governor) – dependent Links annotated by syntactic functions Phrase structure implicitly present IA161 Syntactic Formalisms for Parsing Natural Languages 24 / 476 Lecture 1 Dependency Tree vs.
PS Tree IA161 Syntactic Formalisms for Parsing Natural Languages 25 / 476 Lecture 1 Shallow parsing “the man chased the bear” → “the man” (Subject) + “chased the bear” (Predicate) Identify basic structures: NP-[the man] VP-[chased the bear] IA161 Syntactic Formalisms for Parsing Natural Languages 26 / 476 Lecture 1 Full parsing “John loves Mary” [Figure: parse tree with semantic attachments – S(Loves(John, Mary)) over NP(John) / Name(John) / John and VP(∃x Loves(x, Mary)) over Verb(∃y ∃x Loves(x, y)) / loves and NP(Mary) / Name(Mary) / Mary] Helps to automatically figure out questions like who did what, and when IA161 Syntactic Formalisms for Parsing Natural Languages 27 / 476 Lecture 1 What is natural language parsing? One of the most commonly researched tasks in Natural Language Processing (NLP) Parsing, in the traditional sense, is what happens when a student takes the words of a sentence one by one, assigns each to a part of speech, specifies its grammatical categories, and lists the grammatical relations between words (identifying the subject and various types of object for a verb, specifying the word with which some other word agrees, and so on). IA161 Syntactic Formalisms for Parsing Natural Languages 28 / 476 Lecture 1 Characteristics of parsing Much of the history of parsing until a few decades ago can be understood as the direct consequence of the history of theories of grammar: Parsing is done by human beings, rather than by physical machines or abstract machines What is parsed is a bit of natural language, rather than a bit of some language-like symbolic system Parsing is heuristic rather than algorithmic IA161 Syntactic Formalisms for Parsing Natural Languages 29 / 476 Lecture 1 New notions of parsing In the second half of the 20th century, parsing came to be extended to a large collection of operations in relation to theoretical linguistics, formal language theory, computer science, artificial intelligence and psycholinguistics: Parsing is the syntactic analysis of languages. The objective of Natural Language Parsing is to determine parts of sentences (such as verbs, noun phrases, or relative clauses), and the relationships between them (such as subject or object). Unlike parsing of formally defined artificial languages (such as Java or predicate logic), parsing of natural languages presents problems due to ambiguity, and the productive and creative use of language. IA161 Syntactic Formalisms for Parsing Natural Languages 30 / 476 Lecture 1 Parsing The grammar for natural language is ambiguous, and typical sentences have multiple possible analyses (syntactically and semantically). Parsing tools (grammatical, morphological, syntactic, statistical, probabilistic, heuristic, …) help to find the most plausible parse tree of a given sentence. IA161 Syntactic Formalisms for Parsing Natural Languages 31 / 476 Lecture 1 Practical functions of parsing Parsing can tell us whether a sentence is in a language defined by a grammar Parsing makes information extraction possible by identifying relations between words or phrases in sentences.
IA161 Syntactic Formalisms for Parsing Natural Languages 32 / 476 Lecture 1 Practical functions of parsing Parsers are being used in a number of disciplines: In computer science Compiler construction, database interfaces, self-describing databases, artificial intelligence … In linguistics Text analysis, corpora analysis, machine translation … In document preparation and conversion In typesetting chemical formulae In chromosome recognition IA161 Syntactic Formalisms for Parsing Natural Languages 33 / 476 Lecture 1 Practical functions of parsing However: Many different possible syntactic formalisms: Regular expressions, Context-free grammars, Context-sensitive grammars, … Many different ways of representing the results of parsing: Parse tree, Chart, Graph, … IA161 Syntactic Formalisms for Parsing Natural Languages 34 / 476 Lecture 1 Historical overview of parsing methods Basically two ways to parse a sentence: Top-down vs. Bottom-up We can characterize the search strategy of parsing algorithms in terms of the direction in which a structure is built: from the words upwards (bottom-up) or from the root node downwards (top-down) IA161 Syntactic Formalisms for Parsing Natural Languages 35 / 476 Lecture 1 Historical overview of parsing methods Directionality in these two ways: Directional vs. Non-directional Non-directional top-down methods – S. Unger (1968) Non-directional bottom-up methods – CYK Directional top-down methods: the predict/match automaton, depth-first search (backtracking), breadth-first search (Greibach), recursive descent, Definite Clause Grammars Directional bottom-up methods: the shift/reduce automaton, depth-first search (backtracking), breadth-first search, restricted by Earley (1970) IA161 Syntactic Formalisms for Parsing Natural Languages 36 / 476 Lecture 1 Historical overview of parsing methods Methods originating in the parsing of formal languages Linear directional top-down methods: LL(k) Linear directional bottom-up methods: precedence, bounded-context, LR(k), LALR(1), SLR(1) Methods specifically devised for parsing of natural languages Generalized LR (Masaru Tomita) Chart parsing (Martin Kay) IA161 Syntactic Formalisms for Parsing Natural Languages 37 / 476 Lecture 1 Summary Natural language parsing as one of the NLP domains Extended notion of parsing in relation to different fields Ambiguity of language What is it to “parse”? Overview of basic parsing methods IA161 Syntactic Formalisms for Parsing Natural Languages 38 / 476 Lecture 1 References I H. Bunt, J. Carroll & G. Satta (eds.): New Developments in Parsing Technology, Kluwer, Dordrecht/Boston/London 2004 H. Bunt, P. Merlo & J. Nivre (eds.): Trends in Parsing Technology: Dependency Parsing, Domain Adaptation, and Deep Parsing, Springer, Dordrecht/Heidelberg/London/New York 2010 H. Bunt, M. Tomita (eds.): Recent Advances in Parsing Technology, Kluwer, Boston, 1996 D. Grune, C. J. H. Jacobs: Parsing Techniques: A Practical Guide, Springer, 2008 Roger G. Johnson: Andrew D. Booth – Britain’s Other “Fourth Man”. In: History of Computing. Learning from the Past, Springer Berlin Heidelberg, 2010. J. Hutchins: From First Conception to First Demonstration: the Nascent Years of Machine Translation, 1947–1954. A Chronology. In: Machine Translation, Volume 12, Issue 3, Kluwer, 1997. IA161 Syntactic Formalisms for Parsing Natural Languages 39 / 476 Lecture 1 References II J. Hutchins: Milestones no. 6: Bar-Hillel and the nonfeasibility of FAHQT. In: International Journal of Language and Documentation no. 1, 1999. M.
Kay: The proper place of men and machines in language translation. In: Machine Translation, Volume 12, Issue 1–2, Kluwer 1997 (reprint of 1980). More on the history of MT: http://www.hutchinsweb.me.uk/history.htm IA161 Syntactic Formalisms for Parsing Natural Languages 40 / 476 Lecture 2 Syntactic Formalisms for Parsing Natural Languages Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang) ia161@nlp.fi.muni.cz Autumn 2013 IA161 Syntactic Formalisms for Parsing Natural Languages 41 / 476 Lecture 2 Basic parsing methods IA161 Syntactic Formalisms for Parsing Natural Languages 42 / 476 Lecture 2 Main points Context-free grammar Parsing methods Top-down or bottom-up Directional or non-directional Basic parsing algorithms Unger CKY (or CYK) Left-corner parsing Earley IA161 Syntactic Formalisms for Parsing Natural Languages 43 / 476 Lecture 2 Ambiguity in Natural Language Notion of ambiguity Essential ambiguity: same syntactic structure but the semantics differ Spurious ambiguity: different syntactic structure but no change in semantics There are no unambiguous natural languages! An input may have exponentially many parses We should identify the “correct” parse IA161 Syntactic Formalisms for Parsing Natural Languages 44 / 476 Lecture 2 Ambiguity in Natural Language Main idea of parsing Parsing (syntactic structure) Input: sequence of tokens John ate an apple Output: parse tree [S [NP [NAME John]] [VP [VERB ate] [NP [ART an] [NOUN apple]]]] IA161 Syntactic Formalisms for Parsing Natural Languages 45 / 476 Lecture 2 Ambiguity in Natural Language The basic connection between a sentence and the grammar it derives from is the “parse tree”, which describes how the grammar was used to produce the sentence. For the reconstruction of this connection we need “parsing techniques”. IA161 Syntactic Formalisms for Parsing Natural Languages 46 / 476 Lecture 2 Ambiguity in Natural Language Word categories: traditional parts of speech Noun: names of things (boy, cat, truth) Verb: action or state (become, hit) Pronoun: used for a noun (I, you, we) Adverb: modifies V, Adj, Adv (sadly, very) Adjective: modifies a noun (happy, clever) Conjunction: joins things (and, but, while) Preposition: relation of N (to, from, into) Interjection: an outcry (ouch, oh, alas, psst) IA161 Syntactic Formalisms for Parsing Natural Languages 47 / 476 Lecture 2 Formal language A formal language is a set of symbol strings; it serves as a mathematical tool for recognizing and generating languages, including infinite ones. The topic of formal language theory: describing infinite languages by finite means, using rewriting systems. Three basic components of a formal language description: a finite symbol set, a finite string set, and a finite set of formal rules IA161 Syntactic Formalisms for Parsing Natural Languages 48 / 476 Lecture 2 Constituency Sentences have parts, some of which appear to have subparts. These groupings of words that go together we will call constituents. (How do we know they go together?)
I hit the man with a cleaver I hit [the man with a cleaver] I hit [the man] with a cleaver You could not go to her party You [could not] go to her party You could [not go] to her party IA161 Syntactic Formalisms for Parsing Natural Languages 49 / 476 Lecture 2 The Chomsky hierarchy Type 0 Languages / Grammars (LRE: recursively enumerable) Rewrite rules α → β where α and β are any strings of terminals and non-terminals Type 1 Context-sensitive Languages / Grammars (LCS) Rewrite rules αXβ → αγβ where X is a non-terminal, and α, γ, β are any strings of terminals and non-terminals (γ must be non-empty, but the strings α and β can be empty). Type 2 Context-free Languages / Grammars (LCF) Rewrite rules X → γ where X is a non-terminal and γ is any string of terminals and non-terminals Type 3 Regular Languages / Grammars (LREG) Rewrite rules X → αY where X, Y are single non-terminals and α is a string of terminals; Y might be missing. IA161 Syntactic Formalisms for Parsing Natural Languages 50 / 476 Lecture 2 The Chomsky hierarchy Type 0 > 1 > 2 > 3 according to generative power A superior (more powerful) type can generate every language of an inferior type, but parsing with a superior grammar class is less efficient and slower than with an inferior one. IA161 Syntactic Formalisms for Parsing Natural Languages 51 / 476 Lecture 2 The Chomsky hierarchy [Figure: the Chomsky hierarchy, drawn as nested language classes] IA161 Syntactic Formalisms for Parsing Natural Languages 52 / 476 Lecture 2 Context-free grammar (Type 2) The most common way of modeling constituency. The idea of basing a grammar on constituent structure dates back to Wilhelm Wundt (1890), but was not formalized until Chomsky (1956), and, independently, Backus (1959). CFG = Context-Free Grammar = Phrase Structure Grammar = BNF = Backus-Naur Form IA161 Syntactic Formalisms for Parsing Natural Languages 53 / 476 Lecture 2 Context-free grammar (Type 2) CFG rewriting rule X → γ where X is a non-terminal symbol and γ is a string consisting of terminals/non-terminals. The term “context-free” expresses the fact that the non-terminal X can always be replaced by γ, regardless of the context in which it occurs. IA161 Syntactic Formalisms for Parsing Natural Languages 54 / 476 Lecture 2 Context-free grammar (Type 2) G = ⟨T, N, S, R⟩ T is the set of terminals (lexicon) N is the set of non-terminals (written in capital letters) S is the start symbol (one of the non-terminals) R is the set of rules/productions of the form X → γ, where X is a non-terminal and γ is a sequence of terminals and non-terminals (may be empty).
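To make the 4-tuple definition concrete, here is a minimal Python sketch (ours, not part of the original slides) encoding a CFG as G = ⟨T, N, S, R⟩; the class and helper names (Grammar, is_terminal) are illustrative assumptions, and the tiny grammar fragment matches the “John speaks” example used on the following slides.

from dataclasses import dataclass

@dataclass
class Grammar:
    terminals: set        # T – the lexicon
    nonterminals: set     # N – written in capital letters
    start: str            # S – one of the non-terminals
    rules: list           # R – pairs (X, [Y1, ..., Yk]); the RHS may be empty

    def is_terminal(self, symbol: str) -> bool:
        return symbol in self.terminals

# A tiny fragment of the example grammar GD used on the next slides:
G = Grammar(
    terminals={"John", "speaks"},
    nonterminals={"S", "NP", "VP", "PN", "V"},
    start="S",
    rules=[("S", ["NP", "VP"]), ("NP", ["PN"]), ("VP", ["V"]),
           ("PN", ["John"]), ("V", ["speaks"])],
)
print(G.is_terminal("John"))   # True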
A grammar G generates a language L. IA161 Syntactic Formalisms for Parsing Natural Languages 55 / 476 Lecture 2 Example 1 of a Context-Free Grammar G = ⟨T, N, S, R⟩ T = { that, this, a, the, man, book, flight, meal, include, read, does } N = { S, NP, NOM, VP, DET, N, V, AUX } S = S R = { S → NP VP, DET → that | this | a | the, S → AUX NP VP, N → book | flight | meal | man, S → VP, V → book | include | read, NP → DET NOM, AUX → does, NP → N, VP → V, VP → V NP } IA161 Syntactic Formalisms for Parsing Natural Languages 56 / 476 Lecture 2 Example 2 of a Context-Free Grammar R1: S -> NP VP R13: DET -> his|her R2: NP -> DET N R14: DET -> the R3: NP -> NP PNP R15: V -> eat|serve R4: NP -> PN R16: V -> give R5: VP -> V R17: V -> speak|speaks R6: VP -> V NP R18: V -> discuss R7: VP -> V PNP R19: PN -> John|Mark R8: VP -> V NP PNP R20: PN -> Mary|Juliette R9: VP -> V PNP PNP R21: N -> daughter|mother R10: PNP -> PP NP R22: N -> son|boy R11: PP -> to|from|of R23: N -> salad|soup|meat R12: DET -> an|a R24: N -> dessert|cheese|bread R25: ADJ -> small|kind Simplified example of a CFG = GD IA161 Syntactic Formalisms for Parsing Natural Languages 57 / 476 Lecture 2 Example 2 of a Context-Free Grammar Using the presented grammar, we make a first derivation for the sentence “John speaks” (⇒GD denotes one derivation step in GD): S ⇒GD NP VP (by R1) ⇒GD PN VP (by R4) ⇒GD John VP (by R19) ⇒GD John V (by R5) ⇒GD John speaks (by R17) IA161 Syntactic Formalisms for Parsing Natural Languages 58 / 476 Lecture 2 Example 2 of a Context-Free Grammar Another derivation of “John speaks” from GD, using rule 5 before rule 4: S ⇒GD NP VP ⇒GD NP V ⇒GD NP speaks ⇒GD PN speaks ⇒GD John speaks IA161 Syntactic Formalisms for Parsing Natural Languages 59 / 476 Lecture 2 Production Rule 3 NP -> NP PNP Because it contains the same symbol on both its left-hand and right-hand sides, we say that a production with this property is recursive. IA161 Syntactic Formalisms for Parsing Natural Languages 60 / 476 Lecture 2 Production Rule 3 This property of R3 implies that the language generated by the grammar GD is infinite, because we can create sentences of arbitrary length by iterative application of R3. Test: NP ⇒GD NP PNP ⇒GD NP PNP PNP ⇒GD NP PNP PNP PNP … The son of John speaks The son of the mother of John speaks The son of the daughter of the daughter … of John speaks. IA161 Syntactic Formalisms for Parsing Natural Languages 61 / 476 Lecture 2 Production Rule 3 A last remark concerning this grammar (GD): it can generate sentences which are ambiguous. “John speaks to the daughter of Mark” Example 1 A conversation between John and the daughter of Mark (R7) 2 A conversation about Mark between John and the daughter (R9) IA161 Syntactic Formalisms for Parsing Natural Languages 62 / 476 Lecture 2 Production Rule 3 [Figure: the two VP trees for “speaks to the daughter of Mark” – one with a single PNP “to [the daughter of Mark]” (reading 1), one with two PNPs “to the daughter” and “of Mark” (reading 2)] IA161 Syntactic Formalisms for Parsing Natural Languages 63 / 476 Lecture 2 Commonly used non-terminal abbreviations S sentence NP noun phrase PP prepositional phrase VP verb phrase XP X phrase N noun PREP preposition V verb DET/ART determiner / article ADJ adjective ADV adverb AUX auxiliary verb PN proper noun IA161 Syntactic Formalisms for Parsing Natural Languages 64 / 476 Lecture 2 Parsing methods Classification of parsing methods Top-down parsing vs. Bottom-up parsing Directional vs.
non-directional parsing IA161 Syntactic Formalisms for Parsing Natural Languages 65 / 476 Lecture 2 Top-down or bottom-up Top-down parsing Derives the sentence from the start symbol; the production tree is reconstructed from the top downwards Identifies the production rules in prefix order Never explores a tree that cannot result in an S, BUT wastes time generating trees inconsistent with the input Bottom-up parsing Reduces the sentence back to the start symbol Identifies the production rules in postfix order Never generates trees that are not grounded in the input, BUT wastes time generating trees that do not lead to an S IA161 Syntactic Formalisms for Parsing Natural Languages 66 / 476 Lecture 2 Top-down parsing Top-down parsing is goal-directed. A top-down parser starts with a list of constituents to be built. It rewrites the goals in the goal list by matching one against the LHS of the grammar rules and expanding it with the RHS, attempting to match the sentence to be derived. If a goal can be rewritten in several ways, then there is a choice of which rule to apply (a search problem) Can use depth-first or breadth-first search, and goal ordering. IA161 Syntactic Formalisms for Parsing Natural Languages 67 / 476 Lecture 2 Top-down parsing Simulation of the operation of a top-down parser on “The son speaks”: 1. S 2. NP VP 3. DET N VP 4. a N VP – fail: the input begins with “the”; we return to DET N VP 5. the N VP 6. the daughter VP – new fail; we return to α = the N VP … 7. the son VP 8. the son V 9. the son speaks. … IA161 Syntactic Formalisms for Parsing Natural Languages 68 / 476 Lecture 2 Top-down parsing Top-down parsing example S → NP VP → NAME VP → “John” VP → “John” VERB NP → “John” “ate” NP → “John” “ate” DET NOUN → “John” “ate” “an” NOUN → “John” “ate” “an” “apple” IA161 Syntactic Formalisms for Parsing Natural Languages 69 / 476 Lecture 2 Top-down parsing [Figure: eight snapshots (1)–(8) of the top-down construction of the parse tree for “John ate an apple”, growing from S via NP VP, NAME, VERB, ART and NOUN down to the words] IA161 Syntactic Formalisms for Parsing Natural Languages 70 / 476 Lecture 2 Top-down parsing Algorithm for top-down left-to-right parsing (α is the current sentential form, u the input to be recognized; tdlrp is the main function): tdlrp(α, u) begin if (α = u) then return(true) fi; let α = u1…uk A γ (u1…uk terminals, A the leftmost non-terminal); while (∃ rule A → β with β = uk+1…uk+l δ, where δ = ε or δ begins with a non-terminal) do if (tdlrp(u1…uk+l δ γ, u) = true) then return(true) fi od; return(false) end IA161 Syntactic Formalisms for Parsing Natural Languages 71 / 476 Lecture 2 Top-down parsing Problems in top-down parsing Left-recursive rules, e.g. NP → NP PP, lead to infinite recursion Will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with a V, and the sentence starts with a V. Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar. Top-down is hopeless for rewriting parts of speech (preterminals) with words (terminals). IA161 Syntactic Formalisms for Parsing Natural Languages 72 / 476 Lecture 2 Bottom-up parsing Bottom-up parsing is data-directed. The initial goal list of a bottom-up parser is the string to be parsed. If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the LHS of the rule. Parsing is finished when the goal list contains just the start symbol.
If the RHS of several rules match the goal list, then there is a choice of which rule to apply (a search problem) Can use depth-first or breadth-first search, and goal ordering. IA161 Syntactic Formalisms for Parsing Natural Languages 73 / 476 Lecture 2 Bottom-up parsing Let us suppose that we have the sentence “the son eats his soup” in the grammar GD. Question: how can we verify that the sentence belongs to the language generated by the grammar GD and, if the answer is positive, assign it a tree? → A first idea is given by the following algorithm: IA161 Syntactic Formalisms for Parsing Natural Languages 74 / 476 Lecture 2 Bottom-up parsing Bottom-up parsing example “John” “ate” “an” “apple” → NAME “ate” “an” “apple” → NAME VERB “an” “apple” → NAME VERB DET “apple” → NAME VERB DET NOUN → NP VERB DET NOUN → NP VERB NP → NP VP → S IA161 Syntactic Formalisms for Parsing Natural Languages 75 / 476 Lecture 2 Bottom-up parsing [Figure: eight snapshots (1)–(8) of the bottom-up construction of the parse tree for “John ate an apple”, growing from the words via NAME, VERB, ART, NOUN, NP and VP up to S] IA161 Syntactic Formalisms for Parsing Natural Languages 76 / 476 Lecture 2 Bottom-up parsing Problems with bottom-up parsing Unable to deal with empty categories: a termination problem, unless rewriting empties as constituents is somehow restricted (but then it is generally incomplete) Inefficient when there is great lexical ambiguity (grammar-driven control might help here). Conversely, it is data-directed: it attempts to parse the words that are there. Both top-down (LL) and bottom-up (LR) parsers can (and frequently do) do work exponential in the sentence length on NLP problems. IA161 Syntactic Formalisms for Parsing Natural Languages 77 / 476 Lecture 2 Left-corner parsing Left-corner parsing Bottom-up with top-down filtering: combine top-down processing with bottom-up processing in order to avoid going wrong in the ways that we are prone to go wrong with pure top-down and pure bottom-up techniques IA161 Syntactic Formalisms for Parsing Natural Languages 78 / 476 Lecture 2 Left-corner parsing Going wrong with top-down parsing S -> NP VP NP -> DET N NP -> PN VP -> IV DET -> the N -> robber PN -> Vincent IV -> died Vincent died. IA161 Syntactic Formalisms for Parsing Natural Languages 79 / 476 Lecture 2 Left-corner parsing Going wrong with bottom-up parsing S -> NP VP NP -> DET N VP -> IV VP -> TV NP TV -> plant IV -> died DET -> the N -> plant The plant died. 1. DET plant died 2. DET TV IV – fail 3. DET N IV – OK 4. NP VP – OK 5. S IA161 Syntactic Formalisms for Parsing Natural Languages 80 / 476 Lecture 2 Left-corner parsing Combining Top-down and Bottom-up Information S -> NP VP NP -> DET N NP -> PN VP -> IV DET -> the N -> robber PN -> Vincent IV -> died Vincent died. IA161 Syntactic Formalisms for Parsing Natural Languages 81 / 476 Lecture 2 Left-corner parsing Now, let us look at how a left-corner recognizer would proceed to recognize “Vincent died.” 1. Input: Vincent died. Recognize an S. (Top-down prediction.) [tree: S predicted above “vincent died”] 2. The category of the first word of the input is PN. (Bottom-up step using a lexical rule.) [tree: S; PN above “vincent”] IA161 Syntactic Formalisms for Parsing Natural Languages 82 / 476 Lecture 2 Left-corner parsing 3. Select a rule that has PN at its left corner: NP -> PN.
(Bottom-up step using a phrase structure rule.) [tree: S; NP over PN over “vincent”] 4. Select a rule that has NP at its left corner: S -> NP VP. (Bottom-up step.) 5. Match! The left-hand side of the rule matches with S, the category we are trying to recognize. [tree: S over NP (PN “vincent”) and a predicted VP] IA161 Syntactic Formalisms for Parsing Natural Languages 83 / 476 Lecture 2 Left-corner parsing 6. Input: died. Recognize a VP. (Top-down prediction.) 7. The category of the first word of the input is IV. (Bottom-up step.) [tree: S over NP (PN “vincent”) and VP; IV above “died”] 8. Select a rule that has IV at its left corner: VP -> IV. (Bottom-up step.) 9. Match! The left-hand side of the rule matches with VP, the category we are trying to recognize. [tree: complete S over NP (PN “vincent”) and VP (IV “died”)] IA161 Syntactic Formalisms for Parsing Natural Languages 84 / 476 Lecture 2 Left-corner parsing What is the left corner of a rule? The first symbol on the right-hand side. For example, NP is the left corner of the rule S → NP VP, and IV is the left corner of the rule VP → IV. Similarly, we can say that Vincent is the left corner of the lexical rule PN → Vincent. IA161 Syntactic Formalisms for Parsing Natural Languages 85 / 476 Lecture 2 Left-corner parsing A left-corner parser is a “predictive” parser: it uses grammatical knowledge to predict what should come next, given what it has found already. 4 operations creating new items from old: “Shift”, “Predict”, “Match” and “Reduce” IA161 Syntactic Formalisms for Parsing Natural Languages 86 / 476 Lecture 2 Left-corner parsing Definition (Corner relation) The relation ∠ between non-terminals A and B such that B ∠ A if and only if there is a rule A → Bα, where α denotes some sequence of grammar symbols Definition (Left-corner relation) The transitive and reflexive closure of ∠ is denoted by ∠∗ and is called the left-corner relation IA161 Syntactic Formalisms for Parsing Natural Languages 87 / 476 Lecture 2 Left-corner parsing Left-corner table Non-terminal – left-corners: S – S, NP, time, an, VorN, files NP – NP, time, an, VorN, files VP – VP, VorN, files, VorP, like PP – PP, VorP, like VorN – VorN, files VorP – VorP, like Grammar S → NP VP S → S PP NP → time NP → an arrow NP → VorN VP → VorN VP → VorP NP PP → VorP NP VorN → files VorP → like IA161 Syntactic Formalisms for Parsing Natural Languages 88 / 476 Lecture 2 How to deal with ambiguity? Backtracking Try all variants subsequently. Determinism Just choose one variant and keep it (i.e. greedy). Parallelism Try all variants in parallel. Underspecification Do not disambiguate, keep the ambiguity. IA161 Syntactic Formalisms for Parsing Natural Languages 89 / 476 Lecture 2 Summary One view of parsing: parsing as a phrase-structure formal grammar recognition task Parsing approaches: top-down, bottom-up, left-corner IA161 Syntactic Formalisms for Parsing Natural Languages 90 / 476 Lecture 3 Syntactic Formalisms for Parsing Natural Languages Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang) ia161@nlp.fi.muni.cz Autumn 2013 IA161 Syntactic Formalisms for Parsing Natural Languages 91 / 476 Lecture 3
Chart parsing IA161 Syntactic Formalisms for Parsing Natural Languages 92 / 476 Lecture 3 Main points CKY algorithm Earley parsing General chart parsing methods IA161 Syntactic Formalisms for Parsing Natural Languages 93 / 476 Lecture 3 Directional or non-directional Directional: top-down, bottom-up. Non-directional: the top-down method – first by Unger; the bottom-up method – by Cocke, Younger and Kasami (CYK, also CKY) → Non-directional methods access the input in a seemingly arbitrary order, so they require the entire input to be in memory before parsing can start IA161 Syntactic Formalisms for Parsing Natural Languages 94 / 476 Lecture 3 Non-directional top-down method by Unger Capable of working with the entire class of CFGs Expects as input a sentence and a CFG It works by searching for partitionings of the input which match the right-hand sides (RHS) of production rules. IA161 Syntactic Formalisms for Parsing Natural Languages 95 / 476 Lecture 3 Non-directional top-down method by Unger Let G denote a CF grammar and w an input sentence. Principle: if the input sentence w belongs to the language L(G), it must be derivable from the start symbol S of the grammar G. Let S be defined as: S → S1 S2 … Sk. The input sentence w must then be obtainable from the sequence of symbols S1 S2 … Sk in such a way that S1 derives a first part of the input, S2 a second part, and so on: S1 ⇒∗ w1…wp1, S2 ⇒∗ wp1+1…wp2, …, Sk ⇒∗ wpk−1+1…wpk IA161 Syntactic Formalisms for Parsing Natural Languages 96 / 476 Lecture 3 Non-directional bottom-up method: CYK CYK is an example of chart parsing, discovered independently by Cocke, Younger and Kasami Consider which non-terminals can be used to derive substrings of the input, beginning with shorter strings and moving up to longer strings 1 Start with strings of length one, matching the single character of the input string against unit productions in the grammar 2 Then consider all substrings of length two, looking for productions with right-hand-side elements that match the two characters of the substring 3 Continue up to longer strings IA161 Syntactic Formalisms for Parsing Natural Languages 97 / 476 Lecture 3 Non-directional bottom-up method: CYK CYK example Two example sentences and their potential analyses: He [gave [the young cat] [to Bill]]. He [gave [the young cat] [some milk]]. The corresponding grammar rules: VP -> Vditrans NP PPto VP -> Vditrans NP NP Regardless of the final sentence analysis, the ditransitive verb (gave) and its first object NP (the young cat) will have the same analysis -> No need to analyze it twice.
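Before the chart-parsing slides that follow, here is a small illustrative Python sketch (ours, not part of the original slides) of exactly this reuse idea: a recognizer that memoizes which spans each non-terminal covers, so that “the young cat” is analyzed only once however many rules ask for it. The rule encoding and the helper name covers are assumptions.

from functools import lru_cache

# Rules: LHS -> (RHS1, RHS2) for binary rules, or a single terminal string.
RULES = [
    ("NP", ("DET", "N2")), ("N2", ("ADJ", "N")),
    ("DET", "the"), ("ADJ", "young"), ("N", "cat"),
]
WORDS = ("the", "young", "cat")

@lru_cache(maxsize=None)        # the memo table acts as a (passive) chart
def covers(symbol, i, j):
    """True iff `symbol` derives the span WORDS[i:j]."""
    for lhs, rhs in RULES:
        if lhs != symbol:
            continue
        if isinstance(rhs, str):                        # lexical rule
            if j == i + 1 and WORDS[i] == rhs:
                return True
        else:                                           # binary rule
            b, c = rhs
            if any(covers(b, i, k) and covers(c, k, j)
                   for k in range(i + 1, j)):
                return True
    return False

print(covers("NP", 0, 3))   # True – each (symbol, span) is computed only once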
IA161 Syntactic Formalisms for Parsing Natural Languages 98 / 476 Lecture 3 Non-directional bottom-up method: CYK Solutions: chart parsing 1 Store analyzed constituents: a well-formed substring table or (passive) chart 2 Partial and complete analyses: (active) chart In other words, instead of recalculating that “the young cat” is an NP, we will store that information Dynamic programming: never go backwards IA161 Syntactic Formalisms for Parsing Natural Languages 99 / 476 Lecture 3 CKY algorithm program CKY Parser; begin for p := 1 to n do V[p, 1] := {A | A → ap ∈ P}; for q := 2 to n do for p := 1 to n − q + 1 do V[p, q] := ∅; for k := 1 to q − 1 do V[p, q] := V[p, q] ∪ {A | A → BC ∈ P, B ∈ V[p, k], C ∈ V[p + k, q − k]}; od od od end The complexity of CKY is O(n3) IA161 Syntactic Formalisms for Parsing Natural Languages 100 / 476 A (triangular) matrix V is computed: • columns = positions in the input sentence • rows = lengths of (sub)strings of the input sentence • elements = sets of non-terminals that cover the corresponding part of the input sentence. The first loop fills the first row of the matrix. In the inner loop, B and C are always selected from already finished cells of the matrix (shorter strings) – i.e. from the 2nd row on we no longer work with the input sentence at all, only with the previous rows. Unknown terminals in the input are ignored. Lecture 3 CKY example input grammar: Definition S → AA|BB|AX|BY|a|b X → SA Y → SB A → a B → b input string w = abaaba. IA161 Syntactic Formalisms for Parsing Natural Languages 101 / 476 Lecture 3 CKY example – solution input: a b a a b a Definition S → AA|BB|AX|BY|a|b X → SA Y → SB A → a B → b p – position, q – length: q=1: S,A | S,B | S,A | S,A | S,B | S,A q=2: Y | X | S,X | Y | X q=3: S | ∅ | Y | S q=4: X | S | ∅ q=5: ∅ | X q=6: S IA161 Syntactic Formalisms for Parsing Natural Languages 102 / 476 Let the students compute this at the blackboard – the cells in the first rows go quickly; write the empty matrix V on the board and fill it in. Procedure: e.g. in the 2nd row, cell [1,2] arises from [1,1] and [2,1] – 4 combinations SS, SB, AS, AB → only SB is in the grammar, i.e. Y. Combinations are always computed from two cells which move in a “V” shape above the cell being computed. ∅ in a cell means that the corresponding substring cannot be generated from any rule of the grammar. The complexity of CKY is always O(n3), unlike the other methods, where it is only Ω(n3). The result = true/false according to whether the root is in the bottom cell. To generate trees from the CKY table, we would have to remember in each cell from which cells each non-terminal arose. Lecture 3 CKY online demo http://www.diotavelli.net/people/void/demos/cky.html IA161 Syntactic Formalisms for Parsing Natural Languages 103 / 476 Lecture 3 DCG DCG = Definite Clause Grammars A syntactic shorthand for producing parsers with Prolog clauses: Prolog-based parsing Represent the input with difference lists: two lists, the first containing the input to parse (a suffix of the entire input string) and the second containing the string remaining after a successful parse. These two lists correspond to the input and output variables of the clauses. Each clause corresponds to a non-terminal in the grammar. IA161 Syntactic Formalisms for Parsing Natural Languages 104 / 476 Lecture 3 Earley parser Jay Earley, 1968 Strong resemblance to LR parsing, but more dynamic Works with what are called Earley items An Earley item is a production augmented with a marker inserted at some point in the production’s right-hand side and a number to indicate where in the input matching of the production began.
Earley item sets are constructed by applying three operations to the current list of Earley item sets: scanner, predictor, completer IA161 Syntactic Formalisms for Parsing Natural Languages 105 / 476 Lecture 3 Earley algorithm Repeat until no new item can be added: 1 Prediction For every state in the agenda of the form (X → α • Y β, j), add (Y → • γ, k) to the agenda for every production in the grammar with Y on the left-hand side (Y → γ). 2 Scanning If a is the next symbol in the input stream, for every state in the agenda of the form (X → α • a β, j), add (X → α a • β, j) to the agenda. 3 Completion For every state in the agenda of the form (X → γ •, j), find states in the agenda of the form (Y → α • X β, i) and add (Y → α X • β, i) to the agenda. IA161 Syntactic Formalisms for Parsing Natural Languages 106 / 476 Lecture 3 Earley algorithm Earley’s example A pointed rule (with a marker) is a production augmented with a point •; the point indicates the current state of application of the rule. For “The girl speaks”, pointed rules include S → •NP VP, S → NP•VP, NP → •NP PNP, NP → NP•PNP, together with the lexical rules DET → the, N → girl, V → speaks. IA161 Syntactic Formalisms for Parsing Natural Languages 107 / 476 Lecture 3 Earley algorithm [Figure: item sets for “The girl speaks” (positions 1–3) – after “The”: DET → the•, NP → DET•N; after “girl”: N → girl•, S → NP•VP, NP → NP•PNP; after “speaks”: V → speaks•, S → NP VP•] IA161 Syntactic Formalisms for Parsing Natural Languages 108 / 476 Lecture 3 Chart parsing The Earley parser can be modified to work bottom-up or head-corner ⇒ a variety of chart parsing algorithms (Kay, 1980) IA161 Syntactic Formalisms for Parsing Natural Languages 109 / 476 Lecture 3 Chart parsing Three basic approaches: top-down bottom-up head-driven No constraints on the CF grammar Chart parsers usually contain two data structures, a chart and an agenda, both of which contain edges. An edge is a triple [A → α•β, i, j], where: i, j ∈ N, 0 ≤ i ≤ j ≤ n for n input words A → αβ is a grammar rule Example: with input positions 0 a 1 b 2 a 3 a 4 b 5 a 6, the edge [A → BC • DE, 0, 3] spans positions 0 to 3. IA161 Syntactic Formalisms for Parsing Natural Languages 110 / 476 Lecture 3 General chart parser program Chart Parser; begin initialize(CHART); initialize(AGENDA); while (AGENDA not empty) do E := take edge from AGENDA; for each (edge F which can be created from the edge E and another edge from CHART) do if ((F is not in AGENDA) and (F is not in CHART) and (F is different from E)) then add F to AGENDA; fi; od; add E to CHART; od; end; IA161 Syntactic Formalisms for Parsing Natural Languages 111 / 476 This program structure is common to all types of chart parsers. They differ from one another in: 1. how the structures are initialized, 2. how F is selected. It can also be done differently (without an agenda), but this method is the most common. Why it does not cycle: 1. there is a finite number of edges, 2. each edge is processed at most once. Lecture 3 Top-down approach Initialization: ∀ p ∈ P | p = S → α add the edge [S → •α, 0, 0] to the agenda. The startup chart is empty. Iteration – take an edge E from the agenda and then: a) (fundamental rule) if E is in the form [A → α•, j, k], then for each edge [B → γ•Aβ, i, j] in the chart, create an edge [B → γA•β, i, k]. b) (closed edges) if E is in the form [B → γ•Aβ, i, j], then for each edge [A → α•, j, k] in the chart, create an edge [B → γA•β, i, k]. c) (read terminal) if E is in the form [A → α•aj+1β, i, j], create an edge [A → α aj+1•β, i, j+1]. d) (prediction) if E is in the form [A → α•Bβ, i, j], then for each grammar rule B → γ ∈ P, create an edge [B → •γ, j, j].
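As a companion to the pseudocode and the iteration rules above, here is a compact runnable Python sketch (ours, not part of the original slides) of the agenda-driven top-down chart recognizer, applied to the “jel kolem domu” grammar of the example that follows; the edge encoding (A, rhs, dot, i, j) and all helper names are our own choices.

GRAMMAR = {
    "S": [["CLAUSE"]],
    "CLAUSE": [["V", "OPTPREP", "N"]],
    "OPTPREP": [[], ["PREP"]],          # [] is the epsilon rule
    "V": [["jel"]], "PREP": [["kolem"]], "N": [["domu"], ["kolem"]],
}

def recognize(words):
    n = len(words)
    chart, agenda = set(), []
    for rhs in GRAMMAR["S"]:                        # initialization
        agenda.append(("S", tuple(rhs), 0, 0, 0))   # edge [S -> .alpha, 0, 0]
    while agenda:
        E = agenda.pop()
        A, rhs, dot, i, j = E
        new = []
        if dot == len(rhs):                         # E is closed: [A -> alpha., i, j]
            for (B, brhs, bdot, bi, bj) in chart:   # a) fundamental rule
                if bj == i and bdot < len(brhs) and brhs[bdot] == A:
                    new.append((B, brhs, bdot + 1, bi, j))
        else:
            sym = rhs[dot]
            if sym in GRAMMAR:                      # next symbol is a non-terminal
                for (B, brhs, bdot, bi, bj) in chart:   # b) closed edges
                    if B == sym and bdot == len(brhs) and bi == j:
                        new.append((A, rhs, dot + 1, i, bj))
                for g in GRAMMAR[sym]:              # d) prediction
                    new.append((sym, tuple(g), 0, j, j))
            elif j < n and words[j] == sym:         # c) read terminal
                new.append((A, rhs, dot + 1, i, j + 1))
        for F in new:
            if F not in chart and F not in agenda and F != E:
                agenda.append(F)
        chart.add(E)
    return any(A == "S" and dot == len(rhs) and (i, j) == (0, n)
               for (A, rhs, dot, i, j) in chart)

print(recognize(["jel", "kolem", "domu"]))   # True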
IA161 Syntactic Formalisms for Parsing Natural Languages 112 / 476 Lecture 3 Example – chart parsing Grammar: S → CLAUSE CLAUSE → V OPTPREP N OPTPREP → ϵ OPTPREP → PREP V → jel PREP → kolem N → domu N → kolem Sentence: “jel kolem domu” (a1=jel, a2=kolem, a3=domu). IA161 Syntactic Formalisms for Parsing Natural Languages 113 / 476 Lecture 3 Example – chart after top-down analysis [Figure: the chart over “jel kolem domu” with positions 0–3 – lexical edges V → jel•, PREP → kolem•, N → kolem•, N → domu•; and the edges S → •CLAUSE, CLAUSE → •V OPTPREP N, CLAUSE → V •OPTPREP N, OPTPREP → •, OPTPREP → •PREP, OPTPREP → PREP•, CLAUSE → V OPTPREP •N, CLAUSE → V OPTPREP N•, S → CLAUSE•] IA161 Syntactic Formalisms for Parsing Natural Languages 114 / 476 1. initialization 2. prediction – application of d) 3. prediction – application of d) 4. terminal – application of c) 5. closed edge – application of a) 6. … 7. in the last figure the ϵ-edges are omitted (they would not fit). Complexity – we take the number of rules to be a constant → then for input length n there are n2 possible edges in total, and in each step up to n edges are processed → O(n3). Lecture 3 Bottom-up approach Initialization: ∀ p ∈ P | p = A → ϵ add the edges [A → •, 0, 0], [A → •, 1, 1], ..., [A → •, n, n] to the agenda. ∀ p ∈ P | p = A → aiα add the edge [A → •aiα, i-1, i-1] to the agenda. The startup chart is empty. Iteration – take an edge E from the agenda and then: a) (fundamental rule) if E is in the form [A → α•, j, k], then for each edge [B → γ•Aβ, i, j] in the chart, create an edge [B → γA•β, i, k]. b) (closed edges) if E is in the form [B → γ•Aβ, i, j], then for each edge [A → α•, j, k] in the chart, create an edge [B → γA•β, i, k]. c) (read terminal) if E is in the form [A → α•aj+1β, i, j], then create an edge [A → α aj+1•β, i, j+1]. d) (prediction) if E is in the form [A → α•, i, j], then for each grammar rule B → Aγ create an edge [B → •Aγ, i, i]. IA161 Syntactic Formalisms for Parsing Natural Languages 115 / 476 Rules a), b) and c) are the same as in the top-down approach; the methods differ only in d). The bottom-up approach usually creates more superfluous edges. Lecture 3 Head-driven chart parsing Rule head – any particular right-hand-side non-terminal. E.g. in the rule CLAUSE → V OPTPREP N, the head can be V, OPTPREP, or N. An edge is a triple [A → α•β•γ, i, j], where i, j ∈ N, 0 ≤ i ≤ j ≤ n for n input words, A → αβγ is a grammar rule and the head is in β. The algorithm (bottom-up approach) is very similar to the previous simpler one. The analysis does not go left to right, but begins at the head of each rule instead. IA161 Syntactic Formalisms for Parsing Natural Languages 116 / 476 Lecture 3 Head-driven chart parsing Initialization ∀ p ∈ P | p = A → ϵ add the edges [A → ••, 0, 0], [A → ••, 1, 1], ..., [A → ••, n, n] to the agenda. ∀ p ∈ P | p = A → αaiβ (ai is the rule head) add the edge [A → α•ai•β, i-1, i] to the agenda. The startup chart is empty. IA161 Syntactic Formalisms for Parsing Natural Languages 117 / 476 Lecture 3 Head-driven chart parsing Iteration – take an edge E from the agenda and then: a1) if E is in the form [A → •α•, j, k], then for each edge [B → β•γ•Aδ, i, j] in the chart, create an edge [B → β•γA•δ, i, k]. a2) if E is in the form [A → •α•, j, k], then for each edge [B → βA•γ•δ, k, l] in the chart, create an edge [B → β•Aγ•δ, j, l]. b1) if E is in the form [B → β•γ•Aδ, i, j], then for each edge [A → •α•, j, k] in the chart, create an edge [B → β•γA•δ, i, k]. b2) if E is in the form [B → βA•γ•δ, k, l], then for each edge [A → •α•, j, k] in the chart, create an edge [B → β•Aγ•δ, j, l]. c1) if E is in the form [A → βai•γ•δ, i, j], then create an edge [A → β•aiγ•δ, i-1, j]. c2) if E is in the form [A → β•γ•aj+1δ, i, j], then create an edge [A → β•γaj+1•δ, i, j+1].
d) if E is in the form of [A → •α•, i, j], then for each grammar rule B → β A γ create edge [B → β•A•γ, i, j] (A is the rule head). IA161 Syntactic Formalisms for Parsing Natural Languages 118 / 476

Lecture 3 Generalized LR method by Tomita
Tomita's algorithm extends the standard LR parsing algorithm: LR parsing is very efficient, but can only handle a small subset of CFGs; the generalized algorithm can handle an arbitrary CFG while the LR efficiency is preserved. In order to keep a record of the parse state, we maintain a stack consisting of symbol/state pairs. IA161 Syntactic Formalisms for Parsing Natural Languages 119 / 476

Lecture 3 Generalized LR method by Tomita
generalized LR parser (GLR); Masaru Tomita: Efficient parsing for natural language, 1986; uses a standard LR table which may contain conflicts; the stack is represented as a DAG; reduction is performed before the reading action. IA161 Syntactic Formalisms for Parsing Natural Languages 120 / 476

Lecture 3 Tree ranking
all chart parsing methods: parallelization as a means of fighting the ambiguity; key concept: a polynomial data structure holding up to exponentially many parse trees; efficient algorithms to retrieve the n-best trees according to some ranking; this enables taking into account a probabilistic notion of a sentence. IA161 Syntactic Formalisms for Parsing Natural Languages 121 / 476

Lecture 3 PCFG = Probabilistic CFG
each rule r ∈ R has a probability P(r) assigned; the probability of a tree t ∈ T is usually computed as P(t) = ∏r∈t P(r) ⇒ tbest = argmaxt P(t) (see the scoring sketch after the references below). IA161 Syntactic Formalisms for Parsing Natural Languages 122 / 476

Lecture 3 Statistical parsing
CFG → PCFG → learned grammar → statistical parsing → how to obtain the probabilities (= how to train the parser?) IA161 Syntactic Formalisms for Parsing Natural Languages 123 / 476

Lecture 3 Statistical NLP
In the 90's: a change of paradigm in (computational) linguistics from rationalism to empiricism (corpus-based evidence). Simultaneously in NLP: big development of language modelling and statistical methods based on machine learning (both supervised and unsupervised). → statistical parsing
vs. Chomsky: It must be recognised that the notion of a ‘probability of a sentence’ is an entirely useless one, under any interpretation of this term (Chomsky, 1969) [taken from Chapter 1 of Young and Bloothooft, eds, Corpus-Based Methods in Language and Speech Processing] IA161 Syntactic Formalisms for Parsing Natural Languages 124 / 476

Lecture 3 Summary
(Probabilistic) context-free grammars used in parsing natural language; chart parsing methods: CKY, Earley, head-driven chart parsing. IA161 Syntactic Formalisms for Parsing Natural Languages 125 / 476

Lecture 3 References
H. Bunt, M. Tomita: Recent Advances in Parsing Technology, Kluwer, 1996
H. Bunt, P. Merlo, J. Nivre (eds.): Trends in Parsing Technology: Dependency Parsing, Domain Adaptation, and Deep Parsing, Springer, Dordrecht/Heidelberg/London/New York, 2010
D. Grune, C.J.H. Jacobs: Parsing Techniques: A Practical Guide, Springer, 2008
J. Earley: An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, 1970
M. Kay: Algorithm schemata and data structures in syntactic processing. In Readings in Natural Language Processing, pages 35–70, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1986
M.-J. Nederhof: Generalized left-corner parsing. In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics, pages 305–314, Morristown, NJ, USA, 1993. Association for Computational Linguistics.
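As promised above, here is a tiny illustration of PCFG scoring: P(t) is the product of the probabilities of all rules used in the tree t, and the best tree is the argmax over candidate parses. The grammar, probabilities and encoding are our own toy example, not from the references.

from math import prod

RULE_PROB = {                             # a toy PCFG; per LHS, probabilities sum to 1
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DET", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V",)): 1.0,
    ("DET", ("the",)): 1.0, ("N", ("girl",)): 1.0, ("V", ("speaks",)): 1.0,
}

def tree_prob(tree):
    """tree = (label, [children]); a leaf word is a plain string."""
    if isinstance(tree, str):
        return 1.0
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULE_PROB[(lhs, rhs)] * prod(tree_prob(c) for c in children)

t = ("S", [("NP", [("DET", ["the"]), ("N", ["girl"])]), ("VP", [("V", ["speaks"])])])
print(tree_prob(t))                       # 0.6
# t_best = max(candidate_trees, key=tree_prob)   -- the argmax over parses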
IA161 Syntactic Formalisms for Parsing Natural Languages 126 / 476

Lecture 4 . ...... Syntactic Formalisms for Parsing Natural Languages Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang) ia161@nlp.fi.muni.cz Autumn 2013 IA161 Syntactic Formalisms for Parsing Natural Languages 127 / 476

Lecture 4 . ...... Dependency Syntax and Parsing IA161 Syntactic Formalisms for Parsing Natural Languages 128 / 476

Lecture 4 Outline 1 Motivation 2 Dependency Syntax 3 Dependency Parsing IA161 Syntactic Formalisms for Parsing Natural Languages 129 / 476

Lecture 4 Motivation
what you have seen so far: applying the analysis of formal languages to a natural language – creating a phrase-structure derivation tree according to some grammar; PS accounts for one important syntactic property: constituency; is that all? but what about: discontinuous phrases, structure sharing IA161 Syntactic Formalisms for Parsing Natural Languages 130 / 476

Lecture 4 Motivation
another crucial syntactic phenomenon is dependency; what is a dependency? ”some relation between two words“; what is the difference to phrase structure? what does constituency express? what does dependency express? IA161 Syntactic Formalisms for Parsing Natural Languages 131 / 476

Lecture 4 Dependency Syntax (Mel'čuk 1988)
A more formal account – what is a dependency? A relation!
. Dependency Relation .. ......
Let W be the set of all words within a sentence; then the dependency relation → is D ⊆ W × W such that:
D is anti-reflexive: a → b ⇒ a ≠ b
D is anti-symmetric: a → b ∧ b → a ⇒ a = b; together with anti-reflexivity this yields asymmetry: a → b ⇒ b ↛ a
D is anti-transitive: a → b ∧ b → c ⇒ a ↛ c
optionally: D is labeled: there is a mapping l : D → L, L being the set of labels
IA161 Syntactic Formalisms for Parsing Natural Languages 132 / 476

Lecture 4 Dependency Representation
a → b: a depends on b, a is a dependent of b, b is the head of a; a dependency graph; a dependency tree IA161 Syntactic Formalisms for Parsing Natural Languages 133 / 476

Lecture 4 Dependency Tree vs. PS Tree
[figure: for “Green ideas sleep furiously”, a dependency tree (sleep heading ideas and furiously, ideas heading Green) next to a PS tree (S → NP VP with A N V ADV)]
IA161 Syntactic Formalisms for Parsing Natural Languages 134 / 476

Lecture 4 Non-projectivity
a property of a dependency tree: a sentence is non-projective whenever drawing (projecting) a line from a node to the surface of the tree crosses an arc; a lot of attention has been paid to this problem; the practical implications are rather limited (in most cases non-projectivity can be easily handled or avoided); hard cases: ”koupil Malou chaloupku“ IA161 Syntactic Formalisms for Parsing Natural Languages 135 / 476

Lecture 4 Czech Tradition of Dependency Syntax
a long tradition of dependency syntax in the Prague linguistic school (Sgall, Hajičová, Panevová); Institute of Formal and Applied Linguistics at Charles University; formalized as the Functional Generative Description (FGD) of language; Prague Dependency Treebank (PDT) IA161 Syntactic Formalisms for Parsing Natural Languages 136 / 476

Lecture 4 Dependencies vs. PS
is one of the formalisms clearly better than the other one? No.
dependencies: ⊕ account for relational phenomena, ⊕ simple
phrase-structure: ⊕ accounts for constituency, ⊕ easy chunking
can we perform a transformation from one of the formalisms to the other and vice versa? Technically yes, but . . . It is not a problem to convert the structure between a dependency tree and a PS tree ... ...
but it is a problem to transform the information included ⇒ both of the formalisms are convertible, but not mutually equivalent IA161 Syntactic Formalisms for Parsing Natural Languages 137 / 476

Lecture 4 Dependency Parsing
rule-based vs. statistical; transition-based (→ deterministic parsing); graph-based (→ spanning tree algorithms); various other approaches (ILP, PS conversion, . . . ); very recent advances (vs. long studied PS parsing algorithms) IA161 Syntactic Formalisms for Parsing Natural Languages 138 / 476

Lecture 4 Introduction to Dependency parsing
Motivation
a. dependency-based syntactic representations seem to be useful in many applications of language technology: machine translation, information extraction → transparent encoding of predicate-argument structure
b. dependency grammar is better suited than phrase-structure grammar for languages with free or flexible word order → analysis of diverse languages within a common framework
c. leading to the development of accurate syntactic parsers for a number of languages → combination with machine learning from syntactically annotated corpora (e.g. treebanks)
IA161 Syntactic Formalisms for Parsing Natural Languages 139 / 476

Lecture 4 Introduction to Dependency parsing
Dependency parsing: “Task of automatically analyzing the dependency structure of a given input sentence”
Dependency parser: “Task of producing a labeled dependency structure of the kind depicted in the following figure, where the words of the sentence are connected by typed dependency relations”
[figure: labeled dependency tree for “ROOT Economic news had little effect on financial markets .” with the relations PRED, SBJ, OBJ, ATT, PC, PU]
IA161 Syntactic Formalisms for Parsing Natural Languages 140 / 476

Lecture 4 Definitions of dependency graphs and dependency parsing
Dependency graphs: syntactic structures over sentences
Def. 1: A sentence is a sequence of tokens denoted by S = w0 w1 . . . wn
Def. 2: Let R = {r1, . . . , rm} be a finite set of possible dependency relation types that can hold between any two words in a sentence. A relation type r ∈ R is additionally called an arc label.
IA161 Syntactic Formalisms for Parsing Natural Languages 141 / 476

Lecture 4 Definitions of dependency graphs and dependency parsing
Def. 3: A dependency graph G = (V, A) is a labeled directed graph consisting of nodes V and arcs A, such that for the sentence S = w0 w1 . . . wn and the label set R the following holds:
1 V ⊆ {w0, w1, . . . , wn}
2 A ⊆ V × R × V
3 if (wi, r, wj) ∈ A then (wi, r′, wj) ∉ A for all r′ ≠ r
IA161 Syntactic Formalisms for Parsing Natural Languages 142 / 476

Lecture 4 Approach to dependency parsing
a. data-driven: makes essential use of machine learning from linguistic data in order to parse new sentences
b. grammar-based: relies on a formal grammar, defining a formal language, so that it makes sense to ask whether a given input is in the language defined by the grammar or not.
→ Data-driven approaches have attracted the most attention in recent years. IA161 Syntactic Formalisms for Parsing Natural Languages 143 / 476

Lecture 4 Data-driven approach
. ...... characterized by the type of parsing model adopted, the algorithms used to learn the model from data, and the algorithms used to parse new sentences with the model
a. transition-based: start by defining a transition system, or state machine, for mapping a sentence to its dependency graph (see the sketch below).
b. graph-based: start by defining a space of candidate dependency graphs for a sentence.
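A minimal sketch of the transition-based idea, using the well-known arc-standard system (SHIFT / LEFT-ARC / RIGHT-ARC); the encoding is our own and, where a real parser would predict the next transition with a learned classifier, the "oracle" argument here simply plays back a fixed script.

def parse(n_words, oracle):
    # configuration: (stack, buffer, arcs); token 0 is the artificial ROOT
    stack, buf, arcs = [0], list(range(1, n_words + 1)), set()
    while buf or len(stack) > 1:
        action = oracle(stack, buf, arcs)
        if action == "SHIFT":             # move next input word onto the stack
            stack.append(buf.pop(0))
        elif action == "LEFT-ARC":        # second-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.add((stack[-1], dep))
        elif action == "RIGHT-ARC":       # top becomes dependent of second-top
            dep = stack.pop()
            arcs.add((stack[-1], dep))
    return arcs

# "John saw Mary": 1=John, 2=saw, 3=Mary
script = iter(["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"])
print(parse(3, lambda s, b, a: next(script)))
# {(2, 1), (2, 3), (0, 2)}  i.e. saw->John, saw->Mary, ROOT->saw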
IA161 Syntactic Formalisms for Parsing Natural Languages 144 / 476

Lecture 4 Data-driven approach
a. transition-based
learning problem: induce a model for predicting the next state transition, given the transition history
parsing problem: construct the optimal transition sequence for the input sentence, given the induced model
b. graph-based
learning problem: induce a model for assigning scores to the candidate dependency graphs for a sentence
parsing problem: find the highest-scoring dependency graph for the input sentence, given the induced model
IA161 Syntactic Formalisms for Parsing Natural Languages 145 / 476

Lecture 4 Transition-based Parsing
A transition system consists of a set C of parser configurations and a set D of transitions between configurations. Main idea: a sequence of valid transitions, starting in the initial configuration for a given sentence and ending in one of several terminal configurations, defines a valid dependency tree for the input sentence:
D1,m = d1(c1), . . . , dm(cm)
IA161 Syntactic Formalisms for Parsing Natural Languages 146 / 476

Lecture 4 Transition-based Parsing
Definition: the score of D1,m factors by configuration–transition pairs (ci, di): s(D1,m) = Σi=1..m s(ci, di)
Learning: scoring function s(ci, di) for di(ci) ∈ D1,m
Inference: search for the highest-scoring sequence D*1,m given s(ci, di)
IA161 Syntactic Formalisms for Parsing Natural Languages 147 / 476

Lecture 4 Transition-based Parsing
Inference for transition-based parsing. Common inference strategies:
Deterministic [Yamada and Matsumoto 2003, Nivre et al. 2004]
Beam search [Johansson and Nugues 2006, Titov and Henderson 2007]
Complexity is given by an upper bound on the transition sequence length. Transition systems:
Projective O(n) [Yamada and Matsumoto 2003, Nivre 2003]
Limited non-projective O(n) [Attardi 2006, Nivre 2007]
Unrestricted non-projective O(n2) [Nivre 2008, Nivre 2009]
IA161 Syntactic Formalisms for Parsing Natural Languages 148 / 476

Lecture 4 Transition-based Parsing – Nivre algorithm IA161 Syntactic Formalisms for Parsing Natural Languages 149 / 476

Lecture 4 Transition-based Parsing
Learning for transition-based parsing. Typical scoring function: s(ci, di) = w · f(ci, di), where f(ci, di) is a feature vector over configuration ci and transition di, and w is a weight vector [wi = weight of feature fi(ci, di)]
Transition systems:
Projective O(n) [Yamada and Matsumoto 2003, Nivre 2003]
Limited non-projective O(n) [Attardi 2006, Nivre 2007]
Unrestricted non-projective O(n2) [Nivre 2008, Nivre 2009]
Problem: learning is local, but the features are based on the global history
IA161 Syntactic Formalisms for Parsing Natural Languages 150 / 476

Lecture 4 Transition-based Parsing
Projectivization to pseudo-projectivity: IA161 Syntactic Formalisms for Parsing Natural Languages 151 / 476

Lecture 4 Graph-based Parsing
For an input sentence S we define a graph Gs = (Vs, As) where Vs = {w0, w1, . . . , wn} and As = {(wi, wj, l) | wi, wj ∈ V and l ∈ L}
The score of a dependency tree T factors by subgraphs G1, . . . , Gm: s(T) = Σi=1..m s(Gi)
Learning: scoring function s(Gi) for a subgraph Gi ∈ T
Inference: search for the maximum spanning tree T* of Gs given s(Gi)
IA161 Syntactic Formalisms for Parsing Natural Languages 152 / 476

Lecture 4 Graph-based Parsing
Learning graph-based models. Typical scoring function: s(Gi) = w · f(Gi), where f(Gi) is a high-dimensional feature vector over subgraphs and w is a weight vector [wj = weight of feature fj(Gi)]
Structured learning [McDonald et al.
2005a, Smith and Johnson 2007]: learn weights that maximize the score of the correct dependency tree for every sentence in the training set
Problem: learning is global (trees), but the features are local (subgraphs)
IA161 Syntactic Formalisms for Parsing Natural Languages 153 / 476

Lecture 4 Graph-based Parsing – Eisner algorithm IA161 Syntactic Formalisms for Parsing Natural Languages 154 / 476

Lecture 4 Graph-based Parsing – Chu-Liu-Edmonds algorithm IA161 Syntactic Formalisms for Parsing Natural Languages 155 / 476

Lecture 4 Grammar-based approach
a. context-free dependency parsing: exploits a mapping from dependency structures to CFG structure representations and reuses parsing algorithms originally developed for CFGs → chart parsing algorithms
b. constraint-based dependency parsing: parsing viewed as a constraint satisfaction problem; the grammar is defined as a set of constraints on well-formed dependency graphs; finding a dependency graph for a sentence that satisfies all the constraints of the grammar (having the best score)
IA161 Syntactic Formalisms for Parsing Natural Languages 156 / 476

Lecture 4 Grammar-based approach
a. context-free dependency parsing. Advantage: well-studied parsing algorithms such as CKY or Earley's algorithm can be used for dependency parsing as well → need to convert dependency grammars into efficiently parsable context-free grammars (e.g. bilexical CFG, Eisner and Smith, 2005)
b. constraint-based dependency parsing: defines the problem as constraint satisfaction; Weighted Constraint Dependency Grammar (WCDG, Foth and Menzel, 2005); Transformation-based CDG
IA161 Syntactic Formalisms for Parsing Natural Languages 157 / 476

Lecture 4 Conclusions 1 Dependency syntax vs. constituency (phrase-structure) syntax 2 Non-projectivity 3 Graph-based and transition-based methods IA161 Syntactic Formalisms for Parsing Natural Languages 158 / 476

Lecture 5 . ...... Syntactic Formalisms for Parsing Natural Languages Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang) ia161@nlp.fi.muni.cz Autumn 2013 IA161 Syntactic Formalisms for Parsing Natural Languages 159 / 476

Lecture 5 . ...... Parsing with (L)TAG and LFG IA161 Syntactic Formalisms for Parsing Natural Languages 160 / 476

Lecture 5 (Lexicalized) Tree Adjoining Grammar (TAG) and Lexical Functional Grammar (LFG)
A) Same goal: a formal system to model human speech; model the syntactic properties of natural language; a syntactic framework which aims to provide a computationally precise and psychologically realistic representation of language
B) Properties: unification-based; constraint-based; lexicalized grammar
IA161 Syntactic Formalisms for Parsing Natural Languages 161 / 476

Lecture 5 How to parse the sentence in TAG? by Joshi, A., Levy, L. and Takahashi, M., 1975 IA161 Syntactic Formalisms for Parsing Natural Languages 162 / 476

Lecture 5 TAG's basic component
Representation structure: phrase-structure trees; a finite set of elementary trees; two kinds of elementary trees: initial trees (α): trees that can be substituted; auxiliary trees (β): trees that can be adjoined IA161 Syntactic Formalisms for Parsing Natural Languages 163 / 476

Lecture 5 TAG's basic component
The trees in the union of these two sets are called elementary trees.
[figure: an initial tree X whose frontier consists of terminal nodes or substitution nodes; an auxiliary tree with root Z and foot node Z*]
IA161 Syntactic Formalisms for Parsing Natural Languages 164 / 476

Lecture 5 TAG's basic component
An initial tree (α): all interior nodes are labeled with non-terminal symbols; the nodes on the frontier of an initial tree are either labeled with terminal symbols, or with non-terminal symbols marked for substitution (↓)
An auxiliary tree (β): one of its frontier nodes must be marked as the foot node (∗); the foot node must be labeled with a non-terminal symbol which is identical to the label of the root node
A derived tree (γ): a tree built by the composition of two other trees; the two composition operations that TAG uses are adjoining and substitution
IA161 Syntactic Formalisms for Parsing Natural Languages 165 / 476

Lecture 5 Main operations of combination (1): adjunction
Sentences of the language of a TAG are derived from the composition of an α and any number of β by this operation. It allows inserting a complete structure at an interior node of another complete structure.
Three constraints possible: null adjunction (NA), obligatory adjunction (OA), selectional adjunction (SA)
IA161 Syntactic Formalisms for Parsing Natural Languages 166 / 476

Lecture 5 Main operations of combination (1): adjunction
[figure: adjoining an auxiliary tree (β) with root X and foot X* into a tree (α) at an interior node X, yielding the derived tree (γ); example: adjoining β1 “has” into α2 “loved” at the VP node]
IA161 Syntactic Formalisms for Parsing Natural Languages 167 / 476

Lecture 5 Main operations of combination (2): substitution
It inserts an initial tree or a lexical tree into an elementary tree. One constraint possible: selectional substitution
[figure: substituting an initial tree α3 rooted in A at a frontier node A↓ of α2; example: substituting the tree for “woman” at a D↓ N frontier of “loved”]
IA161 Syntactic Formalisms for Parsing Natural Languages 168 / 476

Lecture 5 Adjoining constraints
Selective adjunction (SA(T)): only members of a set T ⊆ A can be adjoined at the given node, but the adjunction is not mandatory
Null adjunction (NA): any adjunction is disallowed for the given node (NA = SA(ϕ))
Obligatory adjunction (OA(T)): an auxiliary tree which is a member of the set T ⊆ A must be adjoined at the given node; for short, OA = OA(A)
IA161 Syntactic Formalisms for Parsing Natural Languages 169 / 476

Lecture 5 Example 1: selective adjunction (SA)
One possible analysis of “send” could involve selective adjunction:
[figure: α1 S → NP↓ VP[SA(β1, β2, ...)] with V “send” NP↓; β1 VP → VP* “away”; β2 VP → VP* PP (“to” NP↓) – covering “send”, “send away”, “send to”, “send something”]
IA161 Syntactic Formalisms for Parsing Natural Languages 170 / 476

Lecture 5 Example 2: obligatory adjunction
For when you absolutely must have adjunction at a node:
[figure: α1 S → NP↓ VP[OA(β1, β2)] with V “seen”; β1 Aux “has” VP*; β2 Aux “is” VP* – “has seen”, “is seen”]
IA161 Syntactic Formalisms for Parsing Natural Languages 171 / 476

Lecture 5 Elementary trees (initial trees and auxiliary trees)
Yesterday a man saw Mary
[figure: (βyest) S → Adv “yesterday” S*; (αa) D “a”; (αman) NP → D↓ N “man”; S → NP0↓ VP with V “saw” NP1↓; NP → N “Mary”; *: foot node, ↓: substitution node]
IA161 Syntactic Formalisms for Parsing Natural Languages 172 / 476

Lecture 5 Elementary trees (initial trees and auxiliary trees)
[figure: derived tree (α5): S → Ad “yesterday” S, where S → NP (D “a”, N “man”) VP (V “saw”, NP (N “Mary”))]
IA161 Syntactic Formalisms for Parsing Natural Languages 173 / 476

Lecture 5 Derivation tree
Specifies how a derived tree was constructed. The root node is labeled by an S-type initial tree. Other nodes are labeled by auxiliary trees in the case of adjoining, or by initial trees in the case of substitution.
A tree address of the parent tree is associated with each node.
[derivation tree: saw with daughters man (1), Mary (2.2), yest (0); man with daughter a (1)]
IA161 Syntactic Formalisms for Parsing Natural Languages 174 / 476

Lecture 5 Derivation tree and derived tree
[figure: the derivation tree above shown next to the derived tree (α5) for “Yesterday a man saw Mary”; dashed lines mark substitution operations, solid lines mark adjunction operations]
IA161 Syntactic Formalisms for Parsing Natural Languages 175 / 476

Lecture 5 Example 1: Harry likes peanuts passionately
Step 1: the elementary trees – NP “Harry”; NP “peanuts”; S → NP VP with VP → V “likes” NP; the auxiliary tree VP → VP* ADV “passionately”
Step 2: substitution – substituting “Harry” and “peanuts” into the NP slots of “likes” yields S → NP “Harry” VP (V “likes”, NP “peanuts”)
Step 3: adjunction – adjoining the “passionately” auxiliary tree at the VP node yields S → NP “Harry” VP (VP (V “likes”, NP “peanuts”), ADV “passionately”)
IA161 Syntactic Formalisms for Parsing Natural Languages 176 / 476

Lecture 5 Derivation tree and derived tree of Harry likes peanuts passionately
[derivation tree: likes with daughters Harry (1), peanuts (2.2), passionately (2); derived tree: S → NP “Harry” VP (VP (V “likes”, NP “peanuts”), ADV “passionately”)]
IA161 Syntactic Formalisms for Parsing Natural Languages 177 / 476

Lecture 5 Two important properties of TAG
Elementary trees can be of arbitrary size, so the domain of locality is increased: extended domain of locality (EDL)
Small initial trees can have multiple adjunctions inserted within them, so what are normally considered non-local phenomena are treated locally: factoring recursion from the domain of dependency (FRD)
IA161 Syntactic Formalisms for Parsing Natural Languages 178 / 476

Lecture 5 Extended domain of locality (EDL): Agreement
The lexical entry for a verb like “loves” will contain a tree like the following: S → NP3.sg↓ VP (V “loves”, NP↓). With EDL, we can easily state the agreement between the subject and the verb in a lexical entry. IA161 Syntactic Formalisms for Parsing Natural Languages 179 / 476

Lecture 5 Factoring recursion from the domain of dependency (FRD): Extraction
[figure: initial tree S’ → NPi[+wh] “who” S’ (COMP “that”, S → NP “Bill” VP (V “likes”, NP ei)); auxiliary tree S’ → COMP Φ S (INFL “did”, NP “John”, VP (V “tell”, NP “Sam”, S’*))]
. ...... The above trees for the sentence “who did John tell Sam that Bill likes?” allow the insertion of the auxiliary tree between the WH-phrase and its extraction site, resulting in a long-distance dependency; yet this is factored out from the domain of locality in TAG.
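A toy rendering of the two combination operations from the slides above, using nested Python lists ["Label", child1, ...] for trees; the encoding, including the "X*" marker for foot nodes and addresses as paths of child indices, is our own and purely illustrative.

def replace_at(tree, address, new):
    if not address:
        return new
    i = address[0]
    return [tree[0]] + [replace_at(c, address[1:], new) if j == i else c
                        for j, c in enumerate(tree[1:])]

def subtree_at(tree, address):
    for i in address:
        tree = tree[i + 1]
    return tree

def plug_foot(aux, subtree):
    if isinstance(aux, str):
        return aux
    if aux[0].endswith("*"):
        return subtree                    # the excised subtree fills the foot node
    return [aux[0]] + [plug_foot(c, subtree) for c in aux[1:]]

def substitute(tree, address, initial):   # substitution at a frontier node
    return replace_at(tree, address, initial)

def adjoin(tree, address, aux):           # adjoining an auxiliary tree
    return replace_at(tree, address, plug_foot(aux, subtree_at(tree, address)))

# "Harry likes peanuts" plus the auxiliary tree for "passionately":
alpha = ["S", ["NP", "Harry"], ["VP", ["V", "likes"], ["NP", "peanuts"]]]
beta = ["VP", ["VP*"], ["ADV", "passionately"]]
print(adjoin(alpha, [1], beta))
# ['S', ['NP', 'Harry'],
#  ['VP', ['VP', ['V', 'likes'], ['NP', 'peanuts']], ['ADV', 'passionately']]]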
IA161 Syntactic Formalisms for Parsing Natural Languages 180 / 476

Lecture 5 Factoring recursion from the domain of dependency (FRD): Extraction
[figure: the derived tree for “who did John tell Sam that Bill likes ei”, with the auxiliary “did John tell Sam” tree adjoined between “who” and “that Bill likes ei”]
IA161 Syntactic Formalisms for Parsing Natural Languages 181 / 476

Lecture 5 Variations of TAG
Feature-structure based TAG (FTAG: Joshi and Shanker, 1988): each of the nodes of an elementary tree is associated with two feature structures: top and bottom
[figure: substitution and adjoining with features – on substitution the top feature structures are unified (t ⊔ tr); on adjoining the top features of the adjunction site unify with those of the auxiliary tree's root and the bottom features with those of its foot]
IA161 Syntactic Formalisms for Parsing Natural Languages 182 / 476

Lecture 5 Variations of TAG
Synchronous TAG (STAG: Shieber and Schabes, 1990): a pair of TAGs characterizes correspondences between languages; semantic interpretation, language generation and translation
Multi-component TAG (MCTAG: Chen-Main and Joshi, 2007): a set of auxiliary trees can be adjoined to a given elementary tree
Probabilistic TAG (PTAG: Resnik, 1992, Shieber, 2007): associating a probability with each elementary tree; compute the probability of a derivation
IA161 Syntactic Formalisms for Parsing Natural Languages 183 / 476

Lecture 5 XTAG Project (UPenn, ongoing since 1987)
A long-term project to develop a wide-coverage grammar for English using the Lexicalized Tree-Adjoining Grammar (LTAG) formalism. Provides a grammar engineering platform consisting of a parser, a grammar development interface, and a morphological analyzer. The project extends to variants of the formalism, and to languages other than English. IA161 Syntactic Formalisms for Parsing Natural Languages 184 / 476

Lecture 5 XTAG system
[figure: pipeline – input sentence → morph analyzer and tagger (Morph DB, Stat DB) → P.O.S blender → tree selection (Trees DB, Syn DB, Lex Prob DB) → tree grafting → parser → derivation structure]
IA161 Syntactic Formalisms for Parsing Natural Languages 185 / 476

Lecture 5 Components in XTAG system
Morphological analyzer & Morph DB: 317K inflected items derived from over 90K stems
POS tagger & Lex Prob DB: Wall Street Journal-trained 3-gram tagger with N-best POS sequences
Syntactic DB: over 30K entries, each consisting of: the uninflected form of the word, its POS, a list of trees or tree families associated with the word, a list of feature equations
Tree DB: 1004 trees, divided into 53 tree families and 221 individual trees
IA161 Syntactic Formalisms for Parsing Natural Languages 186 / 476

Lecture 5 (a) Morphology database (b) syntactic database: interfaces to the database maintenance tools IA161 Syntactic Formalisms for Parsing Natural Languages 187 / 476

Lecture 5 Interface to the XTAG system
Parser evaluation in the XTAG Project by [Bangalore, S. et al., 1998] http://www.cis.upenn.edu/~xtag/ IA161 Syntactic Formalisms for Parsing Natural Languages 188 / 476

Lecture 5 How to parse the sentence in LFG? by Bresnan, J. and Kaplan, R.M., 1982 IA161 Syntactic Formalisms for Parsing Natural Languages 189 / 476

Lecture 5 Main representation structures
c-structure: constituent structure; the level where the surface syntactic form, including categorial information, word order and the phrasal grouping of constituents, is encoded
f-structure: functional structure; the internal structure of language where grammatical relations are represented; it is largely invariant across languages (e.g. SUBJ, OBJ, OBL, (X)COMP, (X)ADJ)
a-structure: argument structure; it encodes the number, type and semantic roles of the arguments of a predicate
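As a minimal illustration of the first two levels, here is an ad-hoc Python encoding (our own, for the phrase analysed in detail on the following slides): the c-structure as a nested tuple and the f-structure as a nested dictionary.

# c-structure: a plain phrase-structure tree for "with friends"
c_structure = ("PP", ("P", "with"), ("NP", ("N", "friends")))

# f-structure: grammatical relations as attribute-value pairs
f_structure = {
    "PRED": "'with'",
    "OBJ": {"PRED": "'friend'", "NUM": "PLURAL"},
}

# Note that a dict gives functional uniqueness (one value per attribute)
# for free; consistency, coherence and completeness, discussed below,
# still have to be checked explicitly.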
IA161 Syntactic Formalisms for Parsing Natural Languages 190 / 476

Lecture 5 Level of structures and their interaction in LFG
[figure: the functional projection architecture – argument structure, functional structure and constituent structure (LFG's focus), interfacing with semantic structure, information structure and phonological structure]
IA161 Syntactic Formalisms for Parsing Natural Languages 191 / 476

Lecture 5 Level of structures and their interaction in LFG
In LFG, the parsing result is grammatically correct only if it satisfies 2 criteria: 1 the grammar must be able to assign a correct c-structure; 2 the grammar must be able to assign a correct, well-formed f-structure IA161 Syntactic Formalisms for Parsing Natural Languages 192 / 476

Lecture 5 c-structure
[figure: c-structure trees – PP → P “with” NP (N “friends”); S → NP (N “I”) VP (V “saw”, NP (Det “the”, N “girl”))]
The constituent structure represents the organization of overt phrasal syntax. It provides the basis for phonological interpretation. Languages are very different on the c-structure level: external factors that usually vary by language.
. Properties of c-structure .. ......
c-structures are conventional phrase-structure trees: they are defined in terms of syntactic categories, terminal nodes, dominance and precedence. They are determined by a context-free grammar that describes all possible surface strings of the language. LFG does not reserve constituent-structure positions for affixes: all leaves are individual words.
IA161 Syntactic Formalisms for Parsing Natural Languages 193 / 476

Lecture 5 f-structure
[figure: attribute-value notation for the f-structure of “with friends”: PRED 'with', OBJ [PRED 'friend', NUM PLURAL]]
. ......
1 representation of the functional structure of a sentence
2 the f-structure matches with the c-structure
3 it has to satisfy three formal constraints: consistency, coherence, completeness
4 languages are similar on this level: this allows explaining cross-linguistic properties of phenomena
IA161 Syntactic Formalisms for Parsing Natural Languages 194 / 476

Lecture 5 Examples of f-structure
[figure: two example f-structures – 1: PRED 'e-mail ⟨SUBJ, OBJ, OBJ2⟩', TENSE PAST, SUBJ [PRED 'Sabine'], OBJ [...], OBJ2 [PRED 'homework', DEF +, NUM SG]; 2: PRED 'insist ⟨SUBJ, OBLon⟩', TENSE PAST, SUBJ [PRED 'Veit'], OBLon [PCASE OBLon, PRED 'teacher', DEF -, NUM SG]]
IA161 Syntactic Formalisms for Parsing Natural Languages 195 / 476

Lecture 5 Constraint 1: f-structure must be consistent
1 Two paths in the graph structure may designate the same element – called unification, or structure sharing. Ex: John must leave
[figure: f-structure with PRED 'must', XCOMP [PRED 'leave', SUBJ 1], SUBJ 1 [PRED 'John'] – the two SUBJ paths designate the same element]
IA161 Syntactic Formalisms for Parsing Natural Languages 196 / 476

Lecture 5 Constraint 1: f-structure must be consistent
2 Attributes are functionally unique – there may not be two arcs with the same attribute from the same f-structure
[figure: *inconsistent f-structure for 'sleep' with two SUBJ values (PRED 'Veit', PRED 'Tom') and two TENSE values (PAST, FUT)]
IA161 Syntactic Formalisms for Parsing Natural Languages 197 / 476

Lecture 5 Constraint 1: f-structure must be consistent
3 The symbols used for atomic f-structures are distinct – it is impossible to have two names for a single atomic f-structure (a “clash”)
[figure: *They sleeps is excluded – SUBJ [PRED 'pro', NUM SINGULAR/PLURAL clash] with PRED 'sleep']
IA161 Syntactic Formalisms for Parsing Natural Languages 198 / 476

Lecture 5 Constraint 2: f-structure must be coherent
All argument functions in an f-structure must be selected by the local PRED feature.
[figure: coherent f-structure for “John fell”: SUBJ [PRED 'John', NUM SG, PERS 3], TENSE PRES, PRED 'fall ⟨SUBJ⟩'; the *incoherent variant additionally contains an OBJ [PRED 'Mary', NUM SG, PERS 3] that is not selected by the PRED]
IA161 Syntactic Formalisms for Parsing Natural Languages 199 / 476

Lecture 5 Constraint 3: f-structure must be complete
All functions specified in the value of a PRED feature must be present in the f-structure of that PRED.
[figure: complete f-structure for 'like ⟨SUBJ, OBJ⟩' with SUBJ [PRED 'John', NUM SG, PERS 3] and OBJ [PRED 'Mary', NUM SG, PERS 3]; the *incomplete variant lacks the OBJ]
IA161 Syntactic Formalisms for Parsing Natural Languages 200 / 476

Lecture 5 Correspondence between different levels in LFG
[figure: the c-structure PP → P “with” NP (N “friends”) mapped node by node onto the f-structure PRED 'with', OBJ [PRED 'friend', NUM PLURAL]]
IA161 Syntactic Formalisms for Parsing Natural Languages 201 / 476

Lecture 5 Structural correspondence
c-structures and f-structures represent different properties of an utterance. How can these structures be associated properly with a particular sentence? Words and their ordering carry information about the linguistic dependencies in the sentence. This is represented by the c-structure (licensed by a CFG). LFG proposes simple mechanisms that map between elements of one structure and those of another: correspondence functions. A function allows mapping c-structures to f-structures: Φ : N → F IA161 Syntactic Formalisms for Parsing Natural Languages 202 / 476

Lecture 5 Mapping the c-structure into the f-structure
Since there is no isomorphic relationship between structure and function, LFG assumes both c-structure and f-structure. The mapping between c-structure and f-structure is the core of LFG's descriptive power. The mapping between c-structure and f-structure is located in the grammar (PS) rules.
[figure: c-structure of “the mouse admired the elephant” next to its f-structure – SUBJ [PRED 'mouse', DEF +, NUM SG, PERS 3], TENSE PAST, PRED 'admire ⟨SUBJ, OBJ⟩', OBJ [PRED 'elephant', DEF +, NUM SG, PERS 3]]
IA161 Syntactic Formalisms for Parsing Natural Languages 203 / 476

Lecture 5 Mapping mechanism: 6 steps
. STEP 1: PS rules .. ......
Context-free phrase-structure rules, annotated with functional schemata. The mother node carries no functional schemata; the daughter nodes carry (a list of) functional schemata. Ex.: S → NP VP with (↑SUBJ)=↓ on NP and ↑=↓ on VP; VP → V (NP) with ↑=↓ on V and (↑OBJ)=↓ on NP.
Note: ↑=↓ is sometimes omitted (this means nodes without functional schemata percolate their entire functional schema unchanged to the mother node).
IA161 Syntactic Formalisms for Parsing Natural Languages 204 / 476

Lecture 5 Mapping mechanism: 6 steps
. STEP 2: Lexicon entries .. ......
Lexicon entries consist of three parts: the representation of the word, its syntactic category, and a list of functional schemata. Ex.:
mouse N (↑PRED)='mouse' (↑PERS)=3 (↑NUM)=SG
the D (↑DEF)=+
admire V (↑PRED)='admire ⟨(↑SUBJ)(↑OBJ)⟩'
-ed Aff (↑TENSE)=PAST
IA161 Syntactic Formalisms for Parsing Natural Languages 205 / 476

Lecture 5 Mapping mechanism: 6 steps
. STEP 3: c-structure .. ......
Like the PS rules, each node in the tree is associated with functional schemata. With the functional schemata of the lexical entries at the leaves we obtain a complete c-structure.
[figure: annotated c-structure – S → NP ((↑SUBJ)=↓) VP (↑=↓); NP → D (↑=↓) N (↑=↓); VP → V (↑=↓) NP ((↑OBJ)=↓); leaves: the (↑DEF)=+; mouse (↑PRED)='mouse', (↑PERS)=3, (↑NUM)=SG; admired (↑PRED)='admire ⟨(↑SUBJ)(↑OBJ)⟩', (↑TENSE)=PAST; the (↑DEF)=+; elephant (↑PRED)='elephant', (↑PERS)=3, (↑NUM)=SG]
IA161 Syntactic Formalisms for Parsing Natural Languages 206 / 476

Lecture 5 Mapping mechanism: 6 steps
. STEP 4: Co-indexation .. ......
An f-structure is assigned to each node of the c-structure. Each of these f-structures obtains a name (f1 − fn). Nodes in the c-structure and the associated f-structures are co-indexed, i.e. they obtain the same name. The f-structure names f1 − fn can be chosen freely, but no name may occur twice.
[figure: the annotated c-structure with co-indexed empty f-structures f1[ ] … f9[ ]: S = f1, NP = f2 (D = f3, N = f4), VP = f5 (V = f6, NP = f7 (D = f8, N = f9))]
IA161 Syntactic Formalisms for Parsing Natural Languages 207 / 476

Lecture 5 Mapping mechanism: 6 steps
. STEP 5: Metavariable binding .. ......
All meta-variables are replaced by the names of the f-structure representations, e.g. S → NP ((↑SUBJ)=↓) VP (↑=↓) with S = f1, NP = f2, VP = f5 becomes (f1 SUBJ) = f2 and f1 = f5.
[figure: the fully instantiated tree – S: (f1 SUBJ)=f2 on NP, f1=f5 on VP; NP: f2=f3 on D, f2=f4 on N; VP: f5=f6 on V, (f5 OBJ)=f7 on NP; NP: f7=f8 on D, f7=f9 on N; leaves: (f3 DEF)=+; (f4 PRED)='mouse', (f4 PERS)=3, (f4 NUM)=SG; (f6 PRED)='admire ⟨(f6 SUBJ)(f6 OBJ)⟩', (f6 TENSE)=PAST; (f8 DEF)=+; (f9 PRED)='elephant', (f9 PERS)=3, (f9 NUM)=SG]
IA161 Syntactic Formalisms for Parsing Natural Languages 208 / 476

Lecture 5 Mapping mechanism: 6 steps
. ...... We introduce at this point the notion of a functional equation. By listing all functional equations from a c-structure we obtain the functional description, called the f-description:
(f1 SUBJ) = f2        (f6 PRED) = 'admire ⟨(f6 SUBJ)(f6 OBJ)⟩'
f2 = f3               (f6 TENSE) = PAST
(f3 DEF) = +          (f5 OBJ) = f7
f2 = f4               f7 = f8
(f4 PRED) = 'mouse'   (f8 DEF) = +
(f4 PERS) = 3         f7 = f9
(f4 NUM) = SG         (f9 PRED) = 'elephant'
f1 = f5               (f9 PERS) = 3
f5 = f6               (f9 NUM) = SG
Table: f-description
IA161 Syntactic Formalisms for Parsing Natural Languages 209 / 476

Lecture 5 Mapping mechanism: 6 steps
. STEP 6: From f-description to f-structure .. ......
The computation of an f-structure is based on the f-description. For the derivation of f-structures from the f-description it is important that no information is lost and that no information is added. The derivation is done by the application of the functional equations.
List of functional equations:
a) simple equations of the form (fn A) = B
b) f-equations of the form fn = fm
c) f-equations of the form (fn A) = fm
→ Functional equations with the same name are grouped into an f-structure of the same name.
IA161 Syntactic Formalisms for Parsing Natural Languages 210 / 476

Lecture 5 Application of the functional equation (a): (fn A) = B
[figure: the simple equations DEF=+, PRED='mouse', PERS=3, NUM=SG, PRED='admire ⟨SUBJ, OBJ⟩', TENSE=PAST, DEF=+, PRED='elephant', PERS=3, NUM=SG are collected into small f-structures]
Application of the functional equation (b): fn = fm
[figure: the f-structures named by equated variables are unified, e.g. [DEF +] merges with [PRED 'mouse', PERS 3, NUM SG]]
IA161 Syntactic Formalisms for Parsing Natural Languages 211 / 476

Lecture 5 Application of the functional equation (c): (fn A) = fm
[figure: the unified sub-structures are embedded under the attributes SUBJ and OBJ, yielding SUBJ [PRED 'mouse', PERS 3, NUM SG, DEF +], PRED 'admire ⟨SUBJ, OBJ⟩', TENSE PAST, OBJ [PRED 'elephant', PERS 3, NUM SG, DEF +]]
IA161 Syntactic Formalisms for Parsing Natural Languages 212 / 476

Lecture 5
. STEP 1: lexical entries .. ......
made: V (↑PRED)='MAKE⟨SUBJ,OBJ,XCOMP⟩' (↑XCOMP SUBJ)=(↑OBJ) (↑TENSE)=SIMPLEPAST
gave: V (↑PRED)='GIVE⟨SUBJ,OBJ,OBJ2⟩' (↑TENSE)=SIMPLEPAST
had said: V (↑PRED)='SAY⟨SUBJ,OBJ⟩' (↑TENSE)=PASTPERFECT
the: D (↑PRED)='THE' (↑SPECTYPE)=DEF
about: P (↑PRED)='ABOUT⟨OBJ⟩'
which: N (↑PRED)='PRO' (↑PRONTYPE)=REL
John's: D (↑PRED)='JOHN' (↑SPECTYPE)=POSS
many: D (↑PRED)='MANY' (↑SPECTYPE)=QUANT
things: N (↑PRED)='THINGS' (↑NUM)=PLURAL
. STEP 2: c-structure .. ......
a. S → NP ((↑SUBJ)=↓) VP (↑=↓)
b. NP → { A (↑=↓) N (↑=↓) }
c. VP → V (↑=↓) NP ((↑OBJ)=↓) V ((↑XCOMP)=↓, (↑XCOMP PRED)='be ⟨SUBJ, PREDIC⟩')
d. V → NP ((↑PREDIC)=↓)
IA161 Syntactic Formalisms for Parsing Natural Languages 213 / 476

Lecture 5
. STEP 3: f-structure .. ......
'John made Peter angry'
[figure: annotated c-structure – S → NP ((↑SUBJ)=↓, N “John”) VP (↑=↓, V “made”, NP ((↑OBJ)=↓, N “Peter”), V ((↑XCOMP)=↓, NP ((↑PREDIC)=↓, A “angry”)))]
. STEP 4: unification .. ......
[figure: resulting f-structure – PRED 'make ⟨SUBJ, OBJ, XCOMP⟩', TENSE SIMPLEPAST, SUBJ [PRED 'John'], OBJ 1 [PRED 'Peter'], XCOMP [PRED 'be ⟨SUBJ, PREDIC⟩', SUBJ 1, PREDIC [PRED 'angry']]]
IA161 Syntactic Formalisms for Parsing Natural Languages 214 / 476

Lecture 6 . ...... Syntactic Formalisms for Parsing Natural Languages Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang) ia161@nlp.fi.muni.cz Autumn 2013 IA161 Syntactic Formalisms for Parsing Natural Languages 215 / 476

Lecture 6 . ......
Parsing with CCG IA161 Syntactic Formalisms for Parsing Natural Languages 216 / 476

Lecture 6 Outline 1 A-B categorial system 2 Lambek calculus 3 Extended categorial grammar: variations based on the Lambek calculus (Abstract Categorial Grammar, Categorial Type Logic); variations based on combinatory logic (Combinatory Categorial Grammar (CCG), Multi-modal Combinatory Categorial Grammar) IA161 Syntactic Formalisms for Parsing Natural Languages 217 / 476

Lecture 6 Categorial Grammar
is: a lexicalized theory of grammar, along with other theories of grammar such as HPSG, TAG, LFG, …
is: linguistically and computationally attractive → language-invariant combination rules, highly efficient parsing
IA161 Syntactic Formalisms for Parsing Natural Languages 218 / 476

Lecture 6 Main idea in CG and the application operation
All natural language consists of operators and operands. Operator (functor) and operand (argument); application: (operator(operand)); categorial type: typed operator and operand IA161 Syntactic Formalisms for Parsing Natural Languages 219 / 476

Lecture 6 1. A-B categorial system
. ...... The product of the directional adaptation by Bar-Hillel (1953) of Ajdukiewicz's calculus of syntactic connection (Ajdukiewicz, 1935)
Definition 1 (AB categories). Given A, a finite set of atomic categories, the set of categories C is the smallest set such that: A ⊆ C; (X\Y), (X/Y) ∈ C if X, Y ∈ C
IA161 Syntactic Formalisms for Parsing Natural Languages 220 / 476

Lecture 6 1. A-B categorial system
Categories (types): primitive categories and derived categories. Primitive: S for sentence, N for nominal phrase. Derived: S/N, N/N, (S\N)/N, NN/N, S/S . . .
Forward (>) and backward (<) functional application:
a. X/Y Y ⇒ X (>)
b. Y X\Y ⇒ X (<)
IA161 Syntactic Formalisms for Parsing Natural Languages 221 / 476

Lecture 6 1. A-B categorial system
The calculus on types in CG is analogous to algebraic operations:
. ...... x/y y → x ≈ 3/5 ∗ 5 = 3
Brazil defeated Germany: n, (s\n)/n, n; “defeated Germany” ⇒ s\n (>); “Brazil” with s\n ⇒ s (<)
IA161 Syntactic Formalisms for Parsing Natural Languages 222 / 476

Lecture 6 1. A-B categorial system
Applicative tree of Brazil defeated Germany:
[figure: defeated (operator) applies to Germany (operand), giving (defeated(Germany)); this in turn applies to Brazil (operand), giving ((defeated(Germany))Brazil)]
IA161 Syntactic Formalisms for Parsing Natural Languages 223 / 476

Lecture 6 Limitations of the AB system
1 Relative constructions:
a. teami that ti defeated Germany – [that](n\n)/(s\n) [defeated Germany]s\n
b. teami that Brazil defeated ti – [that](n\n)/(s/n) [Brazil defeated]s/n (?) – but with functional application alone, no category s/n can be derived for “Brazil defeated” from Brazil n and defeated (s\n)/n
2 Many other complex phenomena: coordination, object extraction, phrasal verbs, ...
3 AB's generative power is too weak – context-free
IA161 Syntactic Formalisms for Parsing Natural Languages 224 / 476

Lecture 6 2. Lambek calculus (Lambek, 1958, 1961)
the calculus of syntactic types; still context-free. The axioms of the Lambek calculus are the following (axioms 1 and 2, with inference rules 3, 4, 5):
1 x → x
2 (xy)z → x(yz), x(yz) → (xy)z
3 if xy → z then x → z/y; if xy → z then y → x\z
4 if x → z/y then xy → z; if y → x\z then xy → z
5 if x → y and y → z then x → z
IA161 Syntactic Formalisms for Parsing Natural Languages 225 / 476

Lecture 6 2. Lambek calculus (Lambek, 1958, 1961)
The rules obtained from the previous axioms are the following:
1 Hypothesis: if x and y are types, then x/y and y\x are types.
2 Application rules: (x/y)y → x, y(y\x) → x. Ex: Poor John works.
3 Associativity rule: (x\y)/z ↔ x\(y/z). Ex: John likes Jane.
4 Composition rules: (x/y)(y/z) → x/z, (x\y)(y\z) → x\z. Ex: He likes him – he s/(n\s), likes (n\s)/n, composing to s/n.
5 Type-raising rules: x → y/(x/y), x → (y/x)\y
IA161 Syntactic Formalisms for Parsing Natural Languages 226 / 476

Lecture 6 3. Combinatory Categorial Grammar
Developed originally by M. Steedman (1988, 1990, 2000, ...). Combinatory Categorial Grammar (CCG) is a grammar formalism equivalent to Tree Adjoining Grammar, i.e.: it is lexicalized; it is parsable in polynomial time (see Vijay-Shanker and Weir, 1990); it can capture cross-serial dependencies. Just like TAG, CCG is used for grammar writing. CCG is especially suitable for statistical parsing. IA161 Syntactic Formalisms for Parsing Natural Languages 227 / 476

Lecture 6 3. Combinatory Categorial Grammar
several of the combinators which Curry and Feys (1958) use to define the λ-calculus and applicative systems in general are of considerable syntactic interest (Steedman, 1988)
The relationships of these combinators to terms of the λ-calculus are defined by the following equivalences (Steedman, 2000b):
a. B f g ≡ λx.f(gx) ... composition
b. T x ≡ λf.fx ... type-raising
c. S f g ≡ λx.fx(gx) ... substitution
IA161 Syntactic Formalisms for Parsing Natural Languages 228 / 476

Lecture 6 CCG categories
Atomic categories: S, N, NP, PP, TV, … Complex categories are built recursively from atomic categories and slashes. Example complex categories for verbs: intransitive verb: S\NP (walked); transitive verb: (S\NP)/NP (respected); ditransitive verb: ((S\NP)/NP)/NP (gave) IA161 Syntactic Formalisms for Parsing Natural Languages 229 / 476

Lecture 6 Lexical categories in CCG
An elementary syntactic structure – a lexical category – is assigned to each word in a sentence, e.g. walked: S\NP – ‘give me an NP to my left and I return a sentence’. Think of the lexical category for a verb as a function: NP is the argument, S the result, and the slash indicates the direction of the argument. IA161 Syntactic Formalisms for Parsing Natural Languages 230 / 476

Lecture 6 The typed lexicon item
The CCG lexicon assigns categories to words, i.e. it specifies which categories a word can have. Furthermore, the lexicon specifies the semantic counterpart of the syntactic rules, e.g. love := (S\NP)/NP : λx.λy.loves′xy. The combinatory rules determine what happens with the category and the semantics on combination. IA161 Syntactic Formalisms for Parsing Natural Languages 231 / 476

Lecture 6 The typed lexicon item
Attribution of types to lexical items: examples
Predicate, ex. is: as an identificator of a nominal, as an operator of predication – from a nominal (S\NP)/NP; from an adjective (S\NP)/(N/N); from an adverb (S\NP)/((S\NP)\(S\NP)); from a preposition (S\NP)/((S\NP)\(S\NP)/NP)
ex. verbs: unary (S\NP); binary (S\NP)/NP; ternary (S\NP)/NP/NP
IA161 Syntactic Formalisms for Parsing Natural Languages 232 / 476

Lecture 6 The typed lexicon item
Adverbs: adverb of verb: (S\NP)/(S\NP), (S\NP)/NP/(S\NP)/NP; adverb of adverb: (S\NP)/(S\NP)/(S\NP)/(S\NP), (S\NP)/NP/(S\NP)/NP/(S\NP)/NP/(S\NP)/NP; adverb of adjective: (N/N)/(N/N), (N\N)/(N\N); adverb of proposition: S/S
. ...... Adverb: operator of determination, of type (X/X)
IA161 Syntactic Formalisms for Parsing Natural Languages 233 / 476

Lecture 6 The typed lexicon item
Prepositions: Prep. 1: constructor of adverbial phrases – (S\NP)\(S\NP)/NP, (S/S)/NP, (S/S)/N; Prep. 2: constructor of adjectival phrases – (N\N)/NP, (N\N)/N
. ......
Preposition: constructor of determination, of type (X/X) IA161 Syntactic Formalisms for Parsing Natural Languages 234 / 476

Lecture 6 Dictionary of typed words
Syntactic category | syntactic types | lexical entries
Nom. | N | Olivia, apple, …
Completed nom. | NP | an apple, the school
Pron. | NP | she, he, …
Adj. | (N/N), (N\N) | pretty woman, …
Adv. | (N/N)/(N/N), (S\NP)\(S\NP), … | very delicious, …
Vb | (S\NP), (S\NP)/NP, … | run, give, …
Prep. | (S\NP)\(S\NP)/NP, (NP\NP)/NP, … | run in the park, book of John, …
Relative | (S\NP)/S, … | I believe that …
IA161 Syntactic Formalisms for Parsing Natural Languages 235 / 476

Lecture 6 Combinatory categorial rules
Functional application (>, <); functional composition (> B, < B); type-raising (> T, < T); substitution (> S, < S); coordination (> Φ, < Φ) IA161 Syntactic Formalisms for Parsing Natural Languages 236 / 476

Lecture 6 Functional application (FA)
X/Y : f  Y : a ⇒ X : fa (forward functional application, >)
Y : a  X\Y : f ⇒ X : fa (backward functional application, <)
Combine a function with its argument:
[derivations: “John likes Mary” → ((likes(Mary))John): likes (S\NP)/NP applies to Mary NP giving S\NP, which applies to John NP giving S; “Mary sleeps” → (sleeps(Mary)): NP and S\NP give S]
The direction of the slash indicates the position of the argument with respect to the function. IA161 Syntactic Formalisms for Parsing Natural Languages 237 / 476

Lecture 6 Derivation in CCG
The combinatory rule used in each derivation step is usually indicated on the right of the derivation line. Note especially what happens with the semantic information:
John NP : John′;  loves (S\NP)/NP : λx.λy.loves′xy;  Mary NP : Mary′
loves Mary ⇒ S\NP : λy.loves′Mary′y (>)
John loves Mary ⇒ S : loves′Mary′John′ (<)
IA161 Syntactic Formalisms for Parsing Natural Languages 238 / 476

Lecture 6 Function composition (FC)
Generalized forward composition (> Bn): X/Y : f  Y/Z : g ⇒B X/Z : λx.f(gx) (> B)
Functional composition composes two complex categories (two functions):
(S\NP)/PP  PP/NP ⇒B (S\NP)/NP
S/(S\NP)  (S\NP)/NP ⇒B S/NP
[derivation: “birds like bugs” – birds NP ⇒T S/(S\NP), composed with like (S\NP)/NP gives S/NP (> B), which applies to bugs NP giving S (>)]
IA161 Syntactic Formalisms for Parsing Natural Languages 239 / 476

Lecture 6 Function composition (FC)
Generalized backward composition (< Bn): Y\Z : g  X\Y : f ⇒B X\Z : λx.f(gx) (< B)
[derivation fragment: “The referee gave Unsal a card and Rivaldo the ball” – the type-raised argument pairs combine by backward composition into s\((s/np)/np), which applies to gave (s/np)/np, giving s (<)]
IA161 Syntactic Formalisms for Parsing Natural Languages 240 / 476

Lecture 6 Type-raising (T)
Forward type-raising (> T): X : a ⇒ T/(T\X) : λf.fa (> T)
Type-raising turns an argument into a function (e.g. for case assignment): NP ⇒ S/(S\NP) (nominative)
[derivations: “birds fly” analysed either as NP S\NP ⇒ S (<), or with birds ⇒T S/(S\NP) applying to fly S\NP ⇒ S (>)]
This must be used e.g. in the case of WH-questions. IA161 Syntactic Formalisms for Parsing Natural Languages 241 / 476

Lecture 6 Example of functional composition (> B) and type-raising (T)
[derivation: “team that I thought that Brazil defeated” – that (n\n)/(s/np); I np ⇒T s/(s\np); thought (s\np)/s; that s/s; Brazil np ⇒T s/(s\np); defeated (s\np)/np; repeated forward composition (> B) yields s/np, then that applies (>) giving n\n, which combines with team n (<)]
IA161 Syntactic Formalisms for Parsing Natural Languages 242 / 476

Lecture 6 Example of functional composition (> B) and type-raising (T)
Backward type-raising (< T): X : a ⇒ T\(T/X) : λf.fa (< T)
Type-raising turns an argument into a function (e.g.
for case assignment): NP ⇒ (S\NP)\((S\NP)/NP) (accusative)
[derivation fragment: “The referee gave Unsal …” – the type-raised arguments combine into s\((s/np)/np), which applies to gave (s/np)/np, giving s (<)]
IA161 Syntactic Formalisms for Parsing Natural Languages 243 / 476

Lecture 6 Coordination (&)
X CONJ X ⇒Φ X (coordination, Φ)
[derivation fragment: give (VP/NP)/NP combined with the coordinated argument pairs “a dog …” of type VP\((VP/NP)/NP) yields VP (<)]
IA161 Syntactic Formalisms for Parsing Natural Languages 244 / 476

Lecture 6 Substitution (S)
Forward substitution (> S): (X/Y)/Z  Y/Z ⇒S X/Z
Applies to parasitic gaps such as the following: a. team that I persuaded every detractor of to support
[derivation: that (n\n)/(s/np); I np ⇒T s/(s\np); persuaded ((s\np)/(s\np))/np; every detractor of np/np; to support (s\np)/np; composition gives ((s\np)/(s\np))/np (> B), substitution gives (s\np)/np (> S), composition gives s/np (> B), then that applies (>) giving n\n]
IA161 Syntactic Formalisms for Parsing Natural Languages 245 / 476

Lecture 6 Substitution (S)
Backward crossed substitution (< S×): Y/Z  (X\Y)/Z ⇒S X/Z
Applies to parasitic gaps such as the following:
a. John watched without enjoying the game between Germany and Paraguay.
b. game that John watched without enjoying
. ...... game that John [watched](s\np)/np [without enjoying]((s\np)\(s\np))/np
[derivation: that (n\n)/(s/np); John np ⇒T s/(s\np); watched (s\np)/np; without enjoying ((s\np)\(s\np))/np; backward crossed substitution gives (s\np)/np (< S×), composition gives s/np (> B), then that applies (>) giving n\n]
IA161 Syntactic Formalisms for Parsing Natural Languages 246 / 476

Lecture 6 Limits on possible rules
The Principle of Adjacency: combinatory rules may only apply to entities which are linguistically realised and adjacent.
The Principle of Directional Consistency: all syntactic combinatory rules must be consistent with the directionality of the principal function. Ex: X\Y Y ≠> X
The Principle of Directional Inheritance: if the category that results from the application of a combinatory rule is a function category, then the slash defining directionality for a given argument in that category will be the same as the one defining directionality for the corresponding arguments in the input functions. Ex: X/Y Y/Z ≠> X\Z
IA161 Syntactic Formalisms for Parsing Natural Languages 247 / 476

Lecture 6 Semantics in CCG
CCG offers a syntax-semantics interface. The lexical categories are augmented with an explicit identification of their semantic interpretation, and the rules of functional application are accordingly expanded with an explicit semantics. Every syntactic category and rule has a semantic counterpart. The lexicon is used to pair words with syntactic categories and semantic interpretations: love := (S\NP)/NP : λx.λy.loves′xy IA161 Syntactic Formalisms for Parsing Natural Languages 248 / 476

Lecture 6 Semantics in CCG
The semantic interpretation of all combinatory rules is fully determined by the Principle of Type Transparency:
Categories: all syntactic categories reflect the semantic type of the associated logical form.
Rules: all syntactic combinatory rules are type-transparent versions of one of a small number of semantic operations over functions, including application, composition and type-raising.
IA161 Syntactic Formalisms for Parsing Natural Languages 249 / 476

Lecture 6 Semantics in CCG
proved := (S\NP3s)/NP : λx.λy.prove′xy – the semantic type of the reduction is the same as its syntactic type, here functional application.
[derivation: Marcel NP3sm : marcel′; proved (S\NP3s)/NP : λx.λy.prove′xy; completeness NP : completeness′; proved completeness ⇒ S\NP3s : λy.prove′completeness′y (>); Marcel proved completeness ⇒ S : prove′completeness′marcel′ (<)]
IA161 Syntactic Formalisms for Parsing Natural Languages 250 / 476

Lecture 6 Semantics in CCG
CCG with semantics: Mary will copy and file without reading these articles
[derivation: Mary will S/VP; copy VP/NP : copy′; and CONJ : and′; file VP/NP : file′; without (VP\VP)/VPing : λp.λq.without′pq; reading VPing/NP : read′; these articles NP : articles′. Composition gives “without reading” (VP\VP)/NP : λx.λq.without′(read′x)q (> B); coordination and backward application give “copy and file without reading” VP/NP : λx.and′(without′(read′x)(file′x))(copy′x) (<); applying to “these articles” gives VP, and “Mary will” yields S : will′(and′(without′(read′articles′)(file′articles′))(copy′articles′))mary′ (>)]
IA161 Syntactic Formalisms for Parsing Natural Languages 251 / 476

Lecture 6 Parsing a sentence in CCG
Step 1: tokenization
Step 2: tagging the concatenated lexicon
Step 3: calculating on the types attributed to the concatenated lexicon by applying the adequate combinatory rules; eliminating the applied combinators (we will see how to do this next week)
Step 4: finding the parsing results, presented in the form of an operator/operand structure (predicate-argument structure)
IA161 Syntactic Formalisms for Parsing Natural Languages 252 / 476

Lecture 6 Parsing a sentence in CCG
Example: I requested and would prefer musicals
STEP 1: tokenization/lemmatization → ex) POS tagger, tokenizer, lemmatizer
a. I-requested-and-would-prefer-musicals
b. I-request-ed-and-would-prefer-musical-s
STEP 2: tagging the concatenated expressions → ex) supertagger, inventory of typed words
I NP; requested (S\NP)/NP; and CONJ; would (S\NP)/VP; prefer VP/NP; musicals NP
IA161 Syntactic Formalisms for Parsing Natural Languages 253 / 476

Lecture 6 Parsing a sentence in CCG
STEP 3: categorial calculus
c. apply the coordination rules – coordination (< & >): X conj X ⇒ X
b. apply the functional composition rules – forward composition (> B): X/Y : f  Y/Z : g ⇒ X/Z : Bfg
a.
apply the type-raising rules – subject type-raising (> T): NP : a ⇒ T/(T\NP) : Ta
7/ S
6/ S/NP  NP (>)
5/ S/(S\NP)  (S\NP)/NP  NP (> B)
4/ S/(S\NP)  (S\NP)/NP  NP (> Φ)
3/ S/(S\NP)  (S\NP)/NP  CONJ  (S\NP)/NP  NP (> B)
2/ S/(S\NP)  (S\NP)/NP  CONJ  (S\NP)/VP  VP/NP  NP (> T)
1/ NP  (S\NP)/NP  CONJ  (S\NP)/VP  VP/NP  NP
I - requested - and - would - prefer - musicals
IA161 Syntactic Formalisms for Parsing Natural Languages 254 / 476

Lecture 6 Parsing a sentence in CCG
STEP 4: semantic representation (predicate-argument structure)
7/ S : and′(will′(prefer′musicals′)i′)(request′musicals′i′)
6/ : λy.and′(would′(prefer′musicals′)y)(request′musicals′y)
5/ : λx.λy.and′(will′(prefer′x)y)(request′xy)
4/ : λx.λy.and′(will′(prefer′x)y)(request′xy)
3/ : λx.λy.will′(prefer′x)y
2/ : λf.f i′
1/ : i′ : request′ : and′ : will′ : prefer′ : musicals′
I requested and would prefer musicals
IA161 Syntactic Formalisms for Parsing Natural Languages 255 / 476

Lecture 6 Variation of CCG: Multi-modal CCG (Baldridge, 2002)
A modalized CCG system: a combination of Categorial Type Logic (CTL, Morrill, 1994; Moortgat, 1997) with CCG (Steedman, 2000). Rule restrictions are expressed by introducing the modalities ⋆, ×, •, ♢.
Modalized functional composition rules:
(> B) X/♢Y Y/♢Z ⇒ X/♢Z
(< B) X\♢Y Y\♢Z ⇒ X\♢Z
We invite you to read the paper “Multi-Modal CCG” by Baldridge and M. Kruijff (2003).
IA161 Syntactic Formalisms for Parsing Natural Languages 256 / 476

Lecture 6 The positions of several formalisms on the Chomsky hierarchy
[figure: unrestricted (Turing complete): CTL; context-sensitive: CTL with non-expanding rules; mildly context-sensitive: Multiset-CCG, CCG, TAG; context-free: AB, CTL Base Logic, Lambek calculus]
IA161 Syntactic Formalisms for Parsing Natural Languages 257 / 476

Lecture 7 . ...... Syntactic Formalisms for Parsing Natural Languages Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang) ia161@nlp.fi.muni.cz Autumn 2013 IA161 Syntactic Formalisms for Parsing Natural Languages 258 / 476

Lecture 7 . ...... Parsing with HPSG IA161 Syntactic Formalisms for Parsing Natural Languages 259 / 476

Lecture 7 Overview of syntactic formalisms
Unification-based grammars: HPSG, LFG, TAG, UCG, ...
Dependency-based grammars: Tesnière's model; the Meaning-Text theory of Mel'čuk, ...
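Since the formalisms in the first group all rest on unification of attribute-value structures, a minimal Python sketch of that operation may help before the HPSG details; the dict encoding and names are our own, illustrative simplifications (no types, no structure sharing).

class UnificationFailure(Exception):
    pass

def unify(f, g):
    if f == g:                            # identical values (atomic or complex)
        return f
    if isinstance(f, dict) and isinstance(g, dict):
        out = dict(f)
        for attr, val in g.items():       # merge attribute by attribute
            out[attr] = unify(out[attr], val) if attr in out else val
        return out
    raise UnificationFailure(f"{f!r} clashes with {g!r}")

print(unify({"NUM": "SG"}, {"PERS": "3"}))   # {'NUM': 'SG', 'PERS': '3'}
try:
    unify({"NUM": "SG"}, {"NUM": "PL"})      # atomic clash, cf. *They sleeps
except UnificationFailure as e:
    print("fails:", e)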
IA161 Syntactic Formalisms for Parsing Natural Languages 260 / 476

Lecture 7 Heritage of HPSG
GPSG – Generalized Phrase-Structure Grammar (Gerald Gazdar): linear order vs. hierarchical order; feature structures for the representation of information
LFG: the lexicon contains lexical rules
CG: subcategorization
IA161 Syntactic Formalisms for Parsing Natural Languages 261 / 476

Lecture 7 Key points of HPSG
A monostratal theory without derivation; sharing of given information without movement and transformation; one representation for different levels of analysis: phonology, syntax, semantics; constraint-based analysis; unification of given information; a computational formalism IA161 Syntactic Formalisms for Parsing Natural Languages 262 / 476

Lecture 7 Syntactic representation in HPSG
Typed feature structures consist of attribute/value pairs; the types are organized into a hierarchy, e.g. sign > phrase, case > nominative; a feature structure is a directed acyclic graph (DAG), with arcs representing features going between values IA161 Syntactic Formalisms for Parsing Natural Languages 263 / 476

Lecture 7 Features
The basic element of structure in HPSG; each feature should be appropriate to a type. Most frequently used features: PHON, SYNSEM, LOC/NON-LOC, CAT, CONTEXT, CONTENT, HEAD, SUBJ, COMPS, S-ARG IA161 Syntactic Formalisms for Parsing Natural Languages 264 / 476

Lecture 7 Types
Types are attributed to features → typed features
[figure: features of sign pointing to the types synsem, head, phrase, content, index, ....]
Each of these feature values is itself a complex object: the type sign has the features PHON and SYNSEM appropriate for it; the feature SYNSEM has a value of type synsem; this type itself has relevant features (LOCAL and NONLOCAL) IA161 Syntactic Formalisms for Parsing Natural Languages 265 / 476

Lecture 7 Types
sign is the basic type in HPSG, used to describe lexical items (of type word) and phrases (of type phrase). All signs carry the following two features: PHON encodes the phonological representation of the sign; SYNSEM its syntax and semantics:
sign [ PHON list(phon-string), SYNSEM synsem ]
IA161 Syntactic Formalisms for Parsing Natural Languages 266 / 476

Lecture 7 Types
In attribute-value matrix (AVM) form, here is the skeleton of a sign object:
sign [ PHON list(phon-string), SYNSEM synsem [ LOCAL local, NON-LOCAL non-local ], DTRS list(sign) ]
IA161 Syntactic Formalisms for Parsing Natural Languages 267 / 476

Lecture 7 Structure of signs in HPSG
synsem introduces the features LOCAL and NONLOCAL; local introduces CATEGORY (CAT), CONTENT (CONT) and CONTEXT (CONX); non-local will be discussed in connection with unbounded dependencies; category includes the syntactic category and the grammatical arguments of the word/phrase IA161 Syntactic Formalisms for Parsing Natural Languages 268 / 476

Lecture 7 Description of an object in HPSG: lexical sign and phrasal sign
sign [ PHON list(phon-string), SYNSEM synsem ], with subtypes word and phrase [ DTRS constituent-struc ]
synsem [ LOCAL local, NON-LOCAL non-local ]
local [ CATEGORY category, CONTENT content, CONTEXT context ]
category [ HEAD head, VAL ... ]
CATEGORY
CATEGORY encodes the sign's syntactic category
Given via the feature [HEAD head], where head is the supertype of noun, verb, adjective, preposition, determiner and marker; each of these types selects a particular set of head features
Given via the feature [VALENCE ...], which makes it possible to combine the sign with other signs into larger phrases
SYNSEM|LOC|CAT|VALENCE: [valence | SUBJECT list(synsem) | SPECIFIER list(synsem) | COMPLEMENTS list(synsem)]

Sub-categorization of the head type
vform: finite, infinitive, base, gerund, present-part., past-part., passive-part.
case: nominative, accusative
pform: of, to, ...

Description of an object in HPSG
sign: [PHON list(phon-string) | SYNSEM synsem], with subtypes word and phrase [DTRS constituent-struc]
synsem: [LOCAL local | NON-LOCAL non-local]
local: [CATEGORY category | CONTENT content | CONTEXT context]
category: [HEAD head | VAL ...]

Semantic representation: the CONTENT (& CONTEXT) feature
The semantic interpretation of the sign is given as the value of CONTENT
nominal-object: an individual/entity (or a set of them), associated with a referring index and bearing agreement features → INDEX, RESTR
parameterized-state-of-affairs (psoa): a partial state of affairs; an event relation together with role names identifying the participants of the event → BACKGR
quantifier: some, all, every, a, the, ...
Note: many of these have been reformulated by Minimal Recursion Semantics (MRS), which allows underspecification of quantifier scope.

Sub-categorization of the content type
content: psoa, nom-obj [INDEX index | RESTR set(psoa)], ...
laugh': [LAUGHER ref]
give': [GIVER ref | GIVEN ref | GIFT ref]
drink': [DRINKER ref | DRUNKEN ref]
think': [THINKER ref | THOUGHT psoa]
Note: semantic restrictions on the index are represented as the value of RESTR. RESTR is an attribute of a nominal object, and its value is a set of psoas. In turn, RESTR has the attribute REL, whose value can be either referential indices or psoas.

Sub-categorization of the index type
index: [PERSON person | NUMBER number | GENDER gender], with subtypes referential, there, it
person: first, second, third
number: singular, plural
gender: masculine, feminine, neuter

Lexical input of she
[AVM figure: the word she, with PHON ⟨she⟩; CATEGORY with HEAD noun [CASE nom] and empty SUBJ, COMPS and SPR lists; CONTENT a ppro whose INDEX ref is [PER 3rd, NUM sing, GEND fem]; CONTEXT BACKGR a psoa [RELN female, INST the shared index]]

Lexical input of she
sign: [PHON list(phon-string) | SYNSEM synsem], with subtypes word and phrase [DTRS constituent-struc]
Each phrase has a DTRS attribute which has a constituent-structure value
This DTRS value corresponds to what we view in a tree as daughters (with additional grammatical role information, e.g. adjunct, complement, etc.)
By distinguishing different kinds of constituent structures, we can define different kinds of constructions in a language
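As an illustration of how such typed feature structures might be encoded in practice, here is a minimal sketch using nested Python dictionaries. The encoding and the toy type hierarchy are our own simplifications, not the representation used by any actual HPSG implementation.

```python
# A lexical sign for "she", encoded as a nested dict mirroring the AVM above.
# Feature names follow the slides; the dict encoding itself is illustrative.
she = {
    "PHON": ["she"],
    "SYNSEM": {
        "LOCAL": {
            "CATEGORY": {
                "HEAD": {"type": "noun", "CASE": "nom"},
                "VALENCE": {"SUBJ": [], "SPR": [], "COMPS": []},
            },
            "CONTENT": {
                "type": "ppro",
                "INDEX": {"PER": "3rd", "NUM": "sing", "GEND": "fem"},
            },
        }
    },
}

# A toy type hierarchy (sign > word/phrase, head > noun/verb/...):
SUBTYPES = {"sign": {"word", "phrase"},
            "head": {"noun", "verb", "adj", "prep", "det", "marker"}}

def is_subtype(t, supertype):
    """True if t is the supertype itself or listed below it in the toy hierarchy."""
    return t == supertype or t in SUBTYPES.get(supertype, set())
```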
Structure of phrase
[Type hierarchy figure: constituent-struc branches into head-struc and coord-struc; head-struc has subtypes head-adj-struc (ADJ-DTR sign), head-filler-struc (FILL-DTR sign), head-mark-struc (MARK-DTR sign), head-spr-struc (SPR-DTR), head-subj-struc (SUBJ-DTR) and head-comps-struc (COMPS-DTR); coord-struc has HEAD-DTR word, CONJ-DTRS set(sign) and CONJUNCTION-DTR]

head-subject/complement structure
[AVM figure: a head-subj-struc phrase whose head daughter is a head-comps-struc phrase headed by a finite verb; the SUBJ and COMPS valence requirements of the head are checked off against the daughters as one moves up the tree]

Questions! (1)
How exactly did the last example work?
drink has head information specifying that it is a finite verb and subcategorizes for a subject and an object
The head information gets percolated up (the HEAD feature principle)
The valence information gets "checked off" as one moves up the tree (the VALENCE principle)
Such principles are treated as linguistic universals in HPSG

HEAD-feature principle
The value of the HEAD feature of any headed phrase is token-identical with the HEAD value of the head daughter
[AVM: phrase with SYNSEM|LOC|CAT|HEAD [1] and DTRS head-struc [HEAD-DTR|SYNSEM|LOC|CAT|HEAD [1]]]

VALENCE principle
In a headed phrase, for each valence feature F, the F value of the head daughter is the concatenation of the phrase's F value with the list of the F-DTR's SYNSEM values (Pollard and Sag, 1994:348)
[AVM: phrase with SS|LOC|CAT|VAL [SUBJ [a], COMPS [b]]; HEAD-DTR with SS|LOC|CAT|VAL [SUBJ ⟨[1]⟩ ⊕ [a], COMPS ⟨[2],...,[n]⟩ ⊕ [b]]; SUBJ-DTR with SS [1]; COMP-DTRS with SS [2], ..., [n]]
Note: the Valence Principle constrains the way in which information is shared between phrases and their head daughters.
F can be any one of SUBJ, COMPS, SPR
When the F-DTR is empty, the F valence feature of the head daughter is copied to the mother phrase

Questions! (2)
Note that agreement is handled neatly, simply by the fact that the SYNSEM values of a word's daughters are token-identical to the items on the VALENCE lists
How exactly do we decide on a syntactic structure?
Why is the subject checked off at a higher point in the tree?

Immediate Dominance (ID) Principle
Every headed phrase must satisfy exactly one of the ID schemata
The exact inventory of valid ID schemata is language-specific
We will introduce a set of ID schemata for English

Immediate Dominance (ID) Schemata
[AVM figure: a phrase with DTRS head-struc satisfies one of: head-subj-struc with empty COMPS (head-subject), head-comps-struc (head-complement), head-spr-struc (head-specifier), head-marker-struc with MARK-DTR|SS|LOC|CAT|HEAD marker (head-marker), or head-adj-struc with ADJ-DTR|SS|LOC|CAT|HEAD|MOD token-identical to the HEAD-DTR's SS (head-adjunct)]

head-adjunct structure
[AVM figure: a head-adj-struc combining an adjective (PRD −, MOD [1]) with a noun head; the adjunct's MOD value is token-identical to the head daughter's SYNSEM, and the shared SPR requirement (a determiner) is passed up to the mother]

Semantic principle
The CONTENT value of a headed phrase is token-identical to the CONTENT value of the semantic head daughter
The semantic head daughter is identified as
  the ADJ-DTR in a head-adjunct phrase
  the HEAD-DTR in other headed phrases
[AVM figure: in a head-adj-struc the mother's SYNSEM|LOC|CONT is shared with ADJ-DTR|SYNSEM|LOC|CONT; in other headed phrases it is shared with HEAD-DTR|SYNSEM|LOC|CONT]

Example 2: Kim likes bagels
[AVM figure: lexical sign for Kim, a word with PHON ⟨Kim⟩, HEAD noun, empty SUBJ/SPR/COMPS and ARG-ST, CONTENT with a 3sg INDEX and a named_rel(Kim) restriction]

Example 2: Kim likes(1) bagels
[AVM figure: lexical sign for likes, a word with PHON ⟨likes⟩, HEAD verb [FORM fin], SUBJ ⟨NP[3sg]⟩, empty SPR, COMPS ⟨NP⟩, ARG-ST the concatenation of the two lists, and CONTENT whose RELS contain like_rel (EVENT, ARG1, ARG2) and t_overlap_rel (ARG1 the event, ARG2 now)]

Example 2: Kim likes(2) bagels
[AVM figure: the same entry for likes in abbreviated form, with SUBJ ⟨NP[3sg]⟩, COMPS ⟨NP⟩ and the like_rel/t-overlap_rel relations]

Example 2: Kim likes bagels
[AVM figure: lexical sign for bagels, a word with PHON ⟨bagels⟩, HEAD noun [AGR pl], SPR ⟨DetP⟩, empty SUBJ/COMPS, and CONTENT restricted by bagel_rel]

Example 2: head-complement schema
[AVM figure: a head-comps-ph whose PHON and RELS values are the concatenations of the head daughter's and the non-head daughters' values; the head daughter's COMPS list is checked off against the SYNSEM values of the non-head daughters, while HEAD, SUBJ, SPR, INDEX and KEY are shared with the mother]
Example 2: head-complement schema headed by likes
[AVM figure: the head-complement schema instantiated with HEAD-DTR likes (HEAD verb [FORM fin], SUBJ ⟨NP[3sg]⟩, COMPS ⟨NP⟩) and the complement NP as non-head daughter; the mother's RELS concatenate the like_rel/t-overlap_rel relations with the complement's relations]

Example 2: Kim likes bagels
[AVM figure: the resulting head-comps-ph with PHON ⟨likes, bagels⟩, HEAD verb [FORM fin], SUBJ ⟨NP[3sg]⟩, empty COMPS, and RELS containing like_rel, t-overlap_rel(now) and bagel_rel]

Example 2: head-subject schema
[AVM figure: a head-subj-ph whose head daughter is a finite verbal phrase with SUBJ ⟨[1]⟩ and empty SPR/COMPS; the subject's SYNSEM is checked off against [1], and PHON and RELS are concatenated]

Example 2: head-subject schema headed by likes bagels
[AVM figure: the head-subject schema instantiated with head daughter likes bagels and the subject NP as non-head daughter]

Example 2: Kim likes bagels
[AVM figure: the complete head-subj-ph with PHON ⟨Kim, likes, bagels⟩, HEAD verb [FORM fin], all valence lists empty, and RELS containing named_rel(Kim), like_rel, t-overlap_rel(now) and bagel_rel]

Example 2: tree of Kim likes bagels
[Tree figure: a head-subj-ph dominating the word Kim (HEAD noun) and a head-comps-ph, which in turn dominates the words likes (HEAD verb) and bagels (HEAD noun); the SUBJ and COMPS valence lists are checked off at each level]

Compare HPSG to CFG
Each sign or HPSG rule consists of SYNSEM, DTRS, and PHON parts.
The SYNSEM part specifies how the syntax and semantics of the phrase (or word) are constrained. It corresponds roughly to the left-hand side of CFG rules but contains much more information.
The DTRS part specifies the constituents that make up the phrase (if it is a phrase). (Each of these constituents is a complete sign.) This corresponds to part of the information on the right-hand side of CFG rules, but not to the ordering information.
The PHON part specifies the ordering of the constituents in DTRS (where this is constrained) and their pronunciation (if this is specifiable). This corresponds to the ordering information on the right-hand side of CFG rules.

Simulation of a bottom-up parsing algorithm in HPSG
Unify input lexical signs with lexical signs in the lexicon.
Until no more such unifications are possible, unify instantiated signs with the daughters of instantiated phrasal signs, or with phrasal signs in the grammar.
If all instantiated signs but one saturated sign (S) are associated with daughters of other instantiated signs, and the PHON value of every instantiated sign is completely specified, return the complete S structure; else fail.

Example 2: processing of unification, Kim walks
The words in the sentence specify only their pronunciations and their positions.
1 [PHON ((0 1 kim))]
2 [PHON ((1 2 walks))]
STEP 1: Unifying 1 with the lexical entry for Kim gives
3 [PHON ((0 1 kim))
   SYNSEM [CAT [HEAD noun SUBCAT ()]
           CONTENT [INDEX 1 [PER 3rd NUM sing]]
           CONTEXT [BACKGR {[RELN naming BEARER 1 NAME Kim]}]]]
We now know something about the meaning of Kim (it refers to somebody named Kim) and something about its syntactic properties (it is third person singular).

Example 2: processing of unification
STEP 2: Unifying 2 with the lexical entry for walks gives
4 [PHON ((1 2 walks))
   SYNSEM [CAT [HEAD [VFORM fin]
                SUBCAT ([CAT [HEAD noun SUBCAT ()]
                         CONTENT [INDEX 1 [PER 3rd NUM sing]]])]
           CONTENT [RELN walk WALKER 1]]]
We know that walks refers to walking and that it requires a subject noun phrase which refers to the walker, but doesn't require any object.

Example 2: processing of unification
HEAD-DTR rule:
[SYNSEM [CAT [HEAD 1 SUBCAT (2)] CONTENT 4]
 DTRS [HEAD-DTR [SYNSEM [CAT [HEAD 1 SUBCAT (2)] CONTENT 4] PHON 3]
       SUBJ-DTRS ()]
 PHON 3]
STEP 3: Unifying 4 with the HEAD-DTR of this rule gives
5 [SYNSEM [CAT [HEAD [VFORM fin]
                SUBCAT 2([CAT [HEAD noun SUBCAT ()]
                          CONTENT [INDEX 1 [PER 3rd NUM sing]]])]
           CONTENT 4[RELN walk WALKER 1]]
   DTRS [HEAD-DTR [SYNSEM [CAT [HEAD [VFORM fin] SUBCAT (2)]] CONTENT [4]
                   PHON 3((1 2 walks))]
         SUBJ-DTRS ()]
   PHON 3((1 2 walks))]
Now we have a VP with the intransitive verb walks as its head (and only constituent).

Example 2: processing of unification
HEAD-DTR rule:
6 [SYNSEM [CAT [HEAD 1 SUBCAT ()] CONTENT 4]
   DTRS [HEAD-DTR [SYNSEM [CAT [HEAD 1 SUBCAT (2)] CONTENT 4] PHON 3]
         SUBJ-DTRS ([PHON 5 SYNSEM 2])]
   PHON (5 < 3)]
STEP 4: Unifying 5 with the HEAD-DTR of this rule gives
7 [SYNSEM [CAT [HEAD 1[VFORM fin SUBCAT ()]] CONTENT 4[RELN walk WALKER ]]
   DTRS [HEAD-DTR [SYNSEM [CAT [HEAD 1[VFORM fin]
                                SUBCAT 2([CAT [HEAD noun SUBCAT ()]
                                          CONTENT [INDEX [PER 3rd NUM sing]]])]
                           CONTENT [RELN walk WALKER 4]]
                   PHON 3((1 2 walks))]
         SUBJ-DTRS ([PHON 5 SYNSEM 2[CAT [HEAD noun SUBCAT ()] CONTENT [INDEX ]]])]
   PHON (5 < 3((1 2 walks)))]

Example 2: processing of unification
STEP 5: Unifying 3 with the SUBJ-DTR of 7 gives
8 [SYNSEM [CAT [HEAD [VFORM fin SUBCAT ()]]
           CONTENT [RELN walk WALKER [PER 3rd NUM sing]]]
   DTRS [HEAD-DTR [SYNSEM [CAT [HEAD [VFORM fin]
                                SUBCAT ([CAT [HEAD noun SUBCAT ()]
                                         CONTENT [INDEX [PER 3rd NUM sing]]])
                           CONTENT [RELN walk WALKER [PER 3rd NUM sing]]]
                   PHON ((1 2 walks))]
         SUBJ-DTRS ([PHON ((0 1 kim))
                     SYNSEM [CAT [HEAD noun SUBCAT ()]
                             CONTENT [INDEX [PER 3rd NUM sing]]]])]
   PHON ((0 1 kim) (1 2 walks))]
Now the subject of the sentence is pronounceable, and we're done.
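The walkthrough above is driven entirely by unification. Here is a minimal sketch of unification over dictionary-encoded feature structures; it has no reentrancy (structure sharing) and no type hierarchy, so it only approximates the real HPSG operation.

```python
def unify(a, b):
    """Unify two dict-encoded feature structures; return None on a clash.
    Atoms unify iff equal; dicts unify feature by feature. No reentrancy."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for feat, bval in b.items():
            if feat in out:
                u = unify(out[feat], bval)
                if u is None:
                    return None          # feature clash
                out[feat] = u
            else:
                out[feat] = bval
        return out
    return a if a == b else None

# "Kim" from the input vs. the lexicon: the positional PHON value unifies
# with the lexical entry, enriching the sign as in STEP 1 above.
token = {"PHON": ("0", "1", "kim")}
lexical = {"PHON": ("0", "1", "kim"),
           "SYNSEM": {"CAT": {"HEAD": "noun", "SUBCAT": ()},
                      "CONTENT": {"INDEX": {"PER": "3rd", "NUM": "sing"}}}}
sign = unify(token, lexical)                       # the full sign for Kim
clash = unify({"NUM": "sing"}, {"NUM": "plur"})    # -> None (agreement clash)
```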
Phenomena covered by HPSG parsers
Case assignment
Word order: scrambling
Long-distance dependencies
Coordination
Scope of adverbs and negation
Topic drop
Agreement
Relative clauses
...

Example 3: unbounded dependency constructions
An unbounded dependency construction
  involves constituents with different functions
  involves constituents of different categories
  is in principle unbounded
Two kinds of unbounded dependency constructions (UDCs): strong UDCs and weak UDCs

Strong UDCs
An overt constituent occurs in a non-argument position:
Topicalization: Kim_i, Sandy loves _i.
Wh-questions: I wonder [who_i Sandy loves _i].
Wh-relative clauses: This is the politician [who_i Sandy loves _i].
It-clefts: It is Kim_i [who_i Sandy loves _i].
Pseudoclefts: [What_i Sandy loves _i] is Kim_i.

Weak UDCs
No overt constituent in a non-argument position:
Purpose infinitives (for-to clauses): I bought it_i for Sandy to eat _i.
Tough movement: Sandy_i is hard to love _i.
Relative clauses without an overt relative pronoun: This is [the politician]_i [Sandy loves _i].
It-clefts without an overt relative pronoun: It is Kim_i [Sandy loves _i].

Using the feature SLASH
To account for UDCs, we will use the feature SLASH (so named because it comes from notation like S/NP, meaning an S missing an NP)
This is a non-local feature which originates with a trace, gets passed up the tree, and is finally bound by a filler

The bottom of a UDC: traces
[AVM figure: a word with empty PHON whose LOCAL value [1] is structure-shared with NONLOC|INHERITED|SLASH {[1]}, with empty TO-BIND|SLASH]
A trace is phonologically null, but structure-shares its LOCAL and SLASH values

Traces
Because the LOCAL value of a trace is structure-shared with the SLASH value, constraints on the trace will be constraints on the filler.
For example, hates specifies that its object be accusative, and this case information is local
So, the trace has [SYNSEM|LOCAL|CAT|HEAD|CASE acc] as part of its entry, and thus the filler will also have to be accusative
*He_i/Him_i, John likes _i

The middle of a UDC: the Nonlocal Feature Principle (NFP)
For each NON-LOCAL feature, the inherited value on the mother is the union of the inherited values on the daughters minus the to-bind value on the head daughter.
In other words, the slash information (which is part of INHERITED) percolates "up" the tree
This allows all the local information of a trace to "move up" to the filler

The top of a UDC: filler-head structures
Example of a structure licensed by the filler-head schema:
[AVM figure: the filler daughter's LOCAL value [1] is a member of the head daughter's INHERITED|SLASH and of its TO-BIND|SLASH, so it does not propagate further]

The analysis of the UDC example John_i we know she likes _i
[Tree figure: the trace's INHERITED|SLASH value [3] percolates up from the object position of likes through the V, VP and S nodes, and is bound by the filler John in the topmost filler-head structure]

Example 4: John reads a new book
[AVM figure: lexical sign for reads, a word with HEAD verb [VFORM fin, AUX bool, INV bool], SUBJ ⟨NP[nom, −PRD]⟩, COMPS ⟨NP[acc, −PRD]⟩, and CONTENT read with READER and READEE roles]

Example 4: John reads a new book
[AVM figure: lexical sign for new, a word with HEAD adj [PRD −, MOD a nominal synsem]; its CONTENT is a nom-obj which adds the restriction new(x) to the modified noun's RESTR]

Example 4: John reads a new book
Note: apply the head-adjunct schema
[AVM figure: new book as a head-adjunct structure; the adjunct new modifies the head book [PER 3rd, NUM sg, GEND neut], the shared SPR requirement is passed up, and the mother's RESTR contains both the book and the new psoas]

Example 4: John reads a new book
[AVM figure: a new book as a head-specifier structure; the determiner a is checked off against the SPR value of new book]

Example 4: John reads a new book
[AVM figure: reads a new book as a head-complement structure; the NP[acc, −PRD] a new book is checked off against the COMPS list of reads, and its index fills the READEE role]

Example 4: John reads a new book, completed analysis
[AVM figure: the complete head-subject structure; John (CASE nom, PER 3rd, NUM sg, GEND masc, named John) is checked off against the SUBJ list of reads a new book, and its index fills the READER role]

Lecture 7 (continued)

Outline
Applicative system
Combinators
Combinators vs. λ-expressions
Application to natural language parsing
Combinators used in CCG
Applicative system
CL – Combinatory Logic (Curry & Feys, 1958, 1972) as an applicative system
CL is an applicative system because the single basic operation in CL is the application of an operator to an operand:
Operator(Operand)

Combinators
CL defines general operators, called combinators.
Complex combinators are defined by composing the elementary combinators.
Certain combinators are taken as basic and used to define the other combinators.

Elementary combinators
I =def λx.x (identificator)
K =def λx.λy.x (cancellator)
W =def λx.λy.xyy (duplicator)
C =def λx.λy.λz.xzy (permutator)
B =def λx.λy.λz.x(yz) (compositor)
S =def λx.λy.λz.xz(yz) (substitution)
Φ =def λx.λy.λz.λu.x(yu)(zu) (distribution)
Ψ =def λx.λy.λz.λu.x(yz)(yu) (distribution)

β-reductions
The combinators are associated with β-reductions in a canonical form:
β-reduction relation between X and Y: X ≥β Y means that Y was obtained from X by a β-reduction

β-reductions
Ix ≥β x
Kxy ≥β x
Wxy ≥β xyy
Cxyz ≥β xzy
Bxyz ≥β x(yz)
Sxyz ≥β xz(yz)
Φxyzu ≥β x(yu)(zu)
Ψxyzu ≥β x(yz)(yu)
Each combinator is an operator which takes a certain number of arguments (operands); the sequence of arguments that follows the combinator is called the scope of the combinator.

β-reductions
Intuitive interpretations of the elementary combinators are given by the associated β-reductions.
The combinator I expresses identity.
The combinator K expresses the constant function.
The combinator W expresses diagonalisation, i.e. the duplication of an argument.
The combinator C expresses conversion, that is, the permutation of the two arguments of a binary operator.
The combinator B expresses the functional composition of two operators.
The combinator S expresses functional composition with duplication of an argument.
The combinator Φ expresses the composition in parallel of operators acting on common data.
The combinator Ψ expresses composition by distribution.

Introduction and elimination rules of combinators
Introduction and elimination rules of combinators can be presented in the style of Gentzen (natural deduction).
Elimination rules / introduction rules:
[e-I] If ⇒ f          [i-I] f ⇒ If
[e-K] Kfx ⇒ f         [i-K] f ⇒ Kfx
[e-C] Cfx ⇒ xf        [i-C] xf ⇒ Cfx
[e-B] Bfxy ⇒ f(xy)    [i-B] f(xy) ⇒ Bfxy
[e-Φ] Φfxyz ⇒ f(xz)(yz)    [i-Φ] f(xz)(yz) ⇒ Φfxyz

Combinators vs. λ-expressions
The most important difference between CL and the λ-calculus is the use of bound variables.
Every combinator is a λ-expression:
Bfg ≡ λx.f(gx)
Tx ≡ λf.fx
Sfg ≡ λx.fx(gx)
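Since every combinator is a λ-expression, the elementary combinators can be written directly as curried functions. The following small Python sketch (our own encoding, purely for illustration) checks the β-reductions above on symbolic operators that just build tuples.

```python
# Elementary combinators as curried Python functions; the asserts check the
# beta-reductions above on symbolic operators that construct tuples.
I = lambda x: x
K = lambda x: lambda y: x
W = lambda x: lambda y: x(y)(y)
C = lambda x: lambda y: lambda z: x(z)(y)
B = lambda x: lambda y: lambda z: x(y(z))
S = lambda x: lambda y: lambda z: x(z)(y(z))
Phi = lambda x: lambda y: lambda z: lambda u: x(y(u))(z(u))

f2 = lambda a: lambda b: ("f", a, b)   # a binary symbolic operator
f1 = lambda a: ("f", a)                # a unary symbolic operator
g1 = lambda a: ("g", a)
h1 = lambda a: ("h", a)

assert I("x") == "x"                                  # Ix    >= x
assert K("x")("y") == "x"                             # Kxy   >= x
assert W(f2)("y") == ("f", "y", "y")                  # Wfy   >= fyy
assert C(f2)("y")("z") == ("f", "z", "y")             # Cfyz  >= fzy
assert B(f1)(g1)("z") == ("f", ("g", "z"))            # Bfgz  >= f(gz)
assert S(f2)(g1)("z") == ("f", "z", ("g", "z"))       # Sfgz  >= fz(gz)
assert Phi(f2)(g1)(h1)("u") == ("f", ("g", "u"), ("h", "u"))  # f(gu)(hu)
```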
Application to natural language parsing
John is brilliant
The predicate is brilliant is an operator which operates on the operand John to construct the final proposition.
The applicative representation associated with this analysis is the following: (is-brilliant) John
We define the operator John* as constructed from the lexical item John by [John* = C* John].
1 John* (is-brilliant)
2 [John* = C* John]
3 C* John (is-brilliant)
4 is-brilliant (John)

Application to natural language parsing
John is brilliant in λ-terms
The operator John* as a λ-expression: [John* = λx.x (John')]
1 John*(λx.is-brilliant'(x))
2 [John* = λx.x (John')]
3 (λx.x(John'))(λx.is-brilliant'(x))
4 (λx.is-brilliant'(x))(John')
5 is-brilliant'(John')

Passivisation
Consider the following sentences:
a. The man has been killed.
b. One has killed him.
→ invariance of meaning
→ relation between the two sentences:
a. unary passive predicate (has-been-killed)
b. active transitive predicate (has-killed)

Definition of the passivisation operator PASS
[PASS = B Σ C = Σ ∘ C]
where B and C are the combinators of composition and of conversion, and where Σ is the existential quantifier which, applied to a binary predicate, transforms it into a unary predicate.

Definition of the passivisation operator PASS
[PASS = B Σ C = Σ ∘ C]
1/ has-been-killed (the-man) : hypothesis
2/ [has-been-killed = PASS(has-killed)] : passive lexical predicate
3/ PASS (has-killed)(the-man) : repl. 2., 1.
4/ [PASS = B Σ C] : definition of PASS
5/ B Σ C (has-killed)(the-man) : repl. 4., 3.
6/ Σ (C(has-killed))(the-man) : [e-B]
7/ (C(has-killed)) x (the-man) : [e-Σ]
8/ (has-killed)(the-man) x : [e-C]
9/ [x in the agentive subject position = one] : definition of one
10/ (has-killed)(the-man) one : repl. 9., 8., normal form

Definition of the passivisation operator PASS
We establish the paraphrastic relation between the passive sentence with an expressed agent and its active counterpart:
The man has been killed by the enemy
↓
The enemy has killed the man

Definition of the passivisation operator PASS
Relation between give-to and receive-from:
z gives y to x ↕ x receives y from z
The lexical predicate give-to has an associated converse predicate receive-from:
[receive-from z y x = give-to x y z]

Definition of the passivisation operator PASS
1/ (receive-from) z y x
2/ C((receive-from) z) x y
3/ BC(receive-from) z x y
4/ C(BC(receive-from)) z x y
5/ C(C(BC(receive-from)) x) y z
6/ BC(C(BC(receive-from))) x y z
7/ [give-to = BC(C(BC(receive-from)))]
8/ give-to x y z

Combinators used in CCG
Motivations for applying the combinators to natural language parsing
Linguistic: complex phenomena of natural language; applicable to various languages
Computational: left-to-right (LR) parsing, e.g. reducing spurious ambiguity

Parsing a sentence in CCG
Step 1: tokenization
Step 2: tag the concatenated expressions
Step 3: calculate over the types assigned to the concatenated expressions by applying the adequate combinatory rules
Step 4: eliminate the applied combinators (we will see how to do this next week)
Step 5: produce the parsing results in the form of an operator/operand structure (predicate-argument structure)

Parsing a sentence in CCG
Example: I requested and would prefer musicals
STEP 1: tokenization/lemmatization → e.g. POS tagger, tokenizer, lemmatizer
a. I-requested-and-would-prefer-musicals
b. I-request-ed-and-would-prefer-musical-s
STEP 2: tagging the concatenated expressions → e.g. supertagger, inventory of typed words
I : NP
requested : (S\NP)/NP
and : CONJ
would : (S\NP)/VP
prefer : VP/NP
musicals : NP

Parsing a sentence in CCG
STEP 3: categorial calculus
c. apply the coordination rule; Coordination (<&>): X conj X ⇒ X
b. apply the functional composition rules; Forward Composition (>B): X/Y : f Y/Z : g ⇒ X/Z : Bfg
a. apply the type-raising rules; Subject Type-raising (>T): NP : a ⇒ T/(T\NP) : Ta
7/ S
6/ S/NP  NP  (>)
5/ S/(S\NP)  (S\NP)/NP  NP  (>B)
4/ S/(S\NP)  (S\NP)/NP  NP  (>Φ)
3/ S/(S\NP)  (S\NP)/NP  CONJ  (S\NP)/NP  NP  (>B)
2/ S/(S\NP)  (S\NP)/NP  CONJ  (S\NP)/VP  VP/NP  NP  (>T)
1/ NP  (S\NP)/NP  CONJ  (S\NP)/VP  VP/NP  NP
   I - requested - and - would - prefer - musicals

Parsing a sentence in CCG
STEP 4: semantic representation (predicate-argument structure)
7/ S : and'(will'(prefer' musicals') i')(request' musicals' i')
6/ : λy.and'(would'(prefer' musicals') y)(request' musicals' y)
5/ : λx.λy.and'(will'(prefer' x) y)(request' x y)
4/ : λx.λy.and'(will'(prefer' x) y)(request' x y)
3/ : λx.λy.will'(prefer' x) y
2/ : λf.f i'
1/ : i'  : request'  : and'  : will'  : prefer'  : musicals'
   I requested and would prefer musicals

Semantic representation in terms of combinators
I - requested and - would - prefer musicals
1/ NP  (S\NP)/NP  CONJ  (S\NP)/VP  VP/NP  NP
2/ S/(S\NP)  (S\NP)/NP  CONJ  (S\NP)/VP  VP/NP  NP  (>T) : C*I requested and would prefer musicals
3/ S/(S\NP)  (S\NP)/NP  CONJ  (S\NP)/NP  NP  (>B) : C*I requested and B would prefer musicals
4/ S/(S\NP)  (S\NP)/NP  NP  (>Φ) : C*I Φ and requested (B would prefer) musicals
5/ S/NP  NP  (>B) : B((C*I)(Φ and requested (B would prefer))) musicals
6/ S  (>) : B((C*I)(Φ and requested (B would prefer))) musicals

Semantic representation in terms of combinators
I requested and would prefer musicals
S: B((C*I)(Φ and requested (B would prefer))) musicals
1/ B((C*I)(Φ and requested (B would prefer))) musicals
2/ (C*I)((Φ and requested (B would prefer)) musicals)   [e-B]
3/ ((Φ and requested (B would prefer)) musicals) I      [e-C*]
4/ (and (requested musicals) ((B would prefer) musicals)) I   [e-Φ]
5/ ((and (requested musicals) (would (prefer musicals))) I)   [e-B]

Normal form
A normal form is a combinatory expression which is irreducible, in the sense that it does not contain any occurrence of a redex.
If a combinatory expression X reduces to a combinatory expression N which is in normal form, then N is called the normal form of X.
Example:
Bxyz is reducible to x(yz).
x(yz) is the normal form of the combinatory expression Bxyz.

Normal form
Example: prove that xyz is the normal form of BCCxyz.
BCCxyz →β xyz
1/ BCCxyz
2/ C(Cx)yz [e-B]
3/ Cxzy [e-C]
4/ xyz [e-C]

Classwork
Give the semantic representation in terms of combinators. Please refer to the paper given in the last lecture on CCG parsing.

Lecture 9
Outline
HPSG parser: Enju (parsing method, description of the parser, results)
CCG parser: C&C tools (parsing method, description of the parser, results)

Theoretical backgrounds
Lecture 3 on HPSG parsing
Lectures 6 & 7 on CCG parsing and combinatory logic

Enju (Y. Miyao, J. Tsujii, 2004, 2008)
Syntactic parser for English
Developed by the Tsujii Lab of the University of Tokyo
Based on wide-coverage probabilistic HPSG
HPSG theory [Pollard and Sag, 1994]
Useful links to Enju:
http://www-tsujii.is.s.u-tokyo.ac.jp/enju/demo.html
http://www-tsujii.is.s.u-tokyo.ac.jp/enju/

Motivations
Parsing based on a proper linguistic formalism is one of the core research fields in CL and NLP.
But it has been a monolithic, esoteric and inward-looking field, largely dissociated from real-world applications.

Motivations
So why not integrate linguistic grammar formalisms with statistical models, to obtain parsing that is robust, efficient and open to eclectic sources of information other than syntactic ones?

Motivations
Two main ideas:
development of wide-coverage linguistic grammars
a deep parser which produces semantic representations (predicate-argument structures)

Parsing method
Application of a probabilistic model to the HPSG grammar and development of an efficient parsing algorithm
Accurate deep analysis: disambiguation
Wide coverage
High speed: useful for high-level NLP applications

Parsing method
1. Parsing based on HPSG
Mathematically well-defined, with a sophisticated constraint-based system
Linguistically justified
A deep syntactic grammar that provides semantic analysis

Parsing method
Difficulties in parsing based on HPSG:
it is difficult to develop a broad-coverage HPSG grammar
it is difficult to disambiguate
low efficiency: very slow

Parsing method
Solution: corpus-oriented development of an HPSG grammar
The principal aim of grammar development is treebank construction
The Penn Treebank is converted into an HPSG treebank
A lexicon and a probabilistic model are extracted from the HPSG treebank

Parsing method
Approach: develop grammar rules and an HPSG treebank; collect lexical entries from the HPSG treebank
How to make an HPSG treebank? Convert the Penn Treebank into HPSG and develop the grammar by restructuring the treebank in conformity with the HPSG grammar rules

Parsing method
HPSG = lexical entries and grammar rules
The Enju grammar has 12 grammar rules and 3,797 lexical entries for 10,536 words (Miyao et al. 2004)

Parsing method
Overview of grammar development:
1. Treebank conversion: modify constituent structures by adding feature structures
2. Grammar rule application: apply a grammar rule when a parse tree contains the correct analysis and the specified feature values are filled
3. Lexical entry collection: collect the terminal nodes of HPSG parse trees and assign predicate-argument structures

Parsing method
2. Probabilistic model and HPSG: log-linear models for unification-based grammars (Abney 1997, Johnson et al. 1999, Riezler et al. 2000, Miyao et al. 2003, Malouf and van Noord 2004, Kaplan et al. 2004, Miyao and Tsujii 2005)
p(T|w), where w = "A blue eyes girl with white hair and skin walked" and T is a parse tree over w
[Figure: a parse tree T for the example sentence, with S, NP, PP and VP nodes]

Parsing method
T1, T2, T3, T4, ..., Tn: all possible parse trees derived from w with the grammar.
For example, p(T3|w) is the probability of selecting T3 from T1, T2, ..., and Tn.

Parsing method
Log-linear model for unification-based grammars
Input sentence: w = w1/P1, w2/P2, ..., wn/Pn; output: parse tree T
p(T|w) = (1/Z_w) exp(Σ_i λ_i f_i(T, w)), where Z_w is the normalization factor, λ_i the weight for the feature function f_i

Description of parser
Parsing proceeds in the following steps:
1. Preprocessing: the preprocessor converts an input sentence into a word lattice.
2. Lexicon lookup: the parser looks up the lexicon to find lexical entries for the word lattice.
3. Kernel parsing: the parser performs phrase analysis using the defined grammar rules.

Description of parser
Chart data structure: a two-dimensional table; we call each cell in the table a 'CKY cell'.
Example: let the input sentence be s = (w1, ..., w7), with w1 = "I", w2 = "saw", w3 = "a", w4 = "girl", w5 = "with", w6 = "a", w7 = "telescope". For the sentence "I saw a girl with a telescope", the chart is arranged as follows (cells indexed by (start, end) spans):
length 1: (0,1) (1,2) (2,3) (3,4) (4,5) (5,6) (6,7)
length 2: (0,2) (1,3) (2,4) (3,5) (4,6) (5,7)
length 3: (0,3) (1,4) (2,5) (3,6) (4,7)
length 4: (0,4) (1,5) (2,6) (3,7)
length 5: (0,5) (1,6) (2,7)
length 6: (0,6) (1,7)
length 7: (0,7)
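A minimal sketch of this chart arrangement: cells are indexed by (start, end) spans and filled bottom-up in CKY fashion. The recursion below only counts the binary bracketings of each span; a real parser such as Enju would instead apply its grammar rule schemata at each split point.

```python
# Chart cells indexed by (start, end) spans for a 7-word sentence.
words = "I saw a girl with a telescope".split()
n = len(words)
chart = {(i, j): [] for i in range(n) for j in range(i + 1, n + 1)}

# CKY-style fill: length-1 spans get one (simplified) lexical sign each;
# longer spans combine the analyses of every split point k.
count = {}
for i in range(n):
    count[(i, i + 1)] = 1
for length in range(2, n + 1):
    for i in range(n - length + 1):
        j = i + length
        count[(i, j)] = sum(count[(i, k)] * count[(k, j)]
                            for k in range(i + 1, j))

print(count[(0, n)])   # 132 binary bracketings over 7 words (a Catalan number)
```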
Description of parser
System overview: supertagger → enumeration of lexical-entry assignments → deterministic disambiguation
[Figure: for "Mary loved John", the supertagger proposes candidate lexical signs per word (e.g. HEAD noun with empty SUBJ/COMPS lists for Mary and John, HEAD verb with SUBJ and COMPS requirements for loved); the assignments are enumerated and then disambiguated deterministically]

Demonstration
http://www-tsujii.is.s.u-tokyo.ac.jp/enju/demo.html

Results
Fast, robust and accurate analysis
Phrase structures
Predicate-argument structures
Accurate deep analysis – the parser can output both phrase structures and predicate-argument structures.
The accuracy of predicate-argument relations is around 90% for newswire articles and biomedical papers.
High speed – parsing speed is less than 500 msec per sentence by default (faster than most Penn Treebank parsers), and less than 50 msec when using the high-speed setting ("mogura").

C&C tools
Developed by Curran and Clark [Clark and Curran, 2002; Curran, Clark and Bos, 2007], University of Edinburgh
A wide-coverage statistical parser based on CCG: the CCG parser
A computational semantics tool named Boxer
Useful links:
http://svn.ask.it.usyd.edu.au/trac/candc
http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo

CCG parser [Clark, 2007]
Statistical parsing and CCG
Advantages of CCG:
it provides a compositional semantics for the grammar → a completely transparent interface between syntax and semantics
the recovery of long-range dependencies can be integrated into the parsing process in a straightforward manner

Parsing method
Penn Treebank conversions: TAG, LFG, HPSG and CCG
CCGBank [Hockenmaier and Steedman, 2007]: the CCG version of the Penn Treebank
Grammar used in the CCG parser:
CCGBank: some rules used as the grammar; the lexical category set
Training data for the statistical models: supertagger, parser

Parsing method: CCGBank
Translated from the Penn Treebank, CCGBank contains
syntactic derivations
word-word dependencies
predicate-argument structures

Parsing method: CCGBank
Semi-automatic conversion of the phrase-structure trees in the Penn Treebank into CCG derivations
Consists mainly of newspaper text
Grammar: lexical category set, combinatory rules, unary type-changing rules, normal-form constraints, punctuation rules

Parsing method: supertagging [Clark, 2002]
Uses conditional maximum entropy models
Implements a maximum entropy supertagger
[Figure: supertagging a French example; each word (tout, commentaire, sur, cette, proposition, et, prefere, avancer, les, ...) receives a POS tag and a set of candidate lexical categories such as np/n, (s\1 s)/np, (np\s)/np]

Parsing method: supertagger
A set of 425 lexical categories from CCGbank
The per-word accuracy of the supertagger is around 92% on unseen WSJ text.
→ Using the multi-supertagger increases the accuracy significantly – to over 98% – with only a small cost in increased ambiguity.
Parsing method: supertagger
Log-linear models in NLP applications:
POS tagging
Named entity recognition
Chunking
Parsing
→ also referred to as maximum entropy models and random fields

Parsing method: supertagger
Log-linear parsing models for CCG:
1. the probability of a dependency structure
2. the normal-form model: the probability of a single derivation
→ modeling 2) is simpler than 1)
1. is defined as P(π|S) = Σ_{d∈Δ(π)} P(d, π|S)
2. is defined using a log-linear form as follows:
P(ω|S) = (1/Z_S) e^{λ·f(ω)}, where Z_S = Σ_{ω′∈ρ(S)} e^{λ·f(ω′)}
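A minimal sketch of this log-linear form, with invented feature names and weights: each candidate derivation is scored by the dot product λ·f(ω), exponentiated, and normalized by Z_S over the candidate set.

```python
import math

# Candidate derivations for one sentence, each with sparse feature counts.
# Feature names and weights are invented purely for illustration.
candidates = {
    "omega1": {"Rule: S[dcl] -> NP S[dcl]\\NP": 1, "LexCat+word: (S\\NP)/NP + bought": 1},
    "omega2": {"Rule: S[dcl] -> NP S[dcl]\\NP": 1, "LexCat+word: (S\\NP)/PP + bought": 1},
}
weights = {
    "Rule: S[dcl] -> NP S[dcl]\\NP": 0.5,
    "LexCat+word: (S\\NP)/NP + bought": 1.2,
    "LexCat+word: (S\\NP)/PP + bought": -0.3,
}

def dot(feats):
    """lambda . f(omega): weighted sum of the derivation's feature counts."""
    return sum(weights.get(name, 0.0) * count for name, count in feats.items())

# P(omega|S) = exp(lambda . f(omega)) / Z_S, normalizing over all candidates.
z_s = sum(math.exp(dot(f)) for f in candidates.values())
p = {name: math.exp(dot(f)) / z_s for name, f in candidates.items()}
print(p)   # the two probabilities sum to 1; omega1 is preferred
```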
Parsing method: supertagger
Features common to the dependency and normal-form models (feature type: example):
LexCat + word: (S/S)/NP + Before
LexCat + POS: (S/S)/NP + IN
RootCat: S[dcl]
RootCat + word: S[dcl] + was
RootCat + POS: S[dcl] + VBD
Rule: S[dcl] → NP S[dcl]\NP
Rule + word: S[dcl] → NP S[dcl]\NP + bought
Rule + POS: S[dcl] → NP S[dcl]\NP + VBD

Parsing method: supertagger
Predicate-argument dependency features for the dependency model (feature type: example):
Word-Word: ⟨bought, (S\NP1)/NP2, 2, stake, (NP\NP)/(S[dcl]/NP)⟩
Word-POS: ⟨bought, (S\NP1)/NP2, 2, NN, (NP\NP)/(S[dcl]/NP)⟩
POS-Word: ⟨VBD, (S\NP1)/NP2, 2, stake, (NP\NP)/(S[dcl]/NP)⟩
POS-POS: ⟨VBD, (S\NP1)/NP2, 2, NN, (NP\NP)/(S[dcl]/NP)⟩
Word + Distance(words): ⟨bought, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 2
Word + Distance(punct): ⟨bought, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0
Word + Distance(verbs): ⟨bought, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0
POS + Distance(words): ⟨VBD, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 2
POS + Distance(punct): ⟨VBD, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0
POS + Distance(verbs): ⟨VBD, (S\NP1)/NP2, 2, (NP\NP)/(S[dcl]/NP)⟩ + 0

Parsing method: supertagger
Rule dependency features for the normal-form model (feature type: example):
Word-Word: ⟨company, S[dcl] → NP S[dcl]\NP, bought⟩
Word-POS: ⟨company, S[dcl] → NP S[dcl]\NP, VBD⟩
POS-Word: ⟨NN, S[dcl] → NP S[dcl]\NP, bought⟩
POS-POS: ⟨NN, S[dcl] → NP S[dcl]\NP, VBD⟩
Word + Distance(words): ⟨bought, S[dcl] → NP S[dcl]\NP⟩ + > 2
Word + Distance(punct): ⟨bought, S[dcl] → NP S[dcl]\NP⟩ + 2
Word + Distance(verbs): ⟨bought, S[dcl] → NP S[dcl]\NP⟩ + 0
POS + Distance(words): ⟨VBD, S[dcl] → NP S[dcl]\NP⟩ + > 2
POS + Distance(punct): ⟨VBD, S[dcl] → NP S[dcl]\NP⟩ + 2
POS + Distance(verbs): ⟨VBD, S[dcl] → NP S[dcl]\NP⟩ + 0

Description of parser
[Pipeline figure: input sentence → C&C taggers (POS tagger, chunker, supertaggers, trained on CCGBank) → parser → Boxer]

Demonstration
http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo

Results
Supertagger ambiguity and accuracy on section 00:
β      k    CATS/WORD  ACC    SENT ACC  ACC(POS)  SENT ACC(POS)
0.075  20   1.27       97.34  67.43     96.34     60.27
0.030  20   1.43       97.92  72.87     97.05     65.50
0.010  20   1.72       98.37  77.73     97.63     70.52
0.005  20   1.98       98.52  79.25     97.86     72.24
0.001  150  3.57       99.17  87.19     98.66     80.24

Results
Parsing accuracy on DepBank (Prec / Rec / F for the CCG parser, then for CCGbank, then # GRs):
dependent     84.07 / 82.19 / 83.12   88.83 / 84.19 / 86.44   10,696
aux           95.03 / 90.75 / 92.84   96.47 / 90.33 / 93.30   400
conj          79.02 / 75.97 / 77.46   83.07 / 80.27 / 81.65   595
ta            51.52 / 11.64 / 18.99   62.07 / 12.59 / 20.93   292
det           95.23 / 94.97 / 95.10   97.27 / 94.09 / 95.66   1,114
arg_mod       81.46 / 81.76 / 81.61   86.75 / 84.19 / 85.45   8,295
mod           71.30 / 77.23 / 74.14   77.83 / 79.65 / 78.73   3,908
ncmod         73.36 / 78.96 / 76.05   78.88 / 80.64 / 79.75   3,550
xmod          42.67 / 53.93 / 47.64   56.54 / 60.67 / 58.54   178
cmod          51.34 / 57.14 / 54.08   64.77 / 69.09 / 66.86   168
pmod           0.00 /  0.00 /  0.00    0.00 /  0.00 /  0.00   12
arg           85.76 / 80.01 / 82.78   89.79 / 82.91 / 86.21   4,387
DepBank: PARC Dependency Bank [King et al. 2003]

Results
subj_or_dobj  86.08 / 83.08 / 84.56   91.01 / 85.29 / 88.06   3,127
subj          84.08 / 75.57 / 79.60   89.07 / 78.43 / 83.41   1,363
ncsubj        83.89 / 75.78 / 79.63   88.86 / 78.51 / 83.37   1,354
xsubj          0.00 /  0.00 /  0.00   50.00 / 28.57 / 36.36   7
csubj          0.00 /  0.00 /  0.00    0.00 /  0.00 /  0.00   2
comp          86.16 / 81.71 / 83.88   89.92 / 84.74 / 87.25   3,024
obj           86.30 / 83.08 / 84.66   90.42 / 85.52 / 87.90   2,328
dobj          87.01 / 88.44 / 87.71   92.11 / 90.32 / 91.21   1,764
obj2          68.42 / 65.00 / 66.67   66.67 / 60.00 / 63.16   20
iobj          83.22 / 65.63 / 73.38   83.59 / 69.81 / 76.08   544
clausal       77.67 / 72.47 / 74.98   80.35 / 77.54 / 78.92   672
xcomp         77.69 / 74.02 / 75.81   80.00 / 78.49 / 79.24   381
ccomp         77.27 / 70.10 / 73.51   80.81 / 76.31 / 78.49   291
pcomp          0.00 /  0.00 /  0.00    0.00 /  0.00 /  0.00   24
macroaverage  65.71 / 62.29 / 63.95   71.73 / 65.85 / 68.67
microaverage  81.95 / 80.35 / 81.14   86.86 / 82.75 / 84.76

Lecture 10

Study materials
Course materials and homeworks are available on the following web site:
https://is.muni.cz/course/fi/autumn2011/IA161

Outline
Introduction to statistical parsing methods
Statistical parsers: the RASP system, the Stanford parser, the Collins parser, the Charniak parser, the Berkeley parser

1. Introduction to statistical parsing
The main theoretical approaches behind modern statistical parsers
Over the last 12 years, statistical parsing has advanced significantly!
NLP researchers have produced a range of statistical parsers → wide-coverage and robust parsing accuracy
They continue to improve the parsers year on year.

Application domains of statistical parsing
High-precision question answering systems
Named entity extraction
Syntactically based sentence compression
Extraction of people's opinions about products
Improved interaction in computer games
Helping linguists find data

NLP parsing problem and solution
The structure of language is ambiguous!
→ local and global ambiguities
Classical parsing problem:
→ a simple grammar of 10 rules can generate 592 parses
→ a real-size wide-coverage grammar generates millions of parses

NLP parsing problem and solution
NLP parsing solution: we need mechanisms that allow us to find the most likely parses
→ statistical parsing lets us work with very loose grammars that admit millions of parses for sentences, but still quickly find the best parses

Improved methodology for robust parsing
The annotated data: the Penn Treebank (early 90's)
Building a treebank seems a lot slower and less useful than building a grammar, but it has many helpful properties:
reusability of the labor
broad coverage
frequencies and distributional information
a way to evaluate systems

Characterization of statistical parsing
What is the grammar which determines the set of legal syntactic structures for a sentence? How is that grammar obtained?
What is the algorithm for determining the set of legal parses for a sentence?
What is the model for determining the probability of different parses for a sentence?
What is the algorithm which, given the model and a set of possible parses, finds the best parse?

Characterization of statistical parsing
T_best = argmax_T Score(T, S)
Two components:
the model: a function Score which assigns scores (probabilities) to tree-sentence pairs
the parser: the algorithm which implements the search for T_best

Characterization of statistical parsing
Statistical parsing is seen as more of a pattern recognition / machine learning problem, plus search
The grammar is only implicitly defined by the training data and the method used by the parser for generating hypotheses

Statistical parsing models
The probabilistic approach would suggest the following for the Score function: Score(T, S) = P(T|S)
Lots of research on different probability models for Penn Treebank trees
Generative models, log-linear (maximum entropy) models, ...

2. Statistical parsers
Many kinds of parsers based on statistical methods: probability, machine learning
Different objectives: research, commercial, pedagogical
RASP, the Stanford parser, the Berkeley parser, ...

RASP system
Robust Accurate Statistical Parsing (2nd release): [Briscoe & Carroll, 2002; Briscoe et al. 2006]
A system for the syntactic annotation of free text
Semantically motivated output representation
Enhanced grammar and part-of-speech tagger lexicon
Flexible and semi-supervised training method for the structural parse ranking model
Useful links to RASP:
http://ilexir.co.uk/applications/rasp/download/
http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/

Components of the system
Pipeline: raw text → tokeniser → PoS tagger → lemmatiser → parser/grammar → parse ranking model
Input: unannotated text or transcribed (and punctuated) speech
1st step: sentence boundary detection and tokenisation modules
2nd step: tokenized text is tagged with one of 150 POS and punctuation labels (derived from the CLAWS tagset)
→ a first-order ('bigram') HMM tagger
→ trained on the manually corrected tagged versions of the Susanne, LOB and BNC corpora

Components of the system
3rd step: morphological analyzer
4th step: manually developed wide-coverage tag-sequence grammar in the parser
→ 689 unification-based phrase structure rules
→ the preterminals of this grammar are the POS and punctuation tags
→ the terminals are featural descriptions of the preterminals
→ non-terminals project information up the tree using an X-bar scheme with 41 attributes and a maximum of 33 atomic values

Components of the system
5th step: generalized LR parser
→ a non-deterministic LALR table is constructed automatically from a CF 'backbone' compiled from the feature-based grammar
→ the parser builds a packed parse forest using this table to guide the actions it performs
→ the n-best parses can be efficiently extracted by unpacking sub-analyses, following pointers to contained subanalyses and choosing alternatives in order of probabilistic ranking

Components of the system
Output: a set of named grammatical relations (GRs), organized in a hierarchy (dependent > ta, arg_mod, det, aux, conj, mod, arg; mod > ncmod, xmod, cmod, pmod; arg > subj_or_dobj, subj, comp; subj > ncsubj, xsubj, csubj; comp > obj, pcomp, clausal; obj > dobj, obj2, iobj; clausal > xcomp, ccomp)
→ the resulting set of ranked parses can be displayed or passed on for further processing
→ transformation of derivation trees into a set of named GRs
→ the GR scheme captures those aspects of predicate-argument structure

Evaluation
The system has been evaluated using the re-annotation of the PARC dependency bank (DepBank; King et al., 2003)
It consists of 560 sentences chosen randomly from section 23 of the WSJ, with grammatical relations compatible with the RASP system.
Form of relations: (relation subtype head dependent initial)
type of relationship between the head and the dependent
encoding additional specifications of the relation type for some relations, and the initial or underlying logical relation of the grammatical subject in constructions such as passives

Evaluation
Parsing accuracy on DepBank [Briscoe et al., 2006] (Precision / Recall / F1, then std GRs):
dependent     79.76 / 77.49 / 78.61   10696
aux           93.33 / 91.00 / 92.15   400
conj          72.39 / 72.27 / 72.33   595
ta            42.61 / 51.37 / 46.58   292
det           87.73 / 90.48 / 89.09   1114
arg_mod       79.18 / 75.47 / 77.28   8295
mod           74.43 / 67.78 / 70.95   3908
ncmod         75.72 / 69.94 / 72.72   3550
xmod          53.21 / 46.63 / 49.70   178
cmod          45.95 / 30.36 / 36.56   168
pmod          30.77 / 33.33 / 32.00   12
arg           77.42 / 76.45 / 76.94   4387
subj_or_dobj  82.36 / 74.51 / 78.24   3127
subj          78.55 / 66.91 / 72.27   1363
ncsubj        79.16 / 67.06 / 72.61   1354
xsubj         33.33 / 28.57 / 30.77   7
csubj         12.50 / 50.00 / 20.00   2
comp          75.89 / 79.53 / 77.67   3024
obj           79.49 / 79.42 / 79.46   2328
dobj          83.63 / 79.08 / 81.29   1764
obj2          23.08 / 30.00 / 26.09   20
iobj          70.77 / 76.10 / 73.34   544
clausal       60.98 / 74.40 / 67.02   672
xcomp         76.88 / 77.69 / 77.28   381
ccomp         46.44 / 69.42 / 55.55   291
pcomp         72.73 / 66.67 / 69.57   26
macroaverage  62.12 / 63.77 / 62.94
microaverage  77.66 / 74.98 / 76.29
Micro-averaged precision, recall and F1 are calculated from the counts for all relations in the hierarchy
Macro-averaged scores are the mean of the individual scores for each relation
Micro-averaged F1 score of 76.3% across all relations
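A minimal sketch of the two averaging modes, computed from per-relation counts of true positives, system-proposed relations and gold relations; the counts here are invented for illustration.

```python
# Per-relation counts: (true positives, system-proposed, gold). Invented numbers.
counts = {"ncsubj": (908, 1147, 1354), "dobj": (1395, 1668, 1764)}

def prf(tp, system, gold):
    """Precision, recall and F1 from raw counts."""
    p = tp / system if system else 0.0
    r = tp / gold if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Macro-average: the mean of the per-relation F1 scores.
macro_f1 = sum(prf(*c)[2] for c in counts.values()) / len(counts)

# Micro-average: pool all counts first, then compute a single P/R/F1.
tp, system, gold = (sum(col) for col in zip(*counts.values()))
micro_f1 = prf(tp, system, gold)[2]
print(round(macro_f1, 4), round(micro_f1, 4))
```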
Stanford parser
Java implementation of probabilistic natural language parsers (version 1.6.9): [Klein and Manning, 2003]
A parsing system for English; it has also been used for Chinese, German, Arabic, Italian, Bulgarian and Portuguese
Implements both a highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser
Useful links:
http://nlp.stanford.edu/software/lex-parser.shtml
http://nlp.stanford.edu:8080/parser/

Stanford parser
Input: various forms of plain text
Output: various analysis formats
→ Stanford Dependencies (SD): typed dependencies as GRs
→ phrase structure trees
→ POS-tagged text
[Figure: graphical representation of the SD graph for the sentence "Bell, based in Los Angeles, makes and distributes electronic, computer and building products.", with arcs such as nsubj, conj_and, dobj, partmod, prep_in, nn and amod]

Stanford typed dependencies [de Marneffe and Manning, 2008]
provide a simple description of the grammatical relationships in a sentence
represent all sentence relationships uniformly as typed dependency relations
quite accessible to non-linguists thinking about tasks involving information extraction from text, and quite effective in relation extraction applications

Stanford typed dependencies [de Marneffe and Manning, 2008]
For an example sentence: Bell, based in Los Angeles, makes and distributes electronic, computer and building products.
Lecture 10 Output
A lineup of masseurs was waiting to take the media in hand.
POS tagged text
Parsing [sent. 4 len. 13]: [A, lineup, of, masseurs, was, waiting, to, take, the, media, in, hand, .]
CFPSG representation
(ROOT (S (NP (NP (DT A) (NN lineup)) (PP (IN of) (NP (NNS masseurs)))) (VP (VBD was) (VP (VBG waiting) (S (VP (TO to) (VP (VB take) (NP (DT the) (NNS media)) (PP (IN in) (NP (NN hand)))))))) (. .)))
Typed dependencies representation
det(lineup-2, A-1)
nsubj(waiting-6, lineup-2)
xsubj(take-8, lineup-2)
prep_of(lineup-2, masseurs-4)
aux(waiting-6, was-5)
root(ROOT-0, waiting-6)
aux(take-8, to-7)
xcomp(waiting-6, take-8)
det(media-10, the-9)
dobj(take-8, media-10)
prep_in(take-8, hand-12)
IA161 Syntactic Formalisms for Parsing Natural Languages 415 / 476
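The bracketed CFPSG output above can be inspected programmatically. A small sketch using NLTK's Tree class (assuming NLTK is installed; this is a post-processing convenience, not part of the Stanford parser itself):

from nltk import Tree

cfpsg = ("(ROOT (S (NP (NP (DT A) (NN lineup)) (PP (IN of) (NP (NNS masseurs))))"
         " (VP (VBD was) (VP (VBG waiting) (S (VP (TO to) (VP (VB take)"
         " (NP (DT the) (NNS media)) (PP (IN in) (NP (NN hand)))))))) (. .)))")

tree = Tree.fromstring(cfpsg)          # parse the bracketed string
print(tree.leaves())                   # the 13 tokens
print(tree.pos())                      # (word, POS) pairs, e.g. ('lineup', 'NN')
# every NP constituent in the tree:
for np in tree.subtrees(lambda t: t.label() == "NP"):
    print(" ".join(np.leaves()))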
Lecture 10 Berkeley parser
Learning PCFGs, statistical parser (release 1.1, version 09.2009): [Petrov et al., 2006; Petrov and Klein, 2007]
Parsing system for English; has also been used for Chinese, German, Arabic, Bulgarian, Portuguese, French
Implementation of an unlexicalized PCFG parser
Useful links
http://nlp.cs.berkeley.edu/
http://tomato.banatao.berkeley.edu:8080/parser/parser.html
http://code.google.com/p/berkeleyparser/
IA161 Syntactic Formalisms for Parsing Natural Languages 416 / 476
Lecture 10 Comparison of parsing an example sentence
A lineup of masseurs was waiting to take the media in hand.
IA161 Syntactic Formalisms for Parsing Natural Languages 417 / 476
Lecture 10 Charniak parser
Probabilistic LFG f-structure parsing: [Charniak, 2000; Bikel, 2002]
Parsing system for English
PCFG-based wide-coverage LFG parser
Useful links
http://nclt.computing.dcu.ie/demos.html
http://lfg-demo.computing.dcu.ie/lfgparser.html
IA161 Syntactic Formalisms for Parsing Natural Languages 418 / 476
Lecture 10 Collins parser
Head-driven statistical models for natural language parsing (release 1.0, version 12.2002): [Collins, 1999]
Parsing system for English
Useful links
http://www.cs.columbia.edu/~mcollins/code.html
IA161 Syntactic Formalisms for Parsing Natural Languages 419 / 476
Lecture 10 Bikel's parser
Multilingual statistical parsing engine (release 1.0, version 06.2008): [Charniak, 2000; Bikel, 2002]
Parsing system for English, Chinese, Arabic, Korean
http://www.cis.upenn.edu/~dbikel/#stat-parser
http://www.cis.upenn.edu/~dbikel/software.html
IA161 Syntactic Formalisms for Parsing Natural Languages 420 / 476
Lecture 10 Comparing parser speed on section 23 of the WSJ Penn Treebank
Parser    Time (min.)
Collins   45
Charniak  28
Sagae     11
CCG       1.9
IA161 Syntactic Formalisms for Parsing Natural Languages 421 / 476
Lecture 11
. ......
Syntactic Formalisms for Parsing Natural Languages
Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang)
ia161@nlp.fi.muni.cz
Autumn 2013
IA161 Syntactic Formalisms for Parsing Natural Languages 422 / 476
Lecture 11 Study materials
Course materials and homeworks are available on the following web site:
https://is.muni.cz/course/fi/autumn2011/IA161
Refer to Dependency Parsing, Synthesis: Lectures on Human Language Technologies, S. Kübler, R. McDonald and J. Nivre, 2009
IA161 Syntactic Formalisms for Parsing Natural Languages 423 / 476
Lecture 11 Outline
Introduction to Dependency parsing methods
Dependency Parsers
IA161 Syntactic Formalisms for Parsing Natural Languages 424 / 476
Lecture 11 Introduction to Dependency parsing
Motivation
a. dependency-based syntactic representations seem to be useful in many applications of language technology: machine translation, information extraction
→ transparent encoding of predicate-argument structure
b. dependency grammar is better suited than phrase structure grammar for languages with free or flexible word order
→ analysis of diverse languages within a common framework
c. leading to the development of accurate syntactic parsers for a number of languages
→ in combination with machine learning from syntactically annotated corpora (e.g. treebanks)
IA161 Syntactic Formalisms for Parsing Natural Languages 425 / 476
Lecture 11 Introduction to Dependency parsing
Dependency parsing: "Task of automatically analyzing the dependency structure of a given input sentence"
Dependency parser: "Task of producing a labeled dependency structure of the kind depicted in the following figure, where the words of the sentence are connected by typed dependency relations"
(Figure: labeled dependency tree for "Economic news had little effect on financial markets ." with arc labels PRED, SBJ, ATT, OBJ, PC, PU, rooted in ROOT)
IA161 Syntactic Formalisms for Parsing Natural Languages 426 / 476
Lecture 11 Definitions of dependency graphs and dependency parsing
Dependency graphs: syntactic structures over sentences
Def. 1: A sentence is a sequence of tokens denoted by S = w0 w1 . . . wn
Def. 2: Let R = {r1, . . . , rm} be a finite set of possible dependency relation types that can hold between any two words in a sentence. A relation type r ∈ R is additionally called an arc label.
IA161 Syntactic Formalisms for Parsing Natural Languages 427 / 476
Lecture 11 Definitions of dependency graphs and dependency parsing
Dependency graphs: syntactic structures over sentences
Def. 3: A dependency graph G = (V, A) is a labeled directed graph, consisting of nodes V and arcs A, such that for a sentence S = w0 w1 . . . wn and a label set R the following holds:
1. V ⊆ {w0, w1, . . . , wn}
2. A ⊆ V × R × V
3. if (wi, r, wj) ∈ A then (wi, r′, wj) ∉ A for all r′ ≠ r
IA161 Syntactic Formalisms for Parsing Natural Languages 428 / 476
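Def. 3 translates almost directly into code. A minimal sketch (the class name and representation are my own, not from the lecture): nodes are token positions, arcs are labeled head-dependent pairs, and condition 3 (at most one label per head-dependent pair) is enforced on insertion.

class DependencyGraph:
    """Labeled directed graph G = (V, A) over tokens w0..wn (Def. 3)."""

    def __init__(self, tokens):
        self.tokens = tokens                  # token 0 is the artificial ROOT w0
        self.nodes = set(range(len(tokens)))  # V ⊆ {w0, ..., wn}
        self.arcs = {}                        # (head, dependent) -> label r

    def add_arc(self, head, dep, label):
        # condition 3: if (wi, r, wj) ∈ A, then no (wi, r', wj) with r' ≠ r
        if self.arcs.get((head, dep), label) != label:
            raise ValueError(f"arc {head}->{dep} already has a different label")
        self.arcs[(head, dep)] = label

g = DependencyGraph(["ROOT", "Economic", "news", "had", "little",
                     "effect", "on", "financial", "markets", "."])
g.add_arc(0, 3, "PRED")   # had      <- ROOT
g.add_arc(3, 2, "SBJ")    # news     <- had
g.add_arc(2, 1, "ATT")    # Economic <- news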
Lecture 11 Approach to dependency parsing
a. data-driven: makes essential use of machine learning from linguistic data in order to parse new sentences
b. grammar-based: relies on a formal grammar, defining a formal language, so that it makes sense to ask whether a given input is in the language defined by the grammar or not
→ Data-driven approaches have attracted the most attention in recent years.
IA161 Syntactic Formalisms for Parsing Natural Languages 429 / 476
Lecture 11 Data-driven approach
characterized by the type of parsing model adopted, the algorithms used to learn the model from data, and the algorithms used to parse new sentences with the model
a. transition-based: start by defining a transition system, or state machine, for mapping a sentence to its dependency graph
b. graph-based: start by defining a space of candidate dependency graphs for a sentence
IA161 Syntactic Formalisms for Parsing Natural Languages 430 / 476
Lecture 11 Data-driven approach
a. transition-based
learning problem: induce a model for predicting the next state transition, given the transition history
parsing problem: construct the optimal transition sequence for the input sentence, given the induced model
b. graph-based
learning problem: induce a model for assigning scores to the candidate dependency graphs for a sentence
parsing problem: find the highest-scoring dependency graph for the input sentence, given the induced model
IA161 Syntactic Formalisms for Parsing Natural Languages 431 / 476
Lecture 11 Transition-based Parsing
A transition system consists of a set C of parser configurations and a set D of transitions between configurations.
Main idea: a sequence of valid transitions, starting in the initial configuration for a given sentence and ending in one of several terminal configurations, defines a valid dependency tree for the input sentence:
D1,m = d1(c1), . . . , dm(cm)
IA161 Syntactic Formalisms for Parsing Natural Languages 432 / 476
Lecture 11 Transition-based Parsing
Definition: the score of D1,m factors by configuration-transition pairs (ci, di):
s(D1,m) = Σi=1..m s(ci, di)
Learning: scoring function s(ci, di) for di(ci) ∈ D1,m
Inference: search for the highest-scoring sequence D*1,m given s(ci, di)
IA161 Syntactic Formalisms for Parsing Natural Languages 433 / 476
Lecture 11 Transition-based Parsing
Inference for transition-based parsing
Common inference strategies:
Deterministic [Yamada and Matsumoto 2003, Nivre et al. 2004]
Beam search [Johansson and Nugues 2006, Titov and Henderson 2007]
Complexity given by an upper bound on the transition sequence length:
Transition system
Projective O(n) [Yamada and Matsumoto 2003, Nivre 2003]
Limited non-projective O(n) [Attardi 2006, Nivre 2007]
Unrestricted non-projective O(n²) [Nivre 2008, Nivre 2009]
IA161 Syntactic Formalisms for Parsing Natural Languages 434 / 476
Lecture 11 Transition-based Parsing
Learning for transition-based parsing
Typical scoring function: s(ci, di) = w · f(ci, di)
where f(ci, di) is a feature vector over configuration ci and transition di, and w is a weight vector [wi = weight of feature fi(ci, di)]
Problem: learning is local but features are based on the global history
IA161 Syntactic Formalisms for Parsing Natural Languages 435 / 476
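A deterministic (greedy) transition-based parser is only a few lines of code once the transition system and scoring function are fixed. The sketch below uses the arc-standard system (SHIFT, LEFT-ARC, RIGHT-ARC) and a stub scorer standing in for the learned model s(ci, di); everything here is illustrative, not any particular parser's implementation.

SHIFT, LEFT, RIGHT = "SHIFT", "LEFT-ARC", "RIGHT-ARC"

def legal(stack, buf):
    """Legal arc-standard transitions in the configuration (stack, buf)."""
    moves = []
    if buf:
        moves.append(SHIFT)
    if len(stack) >= 2:
        moves.append(RIGHT)                 # stack[-1] becomes dependent of stack[-2]
        if stack[-2] != 0:                  # the artificial root takes no head
            moves.append(LEFT)              # stack[-2] becomes dependent of stack[-1]
    return moves

def parse(n_words, score):
    """Greedy parse of w1..wn; score(stack, buf, move) plays the role of s(ci, di)."""
    stack, buf, arcs = [0], list(range(1, n_words + 1)), []
    while buf or len(stack) > 1:
        move = max(legal(stack, buf), key=lambda m: score(stack, buf, m))
        if move == SHIFT:
            stack.append(buf.pop(0))
        elif move == LEFT:
            arcs.append((stack[-1], stack.pop(-2)))   # (head, dependent)
        else:
            arcs.append((stack[-2], stack.pop()))
    return arcs

# Stub model: always prefer SHIFT, yielding a right-branching chain
# (a real model would be learned from treebank data).
print(parse(3, lambda s, b, m: 1.0 if m == SHIFT else 0.0))
# [(2, 3), (1, 2), (0, 1)]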
Lecture 11 Graph-based Parsing
For an input sentence S we define a graph Gs = (Vs, As) where
Vs = {w0, w1, . . . , wn} and As = {(wi, wj, l) | wi, wj ∈ V and l ∈ L}
The score of a dependency tree T factors by subgraphs G1, . . . , Gm:
s(T) = Σi=1..m s(Gi)
Learning: scoring function s(Gi) for a subgraph Gi ∈ T
Inference: search for the maximum spanning tree T* of Gs given s(Gi)
IA161 Syntactic Formalisms for Parsing Natural Languages 436 / 476
Lecture 11 Graph-based Parsing
Learning graph-based models
Typical scoring function: s(Gi) = w · f(Gi)
where f(Gi) is a high-dimensional feature vector over subgraphs and w is a weight vector [wj = weight of feature fj(Gi)]
Structured learning [McDonald et al. 2005a, Smith and Johnson 2007]: learn weights that maximize the score of the correct dependency tree for every sentence in the training set
Problem: learning is global (trees) but features are local (subgraphs)
IA161 Syntactic Formalisms for Parsing Natural Languages 437 / 476
Lecture 11 Grammar-based approach
a. context-free dependency parsing: exploits a mapping from dependency structures to CFG structure representations and reuses parsing algorithms originally developed for CFG → chart parsing algorithms
b. constraint-based dependency parsing: parsing viewed as a constraint satisfaction problem; the grammar is defined as a set of constraints on well-formed dependency graphs; parsing means finding a dependency graph for a sentence that satisfies all the constraints of the grammar (having the best score)
IA161 Syntactic Formalisms for Parsing Natural Languages 438 / 476
Lecture 11 Grammar-based approach
a. context-free dependency parsing
Advantage: well-studied parsing algorithms such as CKY and Earley's algorithm can be used for dependency parsing as well
→ need to convert dependency grammars into efficiently parsable context-free grammars (e.g. bilexical CFG, Eisner and Smith, 2005)
b. constraint-based dependency parsing
defines the problem as constraint satisfaction
Weighted constraint dependency grammar (WCDG, Foth and Menzel, 2005)
Transformation-based CDG
IA161 Syntactic Formalisms for Parsing Natural Languages 439 / 476
Lecture 11 Dependency parsers
Trainable parsers
Probabilistic dependency parser (Eisner, 1996, 2000)
MSTParser (McDonald, 2006) – graph-based
MaltParser (Nivre, 2007, 2008) – transition-based
K-best Maximum Spanning Tree Dependency Parser (Hall, 2007)
Vine Parser
ISBN Dependency Parser
Parsers for specific languages
Minipar (Lin, 1998)
WCDG Parser (Foth et al., 2005)
Pro3Gres (Schneider, 2004)
Link Grammar Parser (Lafferty et al., 1992)
CaboCha (Kudo and Matsumoto, 2002)
IA161 Syntactic Formalisms for Parsing Natural Languages 440 / 476
Lecture 11 MaltParser
Data-driven dependency parsing system (last version 1.6.1; J. Hall, J. Nilsson and J. Nivre)
Transition-based parsing system
Implementation of inductive dependency parsing
Useful for inducing a parsing model from treebank data
Useful for parsing new data using an induced model
Useful links
http://maltparser.org
IA161 Syntactic Formalisms for Parsing Natural Languages 441 / 476
Lecture 11 Components of system
Deterministic parsing algorithms – building labeled dependency graphs
History-based models – predicting the next parser action at nondeterministic choice points
Discriminative learning – mapping histories to parser actions
IA161 Syntactic Formalisms for Parsing Natural Languages 442 / 476
Lecture 11 MaltParser
Running system
Input: part-of-speech tags or word forms (CoNLL format)
1 Den _ PO PO DP 2 SS _ _
2 blir _ V BV PS 0 ROOT _ _
3 gemensam _ AJ AJ _ 2 SP _ _
4 för _ PR PR _ 2 OA _ _
5 alla _ PO PO TP 6 DT _ _
6 inkomsttagare _ N NN HS 4 PA _ _
7 oavsett _ PR PR _ 2 AA _ _
8 civilstånd _ N NN SS 7 PA _ _
9 . _ P IP _ 2 IP _ _
Output: a column containing a dependency label
IA161 Syntactic Formalisms for Parsing Natural Languages 443 / 476
Lecture 11 MSTParser
Minimum Spanning Tree Parser (last version 0.2; R. McDonald et al., 2005, 2006)
Graph-based parsing system
Useful links
http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html
IA161 Syntactic Formalisms for Parsing Natural Languages 444 / 476
Lecture 11 MSTParser
Running system
Input data format:
w1 w2 . . . wn
p1 p2 . . . pn
l1 l2 . . . ln
d1 d2 . . . dn
where w1 . . . wn are the n words of the sentence (tab-delimited), p1 . . . pn are the POS tags for each word, l1 . . . ln are the labels of the incoming edge to each word, and d1 . . . dn are integers representing the position of each word's parent
Example: the sentence "John hit the ball" would be:
John hit the ball
N V D N
SBJ ROOT MOD OBJ
2 0 4 2
IA161 Syntactic Formalisms for Parsing Natural Languages 445 / 476
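A reader for this four-line format is straightforward. A minimal sketch (my own helper, not MSTParser code), run on the example above:

def read_mst_sentence(block):
    """Turn one 4-line MSTParser-format block into (word, pos, label, head) tuples."""
    words, pos, labels, heads = (line.split("\t") for line in block.splitlines())
    return [(w, p, l, int(d)) for w, p, l, d in zip(words, pos, labels, heads)]

block = "John\thit\tthe\tball\nN\tV\tD\tN\nSBJ\tROOT\tMOD\tOBJ\n2\t0\t4\t2"
for i, (word, p, label, head) in enumerate(read_mst_sentence(block), start=1):
    print(f"{i}: {word}/{p} --{label}--> head {head}")
# e.g. "1: John/N --SBJ--> head 2" (head 0 denotes the artificial root)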
Lecture 11 MSTParser
Running system
Output: a column containing a dependency label
IA161 Syntactic Formalisms for Parsing Natural Languages 446 / 476
Lecture 11 Comparing parsing accuracy
Graph-based vs. transition-based: MST vs. Malt
Language    MST    Malt
Arabic      66.91  66.71
Bulgarian   87.57  87.41
Chinese     85.90  86.92
Czech       80.18  78.42
Danish      84.79  84.77
Dutch       79.19  78.59
German      87.34  85.82
Japanese    90.71  91.65
Portuguese  86.82  87.60
Slovene     73.44  70.30
Spanish     82.25  81.29
Swedish     82.55  84.58
Turkish     63.19  65.68
Average     80.83  80.75
Presented in Current Trends in Data-Driven Dependency Parsing by Joakim Nivre, 2009
IA161 Syntactic Formalisms for Parsing Natural Languages 447 / 476
Lecture 11 Link Parser
Syntactic parser of English, based on Link Grammar (version 4.7.4, Feb. 2011; D. Temperley, D. Sleator, J. Lafferty, 2004)
Words as blocks with connectors (+ or −)
Rules for defining the connection between the connectors
Deep syntactic parsing system
Useful links
http://www.link.cs.cmu.edu/link/index.html
http://www.abisource.com/
IA161 Syntactic Formalisms for Parsing Natural Languages 448 / 476
Lecture 11 Link Parser
Example of parsing in the Link Grammar: let's test our own sentences!
http://www.link.cs.cmu.edu/link/submit-sentence-4.html
IA161 Syntactic Formalisms for Parsing Natural Languages 449 / 476
Lecture 11 Link Parser
John gives a book to Mary.
IA161 Syntactic Formalisms for Parsing Natural Languages 450 / 476
Lecture 11 Link Parser
Some fans on Friday will be seeking to add another store-opening shirt to collections they've assembled as if they were rare baseball cards.
IA161 Syntactic Formalisms for Parsing Natural Languages 451 / 476
Lecture 11 WCDG parser
Weighted Constraint Dependency Grammar Parser (version 0.97-1, May 2011; W. Menzel, N. Beuck, C. Baumgärtner)
incremental parsing
syntactic predictions for incomplete sentences
Deep syntactic parsing system
Useful links
http://nats-www.informatik.uni-hamburg.de/view/CDG/ParserDemo
IA161 Syntactic Formalisms for Parsing Natural Languages 452 / 476
Lecture 12
. ......
Syntactic Formalisms for Parsing Natural Languages
Aleš Horák, Miloš Jakubíček, Vojtěch Kovář (based on slides by Juyeon Kang)
ia161@nlp.fi.muni.cz
Autumn 2013
IA161 Syntactic Formalisms for Parsing Natural Languages 453 / 476
Lecture 12
. ......
Parsing Evaluation
IA161 Syntactic Formalisms for Parsing Natural Languages 454 / 476
Lecture 12 Parsing
Results: usually some complex (i.e. non-scalar) structure, mostly a tree or a graph-like structure
Crucial question: how to measure the "goodness" of the result?
IA161 Syntactic Formalisms for Parsing Natural Languages 455 / 476
Lecture 12 Extrinsic vs. Intrinsic Evaluation
Intrinsic: by comparing to a "gold", i.e. correct, representation
Extrinsic: by exploiting the result in a 3rd-party task and evaluating its results
Which is better?
IA161 Syntactic Formalisms for Parsing Natural Languages 456 / 476
Lecture 12 Intrinsic Evaluation – Phrase-Structure Syntax
i.e. compare two phrase-structure trees and tell a number
PARSEVAL metric
LAA (leaf-ancestor assessment) metric
IA161 Syntactic Formalisms for Parsing Natural Languages 457 / 476
Lecture 12 PARSEVAL metric
basic idea: penalize crossing brackets in the tree
i.e. compare all constituents in the test tree to the gold tree
⇒ parsing viewed as a classification problem
IA161 Syntactic Formalisms for Parsing Natural Languages 458 / 476
Lecture 12 Precision, recall
for classification problems in NLP, the standard evaluation is by means of precision and recall
precision = |test ∩ gold| / |test|
recall = |test ∩ gold| / |gold|
two numbers, and we just want to have one – the F-score
F1 score = 2 · precision · recall / (precision + recall)
IA161 Syntactic Formalisms for Parsing Natural Languages 459 / 476
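The three formulas in code, applied to sets of constituents; a minimal sketch. The span sets correspond to the bracketing example shown a few slides below, encoded as (label, start, end) over token positions (an assumption about the encoding, made only for illustration):

def precision_recall_f1(test, gold):
    """Precision, recall and F1 over two sets of items (here: constituents)."""
    correct = len(test & gold)
    p, r = correct / len(test), correct / len(gold)
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)

# Constituents as (label, start, end) spans over
# "John likes ice cream with chocolate" (tokens 0..5):
test = {("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6), ("V", 1, 2),
        ("NP", 2, 4), ("PP", 4, 6)}
gold = test | {("NP", 2, 6)}            # the gold tree has one extra NP bracket

print(precision_recall_f1(test, gold))  # (1.0, 0.857..., 0.923...)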
Lecture 12 F-score
also F-measure
general form: Fβ score
Fβ = (1 + β²) · precision · recall / (β² · precision + recall)
the special case of β = 1 corresponds to the harmonic mean of precision and recall
β can be used for favouring precision over recall (for β < 1) or vice versa (for β > 1)
IA161 Syntactic Formalisms for Parsing Natural Languages 460 / 476
Lecture 12 PARSEVAL metric
basic idea: penalize crossing brackets in the tree
i.e. compare all constituents in the test tree to the gold tree
⇒ parsing viewed as a classification problem
⇒ F-score on correct bracketings/constituents
might even disregard non-terminal names
a sort-of standardized tool is available: the evalb script at http://nlp.cs.nyu.edu/evalb/
IA161 Syntactic Formalisms for Parsing Natural Languages 461 / 476
Lecture 12 PARSEVAL metric – example
test vs. gold
test: [S [NP John] [VP [V likes] [NP ice cream] [PP with chocolate]]]
gold: [S [NP John] [VP [V likes] [NP [NP ice cream] [PP with chocolate]]]]
precision = 6/6 = 1.0, recall = 6/7 = 0.86, F-score = 0.92
IA161 Syntactic Formalisms for Parsing Natural Languages 462 / 476
Lecture 12 PARSEVAL metric
often subject to criticism (see e.g. Sampson, 2000)
Sampson proposed another metric, the leaf-ancestor assessment (LAA)
IA161 Syntactic Formalisms for Parsing Natural Languages 464 / 476
Lecture 12 LAA metric
basic idea: for each leaf (word), compare the path to the root of the tree, compute the edit distance between both paths, and finally take the average over all words
in the previous example, the paths (lineages) are:
(John) NP S vs. (John) NP S
(likes) V VP S vs. (likes) V VP S
(ice cream) NP VP S vs. (ice cream) NP NP VP S
(with chocolate) PP VP S vs. (with chocolate) PP NP VP S
IA161 Syntactic Formalisms for Parsing Natural Languages 465 / 476
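A minimal sketch of this computation, using the lineages listed above. The per-word similarity is taken here as 1 − dist/(len(test) + len(gold)); treat the exact normalization as an assumption of this sketch rather than the canonical definition.

def edit_distance(a, b):
    """Plain Levenshtein distance over sequences of symbols."""
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[-1]

def laa(test_lineages, gold_lineages):
    """Average per-word lineage similarity."""
    scores = [1 - edit_distance(t, g) / (len(t) + len(g))
              for t, g in zip(test_lineages, gold_lineages)]
    return sum(scores) / len(scores)

test = [["NP", "S"], ["V", "VP", "S"], ["NP", "VP", "S"], ["PP", "VP", "S"]]
gold = [["NP", "S"], ["V", "VP", "S"], ["NP", "NP", "VP", "S"], ["PP", "NP", "VP", "S"]]
print(laa(test, gold))  # ≈ 0.93: two lineages match exactly, two differ by one NP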
Lecture 12 Intrinsic Evaluation – Dependency Syntax
much easier: just precision, labeled or unlabeled (the proportion of correct dependencies)
IA161 Syntactic Formalisms for Parsing Natural Languages 466 / 476
Lecture 12 Intrinsic Evaluation – Building Treebanks
treebank = a syntactically annotated text corpus
manual annotation according to some guidelines
from the evaluation point of view: inter-annotator agreement (IAA) is a crucial property
IA161 Syntactic Formalisms for Parsing Natural Languages 467 / 476
Lecture 12 Measuring IAA
naïve approach: count how many times people agreed
problem: it does not account for agreement by chance
IA161 Syntactic Formalisms for Parsing Natural Languages 468 / 476
Lecture 12 Chance-corrected coefficients for IAA
S (Bennett, Alpert and Goldstein, 1954)
π (Scott, 1955)
κ (Cohen, 1960)
(there is a lot of terminology confusion; we follow Ron Artstein and Massimo Poesio: Inter-coder Agreement for Computational Linguistics, 2008)
Ao – observed agreement
Ae – expected (chance) agreement
all the coefficients are computed as:
S, π, κ = (Ao − Ae) / (1 − Ae)
IA161 Syntactic Formalisms for Parsing Natural Languages 469 / 476
Lecture 12 Chance-corrected coefficients for IAA
S (Bennett, Alpert and Goldstein, 1954): assumes that all categories and all annotators have a uniform probability distribution
π (Scott, 1955): assumes that different categories have different distributions, shared across annotators
κ (Cohen, 1960): assumes that different categories and different annotators have different distributions; devised for 2 annotators, with various modifications available for more than 2 annotators
IA161 Syntactic Formalisms for Parsing Natural Languages 470 / 476
Lecture 12 Intrinsic Evaluation – Conclusions
generally not easy
builds on the assumption of having THE correct parse
there is evidence that it does not correlate with extrinsic evaluation, i.e. with how good the tool is for some particular job
IA161 Syntactic Formalisms for Parsing Natural Languages 471 / 476
Lecture 12 Extrinsic Evaluation
= evaluation on a particular task/application
advantages: measures direct fitness for that task
disadvantages: may not generalize to other tasks
leads to the crucial question: what can parsing be used for?
IA161 Syntactic Formalisms for Parsing Natural Languages 472 / 476
Lecture 12 What can parsing be used for?
in theory, (full) parsing is suitable/appropriate/necessary for many NLP tasks
practically it turns out to be:
often not accurate enough
often too complicated to exploit
sometimes just an overkill compared to shallow parsing or yet simpler approaches
IA161 Syntactic Formalisms for Parsing Natural Languages 473 / 476
Lecture 12 What can parsing be used for?
in theory, (full) parsing is suitable/appropriate/necessary for many NLP tasks:
information extraction
information retrieval
machine translation
corpus linguistics
computer lexicography
question answering
…
IA161 Syntactic Formalisms for Parsing Natural Languages 474 / 476
Lecture 12 Where is parsing actually used now?
prototype systems
academia work
production systems ???
IA161 Syntactic Formalisms for Parsing Natural Languages 475 / 476
Lecture 12 What to evaluate parsing on
Sample (more or less well defined) applications:
(partial) morphological disambiguation
text correcting systems
word sketches
phrase extraction
simple treebank of high IAA
IA161 Syntactic Formalisms for Parsing Natural Languages 476 / 476