Effective Parsing Using Competing CFG Rules

JAKUBÍČEK, Miloš. Effective Parsing Using Competing CFG Rules. In Habernal, Matoušek. Proceedings of Text, Speech and Dialogue 2011. Berlin, Heidelberg: Springer Verlag, 2011, p. 115-122. ISBN 978-3-642-23537-5.


Basic information
Original name	Effective Parsing Using Competing CFG Rules
Authors	JAKUBÍČEK, Miloš (203 Czech Republic, guarantor, belonging to the institution).
Edition	Berlin, Heidelberg, Proceedings of Text, Speech and Dialogue 2011, p. 115-122, 8 pp. 2011.
Publisher	Springer Verlag

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
RIV identification code	RIV/00216224:14330/11:00049948
Organization unit	Faculty of Informatics
ISBN	978-3-642-23537-5
UT WoS	000312640500015
Keywords in English	parsing; syntactic analysis; CFG; competing rule
Tags	International impact, Reviewed
Changed by	RNDr. Miloš Jakubíček, Ph.D., university ID (učo) 172962. Changed: 27/6/2012 12:01.

Abstract

This paper describes a new pruning method for a rule-based parser that relies on separating the underlying grammar rules into several mutually competing levels. The method has been developed and used for Czech in the syntactic parser Synt to reduce the number of possible output derivation trees. The underlying algorithm operates on a so-called packed forest of trees, a compact data structure used for the internal representation of parallel analyses, and is therefore very efficient. An evaluation on the Brno Phrasal Treebank shows that the algorithm significantly prunes the resulting tree space while preserving promising parses.
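
To illustrate the idea of competing rule levels on a packed forest, here is a minimal Python sketch. It is not the Synt implementation: the class names, the integer `level` field, and the convention that the lowest level present in a node wins are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Derivation:
    """One way to build a node: a rule applied to a list of child nodes."""
    rule: str
    level: int  # hypothetical convention: a lower number is a preferred level
    children: list = field(default_factory=list)


@dataclass
class Node:
    """A packed-forest node: one (symbol, span) shared by all its derivations."""
    symbol: str
    span: tuple
    derivations: list = field(default_factory=list)


def count_trees(node, memo=None):
    """Count the trees packed under `node`; a node without derivations is a leaf."""
    if memo is None:
        memo = {}
    key = id(node)
    if key not in memo:
        if not node.derivations:
            memo[key] = 1
        else:
            memo[key] = sum(
                prod(count_trees(child, memo) for child in d.children)
                for d in node.derivations
            )
    return memo[key]


def prune(node, seen=None):
    """In every reachable node, keep only derivations of the best level present."""
    if seen is None:
        seen = set()
    if id(node) in seen:  # shared nodes are visited once
        return
    seen.add(id(node))
    if node.derivations:
        best = min(d.level for d in node.derivations)
        node.derivations = [d for d in node.derivations if d.level == best]
    for d in node.derivations:
        for child in d.children:
            prune(child, seen)


# Toy forest with invented symbols and rules.
a = Node("A", (0, 1))
b = Node("B", (1, 2))
x = Node("X", (0, 2), [
    Derivation("X -> A B (preferred)", 0, [a, b]),
    Derivation("X -> A B (fallback)", 1, [a, b]),
])
root = Node("S", (0, 2), [
    Derivation("S -> X", 0, [x]),
    Derivation("S -> A B", 0, [a, b]),
])

before = count_trees(root)
prune(root)
after = count_trees(root)
print(before, after)
```

Because each shared node is visited once, this kind of pruning runs in time proportional to the size of the packed forest rather than to the number of trees it encodes.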

Abstract (translated from Czech)

The paper presents a new pruning method for a rule-based syntactic parser, based on dividing the grammar rules into several mutually exclusive levels. The method was developed and used for the Czech syntactic parser Synt in order to reduce the number of output syntactic trees. The associated algorithms are very efficient because they use a compressed data structure that covers all parallel analyses. The contribution of the method was evaluated on the Brno Phrasal Treebank; the evaluation shows a substantial reduction in the number of output trees without affecting parsing accuracy.

Links
GAP401/10/0792, research and development project	Name: Temporální aspekty znalostí a informací
GAP401/10/0792, research and development project	Investor: Czech Science Foundation
VF20102014003, research and development project	Name: Analýza přirozeného jazyka v prostředí internetu (Acronym: APJI)
VF20102014003, research and development project	Investor: Ministry of the Interior of the CR
248307, internal MU code	Name: Pattern Recognition-based Statistically Enhanced MT (Acronym: PRESEMT)
248307, internal MU code	Investor: European Union, Pattern Recognition-based Statistically Enhanced MT, Cooperation

