D 2004

Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets

SMRŽ, Pavel and Anna SINOPALNIKOVA

Basic information

Original name

Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets

Name in Czech

Korpusová analýza pro tvorbu lexikálních databází - Případová studie ruského a českého wordnetu

Authors

SMRŽ, Pavel (203 Czech Republic, guarantor) and Anna SINOPALNIKOVA (643 Russian Federation)

Edition

Saint-Petersburg, Russia, Proceedings of the 33rd International Conference on Linguistics, p. 23-29, 7 pp. 2004

Publisher

Saint-Petersburg State University Press

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Russian Federation

Confidentiality degree

is not subject to a state or trade secret

References:

RIV identification code

RIV/00216224:14330/04:00010449

Organization unit

Faculty of Informatics

Keywords in English

corpus; lexical database; lexico-syntactic patterns; word sketches
Changed: 18/1/2005 11:22, doc. RNDr. Pavel Smrž, Ph.D.

Abstract

In the original

The paper deals with corpus-based methods applied to particular tasks of lexical database construction. Different techniques of corpus analysis are discussed and their applicability to these tasks is assessed. The corpus management system Manatee + Bonito, developed at the Faculty of Informatics, Masaryk University in Brno, Czech Republic, is presented as a tool that makes it possible to perform all of the linguistic studies discussed. We mainly focus on the methods of substitution and the extraction of lexico-syntactic patterns, which represent standard approaches to the creation of lexical databases. We also briefly mention the use of word sketches, a new technique in lexicography that aims to speed up corpus analysis work.

In Czech

Příspěvek se zabývá korpusovými metodami aplikovanými při výstavbě lexikální databáze.

Links

GA405/03/0913, research and development project
Name: Velké jazykové korpusy a jejich automatická analýza
Investor: Czech Science Foundation, Very Large Language Corpora and Their Automatic Analysis
MSM 143300003, plan (intention)
Name: Interakce člověka s počítačem, dialogové systémy a asistivní technologie
Investor: Ministry of Education, Youth and Sports of the CR, Human-computer interaction, dialog systems and assistive technologies