D 2008

Computing Idioms Frequency in Text Corpora

BUŠTA, Jan

Basic information

Original name

Computing Idioms Frequency in Text Corpora

Name in Czech

Výpočet četnosti idiomů v korpusu

Authors

BUŠTA, Jan (203 Czech Republic, guarantor, belonging to the institution)

Edition

Brno, Proceedings of Recent Advances in Slavonic Natural Language Processing 2008, p. 0-0, 4 pp. 2008

Publisher

Masaryk University

Other information

Language

English

Type of outcome

Stať ve sborníku

Field of Study

60200 6.2 Languages and Literature

Country of publisher

Czech Republic

Confidentiality degree

není předmětem státního či obchodního tajemství

Publication form

printed version "print"

References:

RIV identification code

RIV/00216224:14330/08:00034421

Organization unit

Faculty of Informatics

ISBN

978-80-210-4741-9

UT WoS

000302212600012

Keywords in English

frequency of idioms; headwords; text corpora; czech language
Změněno: 1/6/2021 07:47, Mgr. Jan Bušta

Abstract

V originále

The idioms are phrases which meaning is not composed from the meanings of each word in the phrase. This is one of the natural examples of violating the principle of compositionality that means that idioms are in area of natural language processing problem of meaning mining. To count the frequency of phrases such idioms in corpora has one big aim: To get to know which phrases we use often and which less. We do it to be able to start with getting the meaning of the whole phrases not just each word. This improves the understanding natural language. The idioms are phrases which meaning is not composed from the meanings of each word in the phrase. This is one of the natural examples of violating the principle of compositionality that means that idioms are in area of natural language processing problem of meaning mining. To count the frequency of phrases such idioms in corpora has one big aim: To get to know which phrases we use often and which less. We do it to be able to start with getting the meaning of the whole phrases not just each word. This improves the understanding natural language.

In Czech

Idiomy jsou slovní spojení, jejichž význam se neskládá z významů jednotlivých slov. Idiomy jsou příkladem porušování principu kompozicionality a tím jsou problémem při strojovém zpracování jazyka. Výpočet četnosti idiomů v korpusu přinese informaci, které idiomy se používají častěji, které méně často. Seřazení idiomů dle jejich četnosti ukáže, na které idiomy je třeba se soustředit více, a tak lépe porozumět přirozenému jazyku.

Links

LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky
2C06009, research and development project
Name: Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce (Acronym: COT-SEWing)
Investor: Ministry of Education, Youth and Sports of the CR