BUŠTA, Jan. Computing Idioms Frequency in Text Corpora. In Proceedings of Recent Advances in Slavonic Natural Language Processing 2008. Brno: Masaryk University, 2008, p. 0-0, 4 pp. ISBN 978-80-210-4741-9.
Basic information
Original name Computing Idioms Frequency in Text Corpora
Name in Czech Výpočet četnosti idiomů v korpusu
Authors BUŠTA, Jan (203 Czech Republic, guarantor, belonging to the institution).
Edition Brno, Proceedings of Recent Advances in Slavonic Natural Language Processing 2008, p. 0-0, 4 pp. 2008.
Publisher Masaryk University
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 60200 6.2 Languages and Literature
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14330/08:00034421
Organization unit Faculty of Informatics
ISBN 978-80-210-4741-9
UT WoS 000302212600012
Keywords in English frequency of idioms; headwords; text corpora; Czech language
Tags Czech language, frequency of idioms, headwords, Text Corpora
Changed by: Mgr. Jan Bušta, učo 172959. Changed: 1/6/2021 07:47.
Abstract
Idioms are phrases whose meaning is not composed from the meanings of the individual words in the phrase. They are a natural example of a violation of the principle of compositionality, which makes them a problem for meaning mining in natural language processing. Counting the frequency of idioms in corpora has one main aim: to find out which phrases are used often and which are used less. This is a first step towards capturing the meaning of whole phrases rather than of each word separately, which improves natural language understanding.
Abstract (in Czech)
Idiomy jsou slovní spojení, jejichž význam se neskládá z významů jednotlivých slov. Idiomy jsou příkladem porušování principu kompozicionality a tím jsou problémem při strojovém zpracování jazyka. Výpočet četnosti idiomů v korpusu přinese informaci, které idiomy se používají častěji, které méně často. Seřazení idiomů dle jejich četnosti ukáže, na které idiomy je třeba se soustředit více, a tak lépe porozumět přirozenému jazyku.
Links
LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky
2C06009, research and development project
Name: Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce (Acronym: COT-SEWing)
Investor: Ministry of Education, Youth and Sports of the CR