D 2014

Intelligent Search and Replace for Czech Phrases

NEVĚŘILOVÁ, Zuzana and Vít SUCHOMEL

Basic information

Original name

Intelligent Search and Replace for Czech Phrases

Authors

NEVĚŘILOVÁ, Zuzana (203 Czech Republic, guarantor, belonging to the institution) and Vít SUCHOMEL (203 Czech Republic, belonging to the institution)

Edition

Brno, Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, p. 97-105, 9 pp. 2014

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

60200 6.2 Languages and Literature

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

RIV identification code

RIV/00216224:14330/14:00077518

Organization unit

Faculty of Informatics

ISSN

UT WoS

000374560500013

Keywords in English

search and replace; detecting phrases; generating phrases; subject-predicative complement

Tags

International impact, Reviewed
Changed: 25/5/2021 19:20, RNDr. Vít Suchomel, Ph.D.

Abstract

In the original

This work proposes an improvement to the ‘Search and Replace’ function well known from most text processing software. The standard search and replace function replaces exact forms of words or phrases with other words or phrases in text documents. This is quite sufficient for languages with minimal inflection, such as English. However, a well-working word or phrase replacement function for morphologically rich languages requires much more thought. We explore the issues of implementing a useful search and replace for the Czech language and propose solutions to the majority of the problems: a syntactic parser is employed to identify the phrases containing the search word or phrase, and the correct word forms used as a replacement are generated by a morphological analyser. A web demonstration utilizing the proposed solution is presented. The attached examples of use reveal the cases in which the implemented method works well.
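The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny hand-written lexicon `PARADIGMS` and the helper names `analyse`, `generate`, and `inflected_replace` are hypothetical, and a table lookup stands in for a real morphological analyser. The sketch finds any inflected form of the search lemma and substitutes the replacement lemma in the same grammatical case.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only):
# lemma -> {case tag: word form}. A real system would query a
# morphological analyser instead of a hand-written table.
PARADIGMS = {
    "pes": {"nom": "pes", "dat": "psovi", "ins": "psem"},
    "kočka": {"nom": "kočka", "dat": "kočce", "ins": "kočkou"},
}


def analyse(form):
    """Return all (lemma, case) readings of a word form found in the toy lexicon."""
    form = form.lower()
    return [
        (lemma, case)
        for lemma, forms in PARADIGMS.items()
        for case, candidate in forms.items()
        if candidate == form
    ]


def generate(lemma, case):
    """Return the form of `lemma` in the given case from the toy lexicon."""
    return PARADIGMS[lemma][case]


def inflected_replace(text, search_lemma, replace_lemma):
    """Replace every known form of `search_lemma` with the matching form of `replace_lemma`."""

    def swap(match):
        word = match.group(0)
        for lemma, case in analyse(word):
            if lemma == search_lemma:
                new = generate(replace_lemma, case)
                # Preserve sentence-initial capitalisation.
                return new.capitalize() if word[0].isupper() else new
        return word  # not a form of the search lemma: leave unchanged

    return re.sub(r"\w+", swap, text)


print(inflected_replace("Dal jsem to psovi.", "pes", "kočka"))
```

The sketch deliberately ignores the harder problems the abstract alludes to: it takes the first matching reading instead of disambiguating with a syntactic parser, and it does not adjust words that must agree with the replaced noun, such as adjectives, gender-marked verb forms, or vocalised prepositions.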

Links

LM2010013, research and development project
Name: LINDAT-CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
7F14047, research and development project
Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
Investor: Ministry of Education, Youth and Sports of the CR