D 2010

Automatic Identification of Legal Terms in Czech Law Texts

PALA, Karel; Pavel RYCHLÝ and Pavel ŠMERK

Basic information

Original name

Automatic Identification of Legal Terms in Czech Law Texts

Name in Czech

Automatická identifikace právních termínů v českých právních textech

Authors

PALA, Karel (203 Czech Republic, guarantor, belonging to the institution); Pavel RYCHLÝ (203 Czech Republic, belonging to the institution) and Pavel ŠMERK (203 Czech Republic, belonging to the institution)

Edition

Berlin, Semantic Processing of Legal Texts, p. 83-94, 12 pp. 2010

Publisher

Springer

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

60200 6.2 Languages and Literature

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/10:00065871

Organization unit

Faculty of Informatics

ISBN

978-3-642-12836-3

Keywords in English

terminology extraction; natural language processing; legal language

Tags

International impact, Reviewed
Changed: 30/4/2014 04:24, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original language

Law texts, including the constitution, acts, public notices and court judgements, form a huge database of texts. As with many texts from small domains, the sublanguage used is partially restricted and differs from general Czech. The legal database Lexis, containing approx. 50,000 Czech law documents, was chosen as the starting collection of data. Our attention is concentrated mostly on noun groups, which are the main candidates for law terms. We were able to recognize 3992 distinct noun groups in the selected text samples. The paper also presents results of the morphological analysis, lemmatization, tagging, disambiguation, and basic syntactic analysis of Czech law texts, as these tasks are crucial for any more sophisticated natural language processing. Verbs in legal texts have also been explored in a preliminary way. In this respect, we try to explore how linguistic analysis can help identify the semantic nature of law terms.

Links

GA407/07/0679, research and development project
Name: Právní e-slovník - PES
Investor: Czech Science Foundation, Legal e-dictionary - PES
LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centre for Computational Linguistics