Citation Data of Czech Apex Courts (preprint)

2020

HARAŠTA, Jakub, Tereza NOVOTNÁ and Jaromír ŠAVELKA

Basic information

Original name

Citation Data of Czech Apex Courts (preprint)

Authors

HARAŠTA, Jakub, Tereza NOVOTNÁ and Jaromír ŠAVELKA

Edition

arXiv, arXiv:2002.02224, 2020

Other information

Language

English

Type of outcome

Article in a professional periodical (non-peer-reviewed)

Field of Study

50500 5.5 Law

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

References:

Text (arXiv.org); Dataset (GitHub)

Organization unit

Faculty of Law

Keywords in English

reference recognition; reference extraction; document segmentation; NLP pipeline; citation data; Supreme Court; Supreme Administrative Court; Constitutional Court; Czech Republic

Abstract

In the original

In this paper, we introduce the citation data of the Czech apex courts (the Supreme Court, the Supreme Administrative Court and the Constitutional Court). The dataset was automatically extracted from CzCDC 1.0, a corpus of texts of Czech court decisions. We obtained the citation data by building a natural language processing pipeline for the extraction of court decision identifiers. The pipeline comprised (i) a document segmentation model and (ii) a reference recognition model. The extracted data were then manually processed to produce high-quality citation data as a basis for subsequent qualitative and quantitative analyses. The dataset is publicly available on GitHub.
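
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a hand-written regular expression stands in for the trained reference recognition model, and document segmentation is skipped entirely. The pattern `DOCKET_RE`, the functions `extract_identifiers` and `citation_edges`, and the sample sentence with its docket numbers are all hypothetical and invented for illustration.

```python
import re

# Hypothetical pattern for Czech apex court docket numbers.
# Assumption: this regex only approximates the identifier formats and
# replaces the trained reference recognition model from the abstract.
DOCKET_RE = re.compile(
    r"\b(?:(?:Pl|[IVX]+)\.\s*ÚS"       # Constitutional Court, e.g. "Pl. ÚS", "II. ÚS"
    r"|\d{1,2}\s+[A-Z][A-Za-z]{1,4})"  # other apex courts, e.g. "22 Cdo", "1 As"
    r"\s+\d+/\d{2,4}\b"                # serial number / year
)


def extract_identifiers(text):
    """Return docket numbers found in text, with whitespace normalised."""
    return [" ".join(m.group(0).split()) for m in DOCKET_RE.finditer(text)]


def citation_edges(citing_id, text):
    """Build (citing, cited) pairs, dropping duplicates and self-citations."""
    cited = dict.fromkeys(extract_identifiers(text))  # keeps first-seen order
    return [(citing_id, c) for c in cited if c != citing_id]


# Invented sample sentence; the docket numbers are illustrative only.
sample = (
    "Nejvyšší soud v rozsudku sp. zn. 22 Cdo 1234/2015 "
    "odkázal na nález sp. zn. II. ÚS 123/05."
)
print(citation_edges("1 As 15/2010", sample))
```

A rule-based pattern like this misses references without a docket number and cannot tell a cited decision from, say, the decision under review, which is presumably why the pipeline in the abstract relies on trained models and manual processing.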

Links

GA17-20645S, research and development project

Name: Exaktní hodnocení aplikační relevance judikatury

Investor: Czech Science Foundation
