D 2017

Text Punctuation: An Inter-annotator Agreement Study

BOHÁČ, Marek, Michal ROTT and Vojtěch KOVÁŘ

Basic information

Original name

Text Punctuation: An Inter-annotator Agreement Study

Authors

BOHÁČ, Marek (203 Czech Republic), Michal ROTT (203 Czech Republic) and Vojtěch KOVÁŘ (203 Czech Republic, guarantor, belonging to the institution)

Edition

Cham, Text, Speech, and Dialogue: 20th International Conference, TSD 2017, pp. 120-128, 9 pp., 2017

Publisher

Springer International Publishing

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

Not subject to state or trade secrecy

Publication form

printed version "print"

References:

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/17:00095096

Organization unit

Faculty of Informatics

ISBN

978-3-319-64205-5

UT WoS

000449869200014

Keywords (in Czech)

doplňování čárek (comma insertion); mluvený jazyk (spoken language); mezianotátorská shoda (inter-annotator agreement)

Keywords in English

Comma adding; Spoken language; Inter-annotator agreement

Tags

International impact, Reviewed

Last modified: 27/4/2020 23:37, Mgr. Michal Petr

Abstract

Original

Spoken language is a phenomenon that is hard to annotate accurately. One of the most ambiguous tasks is inserting punctuation marks into spoken language transcriptions. The punctuation marks used often depend on how annotators understand the content of the transcription. This understanding may differ between annotators, as spoken language often lacks the clear structure inherent to written language, owing to the spontaneity of utterances or to jumps between ideas. We therefore suspect that inserting commas into spoken language transcriptions is a highly ambiguous task with low inter-annotator agreement (IAA). In this paper, we analyze the IAA within a group of annotators and propose methods to increase it. We also propose and evaluate a reformulation of classical ground-truth (GT) annotations for cases where multiple annotations are available.
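The abstract does not state which agreement measure the paper uses. Purely as an illustration, the sketch below assumes that each annotator's output is reduced to one binary decision per word boundary (comma or no comma) and scores agreement with Cohen's kappa, averaged over all annotator pairs (the averaged pairwise variant is known as Light's kappa). The slot representation, the function names `cohen_kappa` and `mean_pairwise_kappa`, and the example annotations are hypothetical, not taken from the paper.

```python
from itertools import combinations
from typing import Sequence


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two annotators' binary comma decisions.

    Hypothetical representation: one entry per word boundary (slot),
    True = "comma after this word", False = "no comma".
    """
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned slot by slot")
    n = len(a)
    # Observed agreement: share of slots where both annotators made the same decision.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each annotator's own comma rate.
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1.0:
        # Only reachable when both annotators used the same single label everywhere;
        # kappa is undefined there, so this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations: Sequence[Sequence[bool]]) -> float:
    """Average Cohen's kappa over all annotator pairs (Light's kappa)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical comma decisions of three annotators over six word boundaries.
ann_a = [False, True, False, False, True, False]
ann_b = [False, True, False, True, False, False]
ann_c = [False, True, False, False, True, False]

print(round(cohen_kappa(ann_a, ann_b), 2))  # 0.25
print(round(mean_pairwise_kappa([ann_a, ann_b, ann_c]), 2))  # 0.5
```

Kappa is used here rather than raw percentage agreement because most word boundaries carry no comma, so two annotators who rarely insert commas would show high raw agreement even when they disagree on nearly every comma they do place.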

In Czech (English translation)

The article deals with the problem of inserting commas into spoken text, focusing in particular on inter-annotator agreement and on the accuracy of current computer programs.

Links

GA15-13277S, research and development project
Name: Hyperintensionální logika pro analýzu přirozeného jazyka (Hyperintensional Logic for Natural Language Analysis)
Investor: Czech Science Foundation
LM2015071, research and development project
Name: Jazyková výzkumná infrastruktura v České republice (Language Research Infrastructure in the Czech Republic; Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/0897/2016, internal MU code
Name: Rozsáhlé výpočetní systémy: modely, aplikace a verifikace VI. (Large-scale Computing Systems: Models, Applications and Verification VI)
Investor: Masaryk University, Category A