D 2013

Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013

SUCHOMEL, Šimon, Jan KASPRZAK and Michal BRANDEJS

Basic information

Original name

Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013

Authors

SUCHOMEL, Šimon (203 Czech Republic, belonging to the institution), Jan KASPRZAK (203 Czech Republic, belonging to the institution) and Michal BRANDEJS (203 Czech Republic, belonging to the institution)

Edition

Valencia, Spain, 2013 Cross Language Evaluation Forum Conference, CLEF 2013, CEUR Workshop Proceedings Volume 1179, unpaginated, 8 pp. 2013

Publisher

CEUR

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Spain

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

References:

RIV identification code

RIV/00216224:14330/13:00087410

Organization unit

Faculty of Informatics

ISSN

Keywords in English

suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; contextual n gram; word n gram; representative sentence; overlapping detection; snippet similarity; global postprocessing
Changed: 27/8/2019 11:55, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

This paper describes the approaches used for the Plagiarism Detection task in the PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance. The results show that the presented approach is adaptable to real-world plagiarism situations. For the Detailed Comparison task, we discuss feature type selection and global postprocessing. The resulting performance is significantly better with the described modifications, and further improvement is still possible.

Links

LG13010, research and development project
Name: Representation of the Czech Republic in the European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR