Detailed Information on Publication Record
2013
Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013
SUCHOMEL, Šimon, Jan KASPRZAK and Michal BRANDEJS
Basic information
Original name
Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013
Authors
SUCHOMEL, Šimon (203 Czech Republic, belonging to the institution), Jan KASPRZAK (203 Czech Republic, belonging to the institution) and Michal BRANDEJS (203 Czech Republic, belonging to the institution)
Edition
Valencia; Spain, 2013 Cross Language Evaluation Forum Conference, CLEF 2013, CEUR Workshop Proceedings Volume 1179, unpaginated, 8 pp. 2013
Publisher
CEUR
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Spain
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
RIV identification code
RIV/00216224:14330/13:00087410
Organization unit
Faculty of Informatics
ISSN
Keywords in English
suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; contextual n gram; word n gram; representative sentence; overlapping detection; snippet similarity; global postprocessing
Changed: 27/8/2019 11:55, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
This paper describes the approaches used for the Plagiarism Detection task in the PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance. The results show that the presented approach is adaptable to real-world plagiarism situations. For the Detailed Comparison task, we discuss feature type selection and global postprocessing. The resulting performance is significantly better with the described modifications, and further improvement is still possible.
Links
LG13010, research and development project