SUCHOMEL, Šimon, Jan KASPRZAK and Michal BRANDEJS. Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013. Online. In 2013 Cross Language Evaluation Forum Conference, CLEF 2013, CEUR Workshop Proceedings Volume 1179. Valencia; Spain: CEUR, 2013, unpaginated, 8 pp. ISSN 1613-0073.
Basic information
Original name Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013
Authors SUCHOMEL, Šimon (203 Czech Republic, belonging to the institution), Jan KASPRZAK (203 Czech Republic, belonging to the institution) and Michal BRANDEJS (203 Czech Republic, belonging to the institution).
Edition Valencia; Spain, 2013 Cross Language Evaluation Forum Conference, CLEF 2013, CEUR Workshop Proceedings Volume 1179, unpaginated, 8 pp. 2013.
Publisher CEUR
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Spain
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
RIV identification code RIV/00216224:14330/13:00087410
Organization unit Faculty of Informatics
ISSN 1613-0073
Keywords in English suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; contextual n gram; word n gram; representative sentence; overlapping detection; snippet similarity; global postprocessing
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 27/8/2019 11:55.
Abstract
This paper describes the approaches used for the Plagiarism Detection task in the PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance. The results show that the presented approach is adaptable to real-world plagiarism situations. For the Detailed Comparison task, we discuss feature type selection and global postprocessing. The resulting performance is significantly better with the described modifications, and further improvement is still possible.
Links
LG13010, research and development project
Name: Representation of the Czech Republic in the European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR