Detailed Information on Publication Record
2014
Heterogeneous Queries for Synoptic and Phrasal Search
SUCHOMEL, Šimon and Michal BRANDEJS
Basic information
Original name
Heterogeneous Queries for Synoptic and Phrasal Search
Authors
SUCHOMEL, Šimon (203 Czech Republic, belonging to the institution) and Michal BRANDEJS (203 Czech Republic, guarantor, belonging to the institution)
Edition
Sheffield, UK, CLEF2014 Working Notes, p. 1017-1020, 4 pp. 2014
Publisher
CEUR, Aachen University
Other information
Language
English
Type of outcome
Article in proceedings
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Germany
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
References:
RIV identification code
RIV/00216224:14330/14:00077319
Organization unit
Faculty of Informatics
ISSN
Keywords in English
suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; snippet similarity
Tags
International impact, Reviewed
Changed: 28/4/2015 10:44, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
This paper describes our approaches for the Plagiarism Detection – Source Retrieval task of PAN 2014. We combined and improved the methodology used at PAN 2012 and PAN 2013. Our system combines three types of queries: keywords-based queries, paragraph-based queries, and headers-based queries. The queries are also distinguished by other properties, such as being a phrase query or a positional query. The queries are submitted to two search engines, ChatNoir and Indri, according to their properties. The query’s position is used to control the search; minimizing the total number of executed queries is the system’s priority. Downloaded documents are textually compared with the suspicious document, and if a similarity is found, the downloaded document is reported.
Links
LG13010, research and development project