SUCHOMEL, Šimon and Michal BRANDEJS. Heterogeneous Queries for Synoptic and Phrasal Search. In CLEF2014 Working Notes. Sheffield, UK: CEUR, Aachen University, 2014. p. 1017-1020. ISSN 1613-0073.
Basic information
Original name Heterogeneous Queries for Synoptic and Phrasal Search
Authors SUCHOMEL, Šimon (203 Czech Republic, belonging to the institution) and Michal BRANDEJS (203 Czech Republic, guarantor, belonging to the institution).
Edition Sheffield, UK, CLEF2014 Working Notes, p. 1017-1020, 4 pp. 2014.
Publisher CEUR, Aachen University
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Germany
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
RIV identification code RIV/00216224:14330/14:00077319
Organization unit Faculty of Informatics
ISSN 1613-0073
Keywords in English suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; snippet similarity;
Tags International impact, Reviewed
Changed by: RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 28. 4. 2015 10:44.
Abstract
This paper describes our approaches to the Plagiarism Detection – Source Retrieval task of PAN 2014. We combined and improved the methodology used at PAN 2012 and PAN 2013. Our system combines three types of queries: keyword-based queries, paragraph-based queries, and header-based queries. The queries are also distinguished by other properties, such as whether a query is a phrase query or a positional query. The queries are submitted to two search engines, ChatNoir and Indri, according to their properties. A query's position is used to control the search; minimizing the total number of executed queries is the system's priority. Downloaded documents are textually compared with the suspicious document, and if a similarity is found, the downloaded document is reported.
Links
LG13010, research and development project. Name: Zastoupení ČR v European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR
Type Name Uploaded/Created by Uploaded/Created Rights
pan14.pdf Licence Creative Commons  File version Suchomel, Š. 14. 11. 2014

Properties

Address within IS
https://is.muni.cz/auth/publication/1206027/pan14.pdf
Address for the users outside IS
https://is.muni.cz/publication/1206027/pan14.pdf
Address within Manager
https://is.muni.cz/auth/publication/1206027/pan14.pdf?info
Address within Manager for the users outside IS
https://is.muni.cz/publication/1206027/pan14.pdf?info
Uploaded/Created
Fri 14. 11. 2014 15:09, RNDr. Šimon Suchomel, Ph.D.

Rights

Right to read
  • anyone on the Internet
Right to administer:
  • a concrete person doc. Ing. Michal Brandejs, CSc., učo 2116
  • a concrete person RNDr. Pavel Šmerk, Ph.D., učo 3880
  • a concrete person RNDr. Šimon Suchomel, Ph.D., učo 98949

pan14.pdf

Address within IS
https://is.muni.cz/auth/publication/1206027/pan14.pdf
Address for the users outside IS
http://is.muni.cz/publication/1206027/pan14.pdf
File type
PDF (application/pdf)
Size
147,5 KB
Hash md5
ff4246448883e868578abdcbd5d90183
Uploaded/Created
Fri 14. 11. 2014 15:09

pan14.txt

Address within IS
https://is.muni.cz/auth/publication/1206027/pan14.txt
Address for the users outside IS
http://is.muni.cz/publication/1206027/pan14.txt
File type
plain text (text/plain)
Size
9,8 KB
Hash md5
93d4c1dc9b109841203361ed40f7345f
Uploaded/Created
Fri 14. 11. 2014 15:12