KASPRZAK, Jan, Michal BRANDEJS, Miroslav KŘIPAČ and Pavel ŠMERK. Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution. In ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. DISI - Databases and Information Systems Integration. Setúbal, Portugal: INSTICC (Institute for Systems and Technologies of Information, Control and Communication), 2008, p. 437-440. ISBN 978-989-8111-36-4.
Basic information
Original name Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution
Name in Czech Distribuovaný systém pro vyhledávání podobných dokumentů: od relační databáze k paralelnímu řešení na míru
Authors KASPRZAK, Jan (203 Czech Republic, guarantor), Michal BRANDEJS (203 Czech Republic), Miroslav KŘIPAČ (203 Czech Republic) and Pavel ŠMERK (203 Czech Republic).
Edition Setúbal, Portugal, ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. DISI - Databases and Information Systems Integration, p. 437-440, 4 pp. 2008.
Publisher INSTICC (Institute for Systems and Technologies of Information, Control and Communication)
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
RIV identification code RIV/00216224:14330/08:00036064
Organization unit Faculty of Informatics
ISBN 978-989-8111-36-4
UT WoS 000259488200068
Keywords in English University; Plagiarism; Similar Documents; Cluster; Information System; Theses
Tags cluster, information system, IS, Plagiarism, Similar Documents, theses, University
Tags International impact, Reviewed
Changed by: Mgr. Ľuboš Lunter, učo 143320. Changed: 31/3/2010 11:33.
Abstract
One of the drawbacks of e-learning methods such as Web-based submission and evaluation of students' papers and essays is that it has become easier for students to plagiarize the work of other people. In this paper we present a computer-based system for discovering similar documents, which has been in use at Masaryk University in Brno since August 2006, and which will also be used in the forthcoming Czech national archive of graduate theses. We also focus on practical aspects of this system: achieving near real-time response to newly imported documents, and computational feasibility of handling large sets of documents on commodity hardware. We also show the possibilities and problems with parallelization of this system for running on a distributed cluster of computers.
Abstract (translated from Czech)
The paper presents a system for discovering similar documents, which has been in use at Masaryk University since August 2006 and which will also be used for the Czech national archive of theses.
Links
LA 168, research and development project
Name: Účast ČR ve výzkumném sdružení ERCIM
Investor: Ministry of Education, Youth and Sports of the CR, Účast ČR ve výzkumném sdružení ERCIM