D 2008

Distributed System for Discovering Similar Documents

KASPRZAK, Jan, Michal BRANDEJS, Miroslav KŘIPAČ and Pavel ŠMERK

Basic information

Original name

Distributed System for Discovering Similar Documents

Name in Czech

Distribuovaný systém pro odhalování podobných dokumentů

Edition

Brno, 14 pp. 2008

Publisher

Faculty of Informatics, Masaryk University

Other information

Type of outcome

Proceedings paper

Confidentiality degree

is not subject to a state or trade secret

Organization unit

Faculty of Informatics

Keywords in English

University; Plagiarism; Similar Documents; Cluster; Information System; Theses

Tags

International impact
Changed: 1/7/2010 16:41, RNDr. Jan Kasprzak, Ph.D.

Abstract

In the original

One of the drawbacks of e-learning methods such as Web-based submission and evaluation of students' papers and essays is that it has become easier for students to plagiarize the work of other people. In this paper we present a computer-based system for discovering similar documents, which has been in use at Masaryk University in Brno since August 2006, and which will also be used in the forthcoming Czech national archive of graduate theses. We focus on practical aspects of this system: achieving near real-time response to newly imported documents, and the computational feasibility of handling large sets of documents on commodity hardware. We also show the possibilities and problems of parallelizing this system to run on a distributed cluster of computers.

In Czech (translated)

The paper presents a system for discovering similar documents, which has been in use at Masaryk University since August 2006 and which will also be used for the Czech national archive of theses.

Links

LA 168, research and development project
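Name: Účast ČR ve výzkumném sdružení ERCIM (Participation of the Czech Republic in the ERCIM research consortium)
Investor: Ministry of Education, Youth and Sports of the CR, Participation of the Czech Republic in the ERCIM research consortium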