Distributed System for Discovering Similar Documents: From a
Relational Database to the Custom-Developed Parallel Solution

D 2008

KASPRZAK, Jan, Michal BRANDEJS, Miroslav KŘIPAČ and Pavel ŠMERK

Basic information

Original title

Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution

Title in Czech

Distribuovaný systém pro vyhledávání podobných dokumentů: od relační databáze k paralelnímu řešení na míru

Authors

KASPRZAK, Jan (203 Czech Republic, guarantor), Michal BRANDEJS (203 Czech Republic), Miroslav KŘIPAČ (203 Czech Republic) and Pavel ŠMERK (203 Czech Republic)

Edition

Setúbal, Portugal, ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. DISI - Databases and Information Systems Integration, pp. 437-440, 4 pp., 2008

Publisher

INSTICC (Institute for Systems and Technologies of Information, Control and Communication)

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality

Not subject to a state or trade secret

RIV identification code

RIV/00216224:14330/08:00036064

Organization unit

Faculty of Informatics

ISBN

978-989-8111-36-4

UT WoS

000259488200068

Keywords in English

University; Plagiarism; Similar Documents; Cluster; Information System; Theses

Tags

cluster, information system, IS, Plagiarism, Similar Documents, theses, University

Flags

International impact, Reviewed

Changed: 31 March 2010 11:33, Mgr. Ľuboš Lunter

Abstract

In the original language

One of the drawbacks of e-learning methods such as Web-based submission and evaluation of students' papers and essays is that it has become easier for students to plagiarize the work of other people. In this paper we present a computer-based system for discovering similar documents, which has been in use at Masaryk University in Brno since August 2006, and which will also be used in the forthcoming Czech national archive of graduate theses. We also focus on practical aspects of this system: achieving near real-time response to newly imported documents, and computational feasibility of handling large sets of documents on commodity hardware. We also show the possibilities and problems with parallelization of this system for running on a distributed cluster of computers.

Czech abstract (translated)

The paper presents a system for detecting similar documents, which has been in use at Masaryk University since August 2006 and which will also be used for the Czech national archive of theses.

Links

LA 168, research and development project

Name: Participation of the Czech Republic in the ERCIM research consortium

Investor: Ministry of Education, Youth and Sports of the Czech Republic, Participation of the Czech Republic in the ERCIM research consortium

Cite

KASPRZAK, Jan, Michal BRANDEJS, Miroslav KŘIPAČ and Pavel ŠMERK. Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution. In ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. DISI - Databases and Information Systems Integration. Setúbal, Portugal: INSTICC (Institute for Systems and Technologies of Information, Control and Communication), 2008, pp. 437-440. ISBN 978-989-8111-36-4.

@inproceedings{838694,
   author = {Kasprzak, Jan and Brandejs, Michal and Křipač, Miroslav and Šmerk, Pavel},
   address = {Setúbal, Portugal},
   booktitle = {ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. DISI - Databases and Information Systems Integration},
   keywords = {University; Plagiarism; Similar Documents; Cluster; Information System; Theses},
   language = {eng},
   location = {Setúbal, Portugal},
   isbn = {978-989-8111-36-4},
   pages = {437-440},
   publisher = {INSTICC (Institute for Systems and Technologies of Information, Control and Communication)},
   title = {Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution},
   year = {2008}
}

TY  - CONF
ID  - 838694
AU  - Kasprzak, Jan
AU  - Brandejs, Michal
AU  - Křipač, Miroslav
AU  - Šmerk, Pavel
PY  - 2008
TI  - Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution
PB  - INSTICC (Institute for Systems and Technologies of Information, Control and Communication)
CY  - Setúbal, Portugal
SN  - 9789898111364
KW  - University
KW  - Plagiarism
KW  - Similar Documents
KW  - Cluster
KW  - Information System
KW  - Theses
N2  - One of the drawbacks of e-learning methods such as Web-based submission and evaluation of students' papers and essays is that it has become easier for students to plagiarize the work of other people. In this paper we present a computer-based system for discovering similar documents, which has been in use at Masaryk University in Brno since August 2006, and which will also be used in the forthcoming Czech national archive of graduate theses. We also focus on practical aspects of this system: achieving near real-time response to newly imported documents, and computational feasibility of handling large sets of documents on commodity hardware. We also show the possibilities and problems with parallelization of this system for running on a distributed cluster of computers.
ER  -

KASPRZAK, Jan, Michal BRANDEJS, Miroslav KŘIPAČ and Pavel ŠMERK. Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution. In \textit{ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. DISI - Databases and Information Systems Integration}. Setúbal, Portugal: INSTICC (Institute for Systems and Technologies of Information, Control and Communication), 2008, pp.~437--440. ISBN~978-989-8111-36-4.
