Information System of Masaryk University 

Plagiarism Detection through Vector Space Models Applied to a Digital Library


ŘEHŮŘEK, Radim. Plagiarism Detection through Vector Space Models Applied to a Digital Library. In RASLAN 2008. 1,. Brno: Masarykova Univerzita, 2008. p. 75-83, 9 pp. ISBN 978-80-210-4741-9.
Basic information
Original name Plagiarism Detection through Vector Space Models Applied to a Digital Library
Name in Czech Detekce plagiátů v digitální knihovně
Authors ŘEHŮŘEK, Radim (203 Czech Republic, guarantor).
Edition 1,. Brno, RASLAN 2008, p. 75-83, 9 pp. 2008.
Publisher Masarykova Univerzita
Other information
Original language English
Type of outcome article in proceedings
Field of Study Use of computers, robotics and its application
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
RIV identification code RIV/00216224:14330/08:00024438
Organization unit Faculty of Informatics
ISBN 978-80-210-4741-9
UT WoS 000302212600013
Keywords in English plagiarism; vector space; digital library
Tags digital library, Plagiarism, vector space
Tags International impact
Changed by RNDr. Radim Řehůřek, Ph.D., učo 39672. Changed: 28. 1. 2009 16:01.
Abstract
Plagiarism is an increasing problem in the digital world. The sheer amount of digital data calls for automation of plagiarism discovery. In this paper we evaluate an Information Retrieval approach to dealing with plagiarism through Vector Spaces. This allows us to detect similarities that are not the result of naive copy&paste. We also consider an extension of Vector Spaces in which input documents are analyzed for term co-occurrence, allowing us to introduce some semantics into our approach beyond mere word matching. The approach is evaluated on a real-world collection of mathematical documents as part of the DML-CZ project.
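The general idea of comparing documents in a vector space can be sketched as follows. This is a minimal illustration, not the paper's implementation: the TF-IDF weighting, the smoothed IDF formula, the function names, and the 0.5 threshold are all assumptions chosen for the example, since the abstract states only that Vector Spaces are used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF (an assumed variant): terms found in every document
    # still keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + k)) + 1.0 for t, k in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(docs, threshold=0.5):
    """List (i, j, similarity) for document pairs at or above the threshold."""
    vecs = tfidf_vectors(docs)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:  # hypothetical cut-off for "suspicious" pairs
                pairs.append((i, j, sim))
    return pairs
```

Because each document is reduced to a bag of weighted terms, two texts that use the same words in a different order still score as near-identical, which is one way such a model can catch reuse that is not a verbatim copy. The co-occurrence extension mentioned in the abstract is not covered by this sketch.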
Abstract (translated from Czech)
The article addresses the use of vector spaces for plagiarism detection. It considers methods that extend the basic vector model to handle synonyms and statistical semantics. The approaches are evaluated on a real-world collection of mathematical texts from the DML-CZ project.
Links
LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Basic Research Center
1ET200190513, research and development project
Name: DML-CZ: Česká digitální matematická knihovna
Investor: Academy of Sciences of the Czech Republic, Information society (National programme of research)
2C06009, research and development project
Name: Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce (Acronym: COT-SEWing)
Investor: Ministry of Education, Youth and Sports of the CR, Information technologies for knowledge society