@book{890028,
  author = {Kasprzak, Jan},
  address = {Brno},
  keywords = {similar documents; document overlap detection; plagiarism; natural language processing; distributed computing},
  language = {eng},
  location = {Brno},
  title = {Systems for Discovering Similar Documents},
  url = {http://is.muni.cz/th/1885/fi_r/},
  year = {2010}
}
TY  - BOOK
ID  - 890028
AU  - Kasprzak, Jan
PY  - 2010
TI  - Systems for Discovering Similar Documents
CY  - Brno
KW  - similar documents
KW  - document overlap detection
KW  - plagiarism
KW  - natural language processing
KW  - distributed computing
UR  - http://is.muni.cz/th/1885/fi_r/
N2  - With the wider availability of electronic texts in recent years, it has also become easier to use the work of other people without appropriate citation. Fortunately, recent developments in detecting document overlap (and, more generally, in discovering similar documents) can also make it easier to discover plagiarized work. Algorithms for discovering similar documents also have other uses, especially in full-text search engines: either for removing duplicate documents altogether, or for preventing a subset of important but similar documents from occupying the whole first page of the search results. This proposed Ph.D. thesis will evaluate approaches to the discovery of similar documents, especially by detecting document overlap, and verify which of them are suitable for large sets of documents. It will also focus on aspects of practical implementation on a distributed cluster of standalone computers, and on usage in the production environment of the Masaryk University Information System.
ER  -
KASPRZAK, Jan. \textit{Systems for Discovering Similar Documents}. Brno, 2010, 20 pp.