D 2023

Comparative Analysis of Community Detection and Transformer-Based Approaches for Topic Clustering of Scientific Papers

BRETSKO, Daniel, Aliaksandr BELY and Stanislav SOBOLEVSKY

Basic information

Original name

Comparative Analysis of Community Detection and Transformer-Based Approaches for Topic Clustering of Scientific Papers

Authors

BRETSKO, Daniel (804 Ukraine, guarantor, belonging to the institution), Aliaksandr BELY (112 Belarus, belonging to the institution) and Stanislav SOBOLEVSKY (112 Belarus, belonging to the institution)

Edition

Cham, 23rd International Conference on Computational Science and Its Applications, ICCSA 2023, pp. 648-660, 13 pp. 2023

Publisher

Springer

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Germany

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14310/23:00131468

Organization unit

Faculty of Science

ISBN

978-3-031-36804-2

ISSN

UT WoS

001166618800042

Keywords in English

Network analysis; NLP; Topic clustering; Community detection; Sentence-transformers

Tags

Flags

International impact, Reviewed
Changed: 21 March 2024 10:32, Mgr. Marie Šípková, DiS.

Abstract

In the original language

We address the topic clustering problem, in which papers with initially available subjects must be categorized into more consistent, higher-level topics. We approach the task from two perspectives: the first is traditional network science, where we perform community detection on a subject network using the Combo algorithm, and the second is the transformer-based top2vec algorithm, which uses a sentence-transformer to embed the content of the papers. The two approaches were compared on a dataset of scientific papers on computer science and mathematics collected from the SCOPUS database, with different coherence scores used as the measure of performance. The results showed that the community detection Combo algorithm achieved a coherence score similar to that of the transformer-based top2vec. The findings suggest that community detection may be a viable alternative for topic clustering when predefined topics are available, especially when a high coherence score and fast processing time are desired. The paper also discusses the potential advantages and limitations of using Combo for topic clustering and directions for future work in this area.

Links

EF16_019/0000822, research and development project
Name: Centrum excelence pro kyberkriminalitu, kyberbezpečnost a ochranu kritických informačních infrastruktur
MUNI/J/0008/2021, internal MU code
Name: Digital City
Investor: Masaryk University, Digital City, MASH JUNIOR - MUNI Award In Science and Humanities JUNIOR