2023
Generative AI and the end of corpus-assisted data-driven learning? Not so fast!
CROSTHWAITE, Peter and Vít BAISA
Basic information
Original title
Generative AI and the end of corpus-assisted data-driven learning? Not so fast!
Authors
CROSTHWAITE, Peter and Vít BAISA
Edition
Applied Corpus Linguistics, Amsterdam, Elsevier, 2023, 2666-7991
Other information
Language
English
Result type
Article in a scholarly journal
Field
10201 Computer sciences, information science, bioinformatics
Publisher country
Kingdom of the Netherlands
Confidentiality
not subject to state or trade secrecy
Links
Marked for transfer to RIV
Yes
RIV code
RIV/00216224:14330/23:00139271
Organizational unit
Faculty of Informatics
UT WoS
EID Scopus
Keywords in English
Data-driven learning; generative AI; ChatGPT; DDL; Corpora
Flags
International significance, Peer-reviewed
Changed: 3 April 2025 22:35, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
This article explores the potential advantages of corpora over generative artificial intelligence (GenAI) in understanding language patterns and usage, while also acknowledging the potential of GenAI to address some of the main shortcomings of corpus-based data-driven learning (DDL). One of the main advantages of corpora is that we know exactly the domain of texts from which the corpus data is derived, something that we cannot track from current large language models underlying applications like ChatGPT. We know the texts that make up large general corpora such as BNC2014 and BAWE, and can even extract full texts from these corpora if needed. Corpora also allow for more nuanced analysis of language patterns, including the statistics behind multi-word units and collocations, which can be difficult for GenAI to handle. However, it is important to note that GenAI has its own strengths in advancing our understanding of language-in-use that corpora, to date, have struggled with. We therefore argue that by combining corpus and GenAI approaches, language learners can gain a more comprehensive understanding of how language works in different contexts than is currently possible using only a single approach.