J 2023

Generative AI and the end of corpus-assisted data-driven learning? Not so fast!

CROSTHWAITE, Peter and Vít BAISA

Basic information

Original title

Generative AI and the end of corpus-assisted data-driven learning? Not so fast!

Authors

CROSTHWAITE, Peter and Vít BAISA

Edition

Applied Corpus Linguistics, Amsterdam, Elsevier, 2023, 2666-7991

Other information

Language

English

Type of outcome

Article in a periodical

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Netherlands

Confidentiality

Not subject to a state or trade secret

Marked for transfer to RIV

Yes

RIV code

RIV/00216224:14330/23:00139271

Organizational unit

Faculty of Informatics

Keywords in English

Data-driven learning; generative AI; ChatGPT; DDL; Corpora

Tags

International impact, Reviewed
Changed: 3 April 2025 22:35, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original language

This article explores the potential advantages of corpora over generative artificial intelligence (GenAI) in understanding language patterns and usage, while also acknowledging the potential of GenAI to address some of the main shortcomings of corpus-based data-driven learning (DDL). One of the main advantages of corpora is that we know exactly the domain of texts from which the corpus data is derived, something that we cannot track from current large language models underlying applications like ChatGPT. We know the texts that make up large general corpora such as BNC2014 and BAWE, and can even extract full texts from these corpora if needed. Corpora also allow for more nuanced analysis of language patterns, including the statistics behind multi-word units and collocations, which can be difficult for GenAI to handle. However, it is important to note that GenAI has its own strengths in advancing our understanding of language-in-use that corpora, to date, have struggled with. We therefore argue that by combining corpus and GenAI approaches, language learners can gain a more comprehensive understanding of how language works in different contexts than is currently possible using only a single approach.