Generative AI and the end of corpus-assisted data-driven learning? Not so fast!

J 2023

CROSTHWAITE, Peter and Vít BAISA

Basic information

Original title

Generative AI and the end of corpus-assisted data-driven learning? Not so fast!

Authors

CROSTHWAITE, Peter and Vít BAISA

Published in

Applied Corpus Linguistics, Amsterdam, Elsevier, 2023, ISSN 2666-7991

Other information

Language

English

Type of outcome

Article in a professional journal

Field

10201 Computer sciences, information science, bioinformatics

Country of publisher

Kingdom of the Netherlands

Confidentiality

not subject to a state or trade secret

Links

URL

Marked for transfer to RIV

Yes

RIV code

RIV/00216224:14330/23:00139271

Organizational unit

Faculty of Informatics

Keywords in English

Data-driven learning; generative AI; ChatGPT; DDL; Corpora

Flags

International impact, Peer-reviewed

Last modified: 3 April 2025, 22:35, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

This article explores the potential advantages of corpora over generative artificial intelligence (GenAI) in understanding language patterns and usage, while also acknowledging the potential of GenAI to address some of the main shortcomings of corpus-based data-driven learning (DDL). One of the main advantages of corpora is that we know exactly the domain of texts from which the corpus data is derived, something that we cannot track from current large language models underlying applications like ChatGPT. We know the texts that make up large general corpora such as BNC2014 and BAWE, and can even extract full texts from these corpora if needed. Corpora also allow for more nuanced analysis of language patterns, including the statistics behind multi-word units and collocations, which can be difficult for GenAI to handle. However, it is important to note that GenAI has its own strengths in advancing our understanding of language-in-use that corpora, to date, have struggled with. We therefore argue that by combining corpus and GenAI approaches, language learners can gain a more comprehensive understanding of how language works in different contexts than is currently possible using only a single approach.

Cite

CROSTHWAITE, Peter and Vít BAISA. Generative AI and the end of corpus-assisted data-driven learning? Not so fast! Applied Corpus Linguistics. Amsterdam: Elsevier, 2023, vol. 3, no. 3, pp. 1-4. ISSN 2666-7991. Available from: https://doi.org/10.1016/j.acorp.2023.100066.

@article{2488800,
   author = {Crosthwaite, Peter and Baisa, Vít},
   article_location = {Amsterdam},
   article_number = {3},
   doi = {10.1016/j.acorp.2023.100066},
   keywords = {Data-driven learning; generative AI; ChatGPT; DDL; Corpora},
   language = {eng},
   issn = {2666-7991},
   journal = {Applied Corpus Linguistics},
   title = {Generative AI and the end of corpus-assisted data-driven learning? Not so fast!},
   url = {http://dx.doi.org/10.1016/j.acorp.2023.100066},
   volume = {3},
   year = {2023}
}

TY  - JOUR
ID  - 2488800
AU  - Crosthwaite, Peter
AU  - Baisa, Vít
PY  - 2023
TI  - Generative AI and the end of corpus-assisted data-driven learning? Not so fast!
JF  - Applied Corpus Linguistics
VL  - 3
IS  - 3
SP  - 1
EP  - 4
PB  - Elsevier
SN  - 2666-7991
KW  - Data-driven learning
KW  - generative AI
KW  - ChatGPT
KW  - DDL
KW  - Corpora
UR  - http://dx.doi.org/10.1016/j.acorp.2023.100066
N2  - This article explores the potential advantages of corpora over generative artificial intelligence (GenAI) in understanding language patterns and usage, while also acknowledging the potential of GenAI to address some of the main shortcomings of corpus-based data-driven learning (DDL). One of the main advantages of corpora is that we know exactly the domain of texts from which the corpus data is derived, something that we cannot track from current large language models underlying applications like ChatGPT. We know the texts that make up large general corpora such as BNC2014 and BAWE, and can even extract full texts from these corpora if needed. Corpora also allow for more nuanced analysis of language patterns, including the statistics behind multi-word units and collocations, which can be difficult for GenAI to handle. However, it is important to note that GenAI has its own strengths in advancing our understanding of language-in-use that corpora, to date, have struggled with. We therefore argue that by combining corpus and GenAI approaches, language learners can gain a more comprehensive understanding of how language works in different contexts than is currently possible using only a single approach.
ER  -

CROSTHWAITE, Peter and Vít BAISA. Generative AI and the end of corpus-assisted data-driven learning? Not so fast! \textit{Applied Corpus Linguistics}. Amsterdam: Elsevier, 2023, vol.~3, no.~3, pp.~1--4. ISSN~2666-7991. Available from: https://doi.org/10.1016/j.acorp.2023.100066.
