How formulaic are inquisition records? Measuring lexical richness and text similarity in a corpus of Latin notarial documents


ZBÍRAL, David; Gideon KOTZÉ and Robert Laurence John SHAW

Basic information

Original title

How formulaic are inquisition records? Measuring lexical richness and text similarity in a corpus of Latin notarial documents

Authors

ZBÍRAL, David; Gideon KOTZÉ and Robert Laurence John SHAW

Edition

Formulaic Language in Historical Research and Data Extraction, 7-9 February 2024, Huygens Institute for the History and Culture of the Netherlands & Royal Netherlands Academy of Arts and Sciences, Amsterdam, Netherlands, 2024

Further information

Language

English

Result type

Conference presentations

Field

60203 Linguistics

Publisher's country

Kingdom of the Netherlands

Confidentiality

not subject to state or trade secrecy

Links

conference program

Marked for transfer to RIV

Organizational unit

Faculty of Arts

Keywords in Czech

formulační jazyk; lingvistická analýza; středověké notářské listiny

Keywords in English

formulaic language; linguistic analysis; medieval notarial documents

Flags

International significance, Peer-reviewed

Changed: 8 February 2025, 20:23, Mgr. Ivona Vrzalová

Abstract

In the original

It is a widely accepted axiom that medieval inquisition records, like many other types of notarial documents, are formulaic. However, the only notion of different degrees of formulaicity is that some registers – such as the register of Jacques Fournier famously studied by Emmanuel Le Roy Ladurie – are “less formulaic”, and thus “exceptional”. This exceptionality has even become, deservedly or not, an indication of reliability, which invests formulaicity with critical importance. It is thus surprising that no empirical studies actually measure the formulaicity of medieval inquisition records, which would allow us to compare them systematically and inform the source criticism of this contested type of text. To bridge this gap, we apply methods of lexical richness measurement and text similarity analysis to an expertly cleaned corpus of digitized editions of Latin-language medieval inquisition records (ca. 1,300,000 tokens). This allows us to show that the formulaicity of inquisition records, rather than being a universal feature with anecdotal exceptions, is actually distributed along a scale, with some registers significantly more formulaic than others. We achieve this by investigating the distribution, diversity, and similarity of types, tokens, and larger segments of text, combining this with our knowledge of the texts in order to interpret the results. Besides comparing individual registers with one another, we are able to compare the degree of formulaicity between different genres of heresy trial records (such as depositions vs. sentences vs. abjurations), since a part of our corpus (ca. 700,000 tokens) is segmented into specific documents provided with genre metadata.
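
The abstract does not say which measures were used. As a purely illustrative sketch, two common choices for "lexical richness" and "text similarity" are the type-token ratio (with a moving-average variant that is less sensitive to text length) and Jaccard overlap of word n-grams. The function names, the naive tokenizer, and the toy Latin phrases below are all assumptions made for this example; they are not taken from the authors' pipeline or corpus.

```python
def tokenize(text):
    """Naive tokenizer (assumption): lowercase, split on whitespace, trim punctuation."""
    words = (w.strip(".,;:!?()[]\"'") for w in text.lower().split())
    return [w for w in words if w]


def type_token_ratio(tokens):
    """Share of distinct word forms (types) among all tokens; lower means more repetition."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def moving_average_ttr(tokens, window=50):
    """Mean type-token ratio over sliding windows of fixed size."""
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def ngrams(tokens, n=3):
    """Set of contiguous word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(tokens_a, tokens_b, n=3):
    """Overlap of n-gram sets: |A and B| / |A or B|; higher means more shared phrasing."""
    a, b = ngrams(tokens_a, n), ngrams(tokens_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy, invented phrases (not quoted from any register).
doc_a = tokenize("Item dixit quod vidit")
doc_b = tokenize("Item dixit quod audivit")
print(type_token_ratio(doc_a + doc_b))
print(jaccard_similarity(doc_a, doc_b, n=3))
```

On real registers, a lower type-token ratio and a higher n-gram overlap between documents would both point towards more formulaic text, although lemmatization and length normalization would matter a great deal for an inflected language such as Latin.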

Related projects

101000442, MU internal code

Name: Networks of Dissent: Computational Modelling of Dissident and Inquisitorial Cultures in Medieval Europe (Acronym: DISSINET)

Investor: European Union, Networks of Dissent: Computational Modelling of Dissident and Inquisitorial Cultures in Medieval Europe, ERC (Excellent Science)

Cite

ZBÍRAL, David; Gideon KOTZÉ and Robert Laurence John SHAW. How formulaic are inquisition records? Measuring lexical richness and text similarity in a corpus of Latin notarial documents. In Formulaic Language in Historical Research and Data Extraction, 7-9 February 2024, Huygens Institute for the History and Culture of the Netherlands & Royal Netherlands Academy of Arts and Sciences, Amsterdam, Netherlands. 2024.

@inproceedings{2463881,
   author = {Zbíral, David and Kotzé, Gideon and Shaw, Robert Laurence John},
   booktitle = {Formulaic Language in Historical Research and Data Extraction, 7-9 February 2024, Huygens Institute for the History and Culture of the Netherlands \& Royal Netherlands Academy of Arts and Sciences, Amsterdam, Netherlands},
   keywords = {formulaic language; linguistic analysis; medieval notarial documents},
   language = {eng},
   title = {How formulaic are inquisition records? Measuring lexical richness and text similarity in a corpus of Latin notarial documents},
   url = {https://republic.huygens.knaw.nl/wp-content/uploads/2024/01/Formulaic_Language_Final-1.pdf},
   year = {2024}
}

TY  - CONF
ID  - 2463881
AU  - Zbíral, David
AU  - Kotzé, Gideon
AU  - Shaw, Robert Laurence John
PY  - 2024
TI  - How formulaic are inquisition records? Measuring lexical richness and text similarity in a corpus of Latin notarial documents
KW  - formulaic language
KW  - linguistic analysis
KW  - medieval notarial documents
UR  - https://republic.huygens.knaw.nl/wp-content/uploads/2024/01/Formulaic_Language_Final-1.pdf
N2  - It is a widely accepted axiom that medieval inquisition records, like many other types of notarial documents, are formulaic. However, the only notion of different degrees of formulaicity is that some registers – such as the register of Jacques Fournier famously studied by Emmanuel Le Roy Ladurie – are “less formulaic”, and thus “exceptional”. This exceptionality has even become, deservedly or not, an indication of reliability, which invests formulaicity with critical importance. It is thus surprising that no empirical studies actually measure the formulaicity of medieval inquisition records, which would allow us to compare them systematically and inform the source criticism of this contested type of text. To bridge this gap, we apply methods of lexical richness measurement and text similarity analysis to an expertly cleaned corpus of digitized editions of Latin-language medieval inquisition records (ca. 1,300,000 tokens). This allows us to show that the formulaicity of inquisition records, rather than being a universal feature with anecdotal exceptions, is actually distributed along a scale, with some registers significantly more formulaic than others. We achieve this by investigating the distribution, diversity, and similarity of types, tokens, and larger segments of text, combining this with our knowledge of the texts in order to interpret the results. Besides comparing individual registers with one another, we are able to compare the degree of formulaicity between different genres of heresy trial records (such as depositions vs. sentences vs. abjurations), since a part of our corpus (ca. 700,000 tokens) is segmented into specific documents provided with genre metadata.
ER  -

ZBÍRAL, David; Gideon KOTZÉ and Robert Laurence John SHAW. How formulaic are inquisition records? Measuring lexical richness and text similarity in a corpus of Latin notarial documents. In \textit{Formulaic Language in Historical Research and Data Extraction, 7-9 February 2024, Huygens Institute for the History and Culture of the Netherlands \&{} Royal Netherlands Academy of Arts and Sciences, Amsterdam, Netherlands}. 2024.
