
Working group «Lemmes»

This website is dedicated to the activities of the “Lemmes” working group of the COSME² consortium (Consortium Sources Médiévales 2).


One of the major changes in medieval history research, brought about by the increasing digitization of medieval texts, is the possibility of working on very large corpora with statistical methods and structured semantic analysis. By postulating a relationship between the meaning of words and historical change, following the example of Jost Trier’s (1894-1970) theory of semantic fields, the historian can reconstruct the logic of a social system of representation as it appears in the words selected and the relations between them.

In the case of languages with declension and high orthographic variation, such as those used in the medieval West (Latin and vernacular languages), any ambition to develop formalized, computer-assisted search procedures implies the lemmatization of the corpora used, i.e., grouping the frequencies of the different forms of a word under its lemma. In recent years, several European teams have been working on the creation of lemmatizers (for example, at the University of Frankfurt, the eHumanities Desktop (access by request), which offers a lemmatized Medieval Latin corpus). In France, two sets of parameters specific to medieval texts, both based on the TreeTagger software, were released simultaneously in 2009, with encouraging results; both are still being refined. The first is the tokenizer and medieval Latin parameter set developed by the team of ANR OMNIA (Outils et Méthodes Numériques pour l’Interrogation et l’Analyse des textes médiolatins) (dir. A. Guerreau - IRHT, EnC, Artehis). The second is the lemmatizer for Middle English, Middle French and Latin designed by the project team of PALM (Plateforme d’analyse linguistique médiévale) (access by request) (dir. J.-Ph. Genet - Lamop).
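To make the idea of "grouping the frequencies of the different forms of a word under its lemma" concrete, here is a minimal Python sketch. It is only an illustration: the form-to-lemma table, the function name `lemma_frequencies` and the sample sentence are all hypothetical, and a plain lookup table stands in for what a real lemmatizer such as a TreeTagger parameter set does with context and probabilities.

```python
from collections import Counter

# Hypothetical, hand-made lookup table (an assumption for this sketch).
# Real lemmatizers infer the lemma from context rather than from a fixed list.
FORM_TO_LEMMA = {
    "ecclesia": "ecclesia",
    "ecclesiam": "ecclesia",
    "ecclesiae": "ecclesia",
    "ecclesie": "ecclesia",  # medieval spelling of "ecclesiae"
    "dedit": "do",
    "dederunt": "do",
}


def lemma_frequencies(tokens, form_to_lemma):
    """Count tokens under their lemma; unknown forms are counted as themselves."""
    counts = Counter()
    for token in tokens:
        form = token.lower()
        counts[form_to_lemma.get(form, form)] += 1
    return counts


# Invented sample sentence, already split on whitespace.
tokens = "Petrus dedit ecclesie terram et fratres dederunt ecclesiam".split()
freqs = lemma_frequencies(tokens, FORM_TO_LEMMA)

for lemma, count in freqs.most_common():
    print(lemma, count)
```

Without the grouping step, "ecclesie" and "ecclesiam" would be counted as two unrelated words; with it, both contribute to the frequency of the single lemma "ecclesia", which is what makes statistical comparison across texts meaningful.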

In the medium term, one of the aims of producers of digitized corpora should be to make their text sets, already lemmatized, freely available to researchers. In addition to refining existing tools, which we must continue to support, we also need to make the various teams aware of the importance of this approach for the renewal of research, integrate it into new projects and deploy it in existing corpora and databases. This is no mean feat: while not incompatible, the possibilities offered by automated processing are quite distinct from the usual (i.e. “manual”) methods of historical research, and the two find themselves in competition. It is, after all, a situation that the sociology of science has correctly diagnosed as part of the processes leading to a shift in scientific paradigm (T. Kuhn).

Project leader

Eliana Magnani (LAMOP-UMR 8589)


References

  • Eliana Magnani & Nicolas Perreaux, « A Medieval Epigraphic Corpus and its Retro-Developments (CIFM-CBMA). The Exploratory Research of the COSME2 Consortium », DSH: The Journal of Digital Scholarship in the Humanities, special Issue proceedings of DH2019 conference, dir. Elena Pierazzo, Fabio Ciotti, 2020. DOI: 10.1093/llc/fqaa069. HAL Id: halshs-03085017

  • Estelle Ingrand-Varenne, Eliana Magnani, « Le corpus épigraphique bourguignon (VIIIe-XVe siècle). Des catalogues aux applications numériques », Bulletin du centre d’études médiévales d’Auxerre, BUCEMA, Collection CBMA, Les journées d’études, mis en ligne le 15 novembre 2018, consulté le 06 décembre 2018. DOI: 10.4000/cem.15591. HAL Id: halshs-01946701

  • Eliana Magnani, « Lemmes : un groupe de travail sur les outils de lemmatisation et les corpus de textes médiévaux lemmatisés », Archivum Latinitatis Medii Aevi - ALMA, 76, 2018 (impr. 2019), p. 340-344. HAL Id: halshs-02429433

  • Eliana Magnani, « Les nouveaux corpus CBMA : hagiographie, épigraphie, alia. Bilan et perspectives (2017-2020) », Bulletin du centre d’études médiévales d’Auxerre, BUCEMA, Collection CBMA, Les journées d’études, mis en ligne le 20 mai 2020, consulté le 20 mai 2020. DOI: 10.4000/cem.17087. HAL Id: halshs-02698177

  • Aurore Menudier, « Le corpus épigraphique provençal : premier bilan et comparaison avec le corpus bourguignon », Bulletin du centre d’études médiévales d’Auxerre, BUCEMA, Collection CBMA, mis en ligne le 19 mai 2020, consulté le 25 janvier 2023. DOI: 10.4000/cem.17076.

Associated teams

  • LAMOP (UMR 8589), Université de Paris 1

  • CESCM (UMR 7302), Université de Poitiers

  • École nationale des chartes (CJM, EA 3624)

  • IRHT

  • Lexicon Mediae et Infimae Latinitatis Polonorum (Académie des sciences de Pologne)

  • Goethe-Universität

  • LASLA (Université de Liège)