HelRAW: Robert Crellin 3.10.2022

The Helsinki Research on the Ancient World (HelRAW) is a monthly research seminar. HelRAW is organized by the SpaceLaw project together with the Digital Grammar of Greek Documentary Papyri (PapyGreek) project.

3.10.2022 at 17.15 (UTC+3)

Room 18, Metsätalo (Unioninkatu 40, 4th floor)

Robert Crellin (University of Oxford, Crossreads project): D(ominus) or D(ecimus)? Using context to measure the ambiguity of Latin abbreviations in epigraphic texts


A characteristic feature of Latin inscriptions from antiquity is their use of abbreviations. Words of various classes may be abbreviated, including:

  • Names, e.g. L(ucius), C(aius)
  • Nouns, e.g. IMP(erator)
  • Verbs, e.g. D(edicavit), D(edit)
  • Adjectives, e.g. NN (= nostri)

This feature distinguishes Latin inscriptions from contemporary Greek inscriptions, whose use of abbreviations is much more sparing (Cooley 2012: 357; McLean 2002: 49; Gordon 1983).

Much research on abbreviations in the ancient world, especially the Roman world, has tended to focus on their use as potential dating markers. Such an application is in principle possible because the use of abbreviations changed over time (Cooley 2012: 359; for an example, see Salomies 2014: 157–158). Latin abbreviations have also (rarely) been studied in their own right (Hälvä-Nyberg 1988; Gordon 1948), although more recently abbreviations have tended to feature as lists in more general manuals of epigraphy (Lassère 2005; Limentani 1968).

The advent of digital technologies has made it possible to expand considerably the coverage and recording of abbreviations. Thus the size of Tom Elliott’s inventory of abbreviations (https://paregorios.org/resources/abbrev/, last accessed 13th September 2022) considerably exceeds that of book-published alternatives (Cooley 2012: 357). One characteristic of Latin abbreviations that emerges clearly from Elliott’s work is the sheer number of possible expansions of certain abbreviations: Elliott lists no fewer than 189 possible expansions for the abbreviation D, for example.

The very large number of possible expansions of certain abbreviations immediately raises the question of the basis on which contemporary readers themselves expanded the abbreviations. It seems reasonable to suppose that contextual variables, where ‘context’ is construed in both narrow and broad terms, are fundamental to this task.

In this paper I use the 81,883 Latin inscriptions in the Epigraphic Database Heidelberg (EDH) corpus (https://edh.ub.uni-heidelberg.de/) to provide a preliminary assessment of the degree to which it is possible to expand Latin abbreviations on the basis of their context. In this preliminary study, both the words surrounding a given abbreviation (n-grams) and the type of object on which the abbreviation is written (e.g. ‘stele’) are incorporated into a machine learning model. The model is used to make predictions on separate test datasets, and the accuracy of these predictions is measured. I give an assessment of the preliminary results, looking at the method’s strengths and weaknesses as currently implemented, and suggest avenues for further development.
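
As a rough illustration of the general idea only, the following Python sketch predicts the expansion of the abbreviation D from the neighbouring words and the object type, and then measures accuracy on a held-out test set. The abstract does not specify which model is used; the naive Bayes classifier here is an assumption chosen for brevity. The names `features`, `ContextExpander` and `accuracy` are hypothetical, and the handful of training and test examples are invented for the sketch, not drawn from the EDH corpus.

```python
from collections import Counter, defaultdict
import math


def features(left, right, object_type):
    """Describe one occurrence of an abbreviation by its context.

    left / right are the neighbouring words ("<s>" and "<e>" mark the start
    and end of the text); object_type is the kind of inscribed object.
    """
    return [f"left={left}", f"right={right}", f"object={object_type}"]


class ContextExpander:
    """Toy naive Bayes classifier with add-one smoothing (an assumption)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        # Count how often each expansion and each (expansion, feature) pair occurs.
        for feats, label in examples:
            self.label_counts[label] += 1
            for f in feats:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, feats):
        total = sum(self.label_counts.values())

        def score(label):
            # Log prior plus smoothed log likelihood of each context feature.
            s = math.log(self.label_counts[label] / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in feats:
                s += math.log((self.feature_counts[label][f] + 1) / denom)
            return s

        return max(self.label_counts, key=score)


def accuracy(model, examples):
    """Share of examples whose predicted expansion matches the gold label."""
    correct = sum(model.predict(feats) == label for feats, label in examples)
    return correct / len(examples)


# Invented toy occurrences of "D" (NOT real EDH data).
train = [
    (features("<s>", "M", "stele"), "Dis"),
    (features("<s>", "M", "stele"), "Dis"),
    (features("<s>", "IUNIUS", "stele"), "Decimus"),
    (features("DONUM", "<e>", "altar"), "dedit"),
    (features("DONUM", "<e>", "altar"), "dedit"),
    (features("<s>", "N", "milestone"), "dominus"),
]
test = [
    (features("<s>", "M", "stele"), "Dis"),
    (features("ARAM", "<e>", "altar"), "dedit"),
]

model = ContextExpander()
model.fit(train)
print(accuracy(model, test))
```

The second test item has a left neighbour that never occurs in the training data, so any correct prediction there has to come from the object type and the right-hand context. That is the kind of interaction between narrow and broad context the study sets out to measure.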


Robert Crellin is a historical linguist whose work focuses on the syntax and semantics of ancient languages, the structure of ancient writing systems, and computational approaches to language. He completed his PhD in Classics at Cambridge in 2012, where he wrote on the syntax and semantics of the perfect in Ancient Greek, especially its postclassical varieties. Robert’s subsequent research has encompassed analyses of Biblical Greek, including the translation of the Greek verb system in the early versions of the New Testament and the morphology of personal names.

Between 2014 and 2016 Robert worked on the Greek Lexicon Project in Cambridge, now published as The Cambridge Greek Lexicon, where he was responsible for writing articles on prepositions. Most recently Robert has been employed on the ERC project Contexts of and Relations between Early Writing Systems. His work here has focused on the writing of vowels in Northwest Semitic writing systems, especially Punic, and on word division. Robert has just published a monograph which analyses the semantics of word division in Northwest Semitic writing systems, including Ugaritic, Phoenician, Hebrew and Greek. The particular focus of this work is the syntax-phonology-graphematics interface, investigating the relationship between the written ‘word’ and the morphosyntactic and phonological words, adopting a variety of syntactic, phonological and computational frameworks.

In Jonathan Prag’s Crossreads project Robert is conducting, inter alia, syntactic analysis and markup of texts in the I.Sicily corpus.