Autumn 2021 update: Helsinki Computational History Group research

If you are interested in cutting-edge computational history work on early modern data, you’ve come to the right place! Here is an update on what has been going on lately in our group.

We like to keep busy in computational history, as you’ll see! As part of our strategy, we are developing several lines of research in bibliographic data science. On ESTC data (which we have been working on for years), a new paper on Probabilistic Analysis of Early Modern British Book Prices, led by our new doctoral student Iiro Tiihonen together with Leo Lahti, was just published. Other work focusing on library catalogues is also ongoing, and our collaboration with international partners on bibliodata continues. To give a few examples, we have an article on The Representativeness of Eighteenth-Century Collections Online with Lahti and Eetu Mäkelä coming out soon in Eighteenth-Century Studies, and we just published an article in ECS with Mark Hill, A Computational Investigation into the Authorship of Sister Peg. We have also submitted a paper on Book Printing in Latin and Vernacular Languages in Northern Europe, 1500–1800 with Jani Marjanen and Tuuli Tahko, which develops further our earlier work on changes in regional language profiles in early modern Europe. Work on NewsEye and on other newspaper data also continues, of course!

In this update, though, I’d like to focus mainly on another line of our research that you might not have heard much about: text reuse data based on ECCO and EEBO-tcp. For several years, our strategy has been to combine work on metadata and full-text collections, focusing particularly on early modern British data and the era of the Enlightenment. Our doctoral student Ville Vaara has adapted BLAST (based on code developed in the TurkuNLP group led by Filip Ginter and implemented largely by Aleksi Vesanto on Finnish newspaper sources) to work with the ECCO and EEBO-tcp data. We were originally led down this path while working on a consortium project led by Hannu Salmi, in which the Turku group had an interest, which still continues, in the virality of news. We got our initial results out as early as 2018 with a paper on Hume’s History of England (yet to be published). Studying early modern British books and pamphlets is very different from studying Finnish newspapers, and the code has taken a long time to optimize. This year we have finally been able to move forward at full steam, developing other case studies of text reuse and early modern intertextuality as well. This work has also benefited greatly from the Octavo environment orchestrated by Mäkelä.

For example, we led a hackathon group working on Pierre Bayle’s Dictionary in the British context using the text reuse data. That undertaking from last spring is currently being worked into an article with the hackathon group, which is exciting. Together with Tanja Säily, we also have an ongoing project course at the University of Helsinki on digital humanities and English, in which three groups are using this text reuse data to study the quotations, authors and translations of Lucretius’s De Rerum Natura in the early modern British context. In addition, a very gifted MA student is working on the text reuse data, using a bibliographic data science approach to study the Spectator and eighteenth-century canon-making processes. All of this ties in with our Academy of Finland-funded project RISE OF COMMERCIAL SOCIETY AND EIGHTEENTH-CENTURY PUBLISHING (RiCEP), 2020–2024, which we lead with Säily.

Looking to the future, one long-term aim is to use the text reuse data in a forthcoming edition of David Hume’s History of England for Oxford University Press. In a related project, we are also currently using this text reuse data with Mark Spencer to study the reception of Hume’s Essays, in an article contracted for a volume on Hume for Cambridge University Press. After the Essays, the idea is to contribute to another volume on the textual borrowings in and from Hume’s Treatise. I can reveal that combining close study of the interlinks between works with a systematic, automated text reuse approach is yielding very interesting results for Hume’s Essays. And think about this: as we move forward and iterate on our workflows and processes, the approach becomes scalable, so you can study any early modern author with the EEBO-tcp and ECCO data in the same way we are currently studying Hume! At a minimum, the impact on reception studies of particular authors will be considerable. The argument is not that this lets us be sure our data covers "everything" with respect to reception. No. The argument is that we are able to take a systematic approach to one aspect of reception. And that alone is plenty.

In conjunction with the text reuse work (which develops the combination of full-text sources and metadata in our strategy), we also aim to take ECCO more seriously as a full-text collection, developing BERT models for this noisy dataset together with our Finnish partners. We have already planned an ambitious collaboration with computer scientists from Helsinki, Turku and Aalto on using HPC for these historical sources. Let’s see whether those who have the power to make executive decisions understand the value of this truly multidisciplinary work. (Without funding, it is difficult to undertake any larger new developments in a field that is far from established, and where, globally, very few researchers have been able to put in the time and effort to organise themselves into actual research groups. Yet that is the only way to enable a systematic, long-term approach combining traditional fields in the humanities with data science. Unfortunately, the tendency is to overlook this and to make unrealistic, overreaching plans that end up mainly playing with lots of different datasets, instead of aiming for a systematic focus that leads to publications in venues with an actual impact on traditional fields of research.) Nevertheless, the wheels of our research are in motion, and you should soon be hearing about our first results on training language models on the historical, noisy data of ECCO and EEBO-tcp.

As a final note, we have room for one postdoc in our group. Applications for this position are open until 11.11.2021. Read more and apply, or at least spread the word to people you think should be part of our group:

Mikko Tolonen, PI of Helsinki Computational History Group