Helsinki Digital Humanities Hackathon #DHH22 will have four thematic areas of interest, with one or more groups per topic, each with up to eight participants working under the guidance of the group leaders.
Which poets’ popularity waxed and waned over the eighteenth century? What makes a poem viral? Is there a rhetorical or stylistic difference between poetry popular at the time but now unknown, and that of established, canonical poets? This group will develop and use natural language processing methods to analyse poetry and verse found in a corpus of eighteenth century printed texts.
One area we may focus on is the question of poetry found in printed pamphlets. For many ordinary readers in the eighteenth century, a typical encounter with the poetic form was not necessarily through an edition or printed collection of poetic works, but likely through a variety of everyday and ephemeral texts, including newspapers, pamphlets, magazines, and printed ballads. Analysing this pamphlet-poetry at scale will help to quantify the extent of verse within this part of the ECCO collection, allow us to trace the progress of particular styles, language use and genres, and finally look at patterns of recirculation of particularly popular or ‘viral’ poetic works.
The dataset used will be Eighteenth Century Collections Online (ECCO), a dataset of over 200 000 volumes, approximately half of everything printed in the century. A suitable corpus of pamphlets to work with will be provided by the organisers, including a group of extracted poems to work with or for comparison. Additionally, there are 34 000 known editions of poetry found in the ESTC, and the metadata for these will also be provided, as well as the full texts for 21 000 that are also in the ECCO.
This group will be suited to students from both computational and humanities backgrounds. Students interested in eighteenth-century poetry, literary history, and the history of the book will be able to further those interests on a unique dataset of eighteenth-century verse contextualised within a larger body of early modern texts, and will be introduced to computational methods for studying texts at scale. Those with a computational background will have an opportunity to contribute to and improve tools for the automatic detection and analysis of poetry, for example state-of-the-art neural topic modelling based on contextualized embeddings.
Possible research questions include:
- The extent and frequency of verse found in the ECCO pamphlets collection
- Methods for reliably detecting verse and separating it from other forms of text, e.g. prose.
- Methods for detecting circulation and virality/popularity of individual poems
- Clustering and classification of poetry type using language models, neural topic modelling, and image classification techniques.
- Stylistic and semantic differences between different forms and genres, e.g. political satire vs. pastoral.
- Finding ‘unknown’ poetry, or works that were regularly reprinted then but are unknown now.
- Understanding poetry printing from the perspective of publishers’ networks
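As an illustration of the verse-detection question above, here is a minimal sketch of a layout-based heuristic. It is not the method the group will use: it simply assumes that verse tends to be set in short lines that each begin with a capital letter. The function name `looks_like_verse` and the thresholds (`max_len`, `min_lines`, and the 0.9 and 0.8 cut-offs) are hypothetical choices made for this example, and OCR noise in real ECCO text would likely require something more robust.

```python
def looks_like_verse(block: str, max_len: int = 60, min_lines: int = 4) -> bool:
    """Guess whether a block of text is verse, using line layout only.

    Assumption (hypothetical): verse has short lines that start with a capital.
    """
    # Keep non-empty lines, stripped of surrounding whitespace.
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    if len(lines) < min_lines:
        return False
    # Share of lines that are no longer than max_len characters.
    short = sum(len(ln) <= max_len for ln in lines) / len(lines)
    # Share of lines whose first character is an uppercase letter.
    capitalised = sum(ln[0].isupper() for ln in lines) / len(lines)
    return short >= 0.9 and capitalised >= 0.8


# Four lines from Pope's "An Essay on Criticism".
poem = (
    "True wit is nature to advantage dressed,\n"
    "What oft was thought, but ne'er so well expressed;\n"
    "Something whose truth convinced at sight we find,\n"
    "That gives us back the image of our mind."
)

# Invented sample prose, wrapped the way a printed page might be.
prose = (
    "The author of this pamphlet begs leave to inform the public that the following\n"
    "remarks were first drawn up for the private use of a friend, and are now printed\n"
    "at the request of several gentlemen who conceived they might be of some service\n"
    "to those who have not leisure to examine the question at greater length."
)

print(looks_like_verse(poem), looks_like_verse(prose))
```

A heuristic like this could serve as a cheap baseline against which language-model or image-based classifiers are compared.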
‘Eighteenth-Century Poetry Archive’ <https://www.eighteenthcenturypoetry.org/>
Benedict, Barbara M. “The Paradox of the Anthology: Collecting and Différence in Eighteenth-Century Britain.” New Literary History 34, no. 2 (2003): 231–56.
Benedict, Barbara M. “Publishing and Reading Poetry.” In The Cambridge Companion to Eighteenth-Century Poetry, edited by John E. Sitter, 63–82. Cambridge Companions to Literature. Cambridge, UK ; New York: Cambridge University Press, 2001.
Batt, Jennifer. “Eighteenth-Century Verse Miscellanies.” Literature Compass 9, no. 6 (June 2012): 394–405. https://doi.org/10.1111/j.1741-4113.2012.00893.x.
Cordell, Ryan, and Abby Mullen. “‘Fugitive Verses’: The Circulation of Poems in Nineteenth-Century American Newspapers.” American Periodicals: A Journal of History & Criticism 27, no. 1 (2017): 29–52. https://muse.jhu.edu/article/652267.
Lorang, Elizabeth, Leen-Kiat Soh, Maanas Varma Datla, and Spencer Kulwicki. “Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections.” D-Lib Magazine 21, no. 7/8 (July 2015). https://doi.org/10.1045/july2015-lorang.
Mark Algee-Hewitt, Ryan Heuser, Maria Kraxenberger, J. D. Porter, Jonny Sensenbaugh, Justin Tackett. The Stanford Literary Lab Transhistorical Poetry Project Phase II: Metrical Form. In Digital Humanities 2014, Conference Abstracts, EPFL - UNIL, Lausanne, Switzerland, 8-12 July 2014. Alliance of Digital Humanities Organizations (ADHO), 2014. (see https://docs.google.com/presentation/d/1KyCi4s6P1fE4D3SlzlZPnXgPjwZvyv_V...)
Suarez, Michael F. ‘The Production and Consumption of the Eighteenth-Century Poetic Miscellany’, in Books and their Readers in Eighteenth-Century England. Ed. Isabel Rivers. London: Leicester University Press, 2001. Pp.217-251
A still-common line of thought is that social media confines users to “filter bubbles” or “echo chambers”, where they only encounter like-minded people and content that supports their views. Research on social media has, however, shown that ideological unity does not exclude disagreement. Further, encountering opposing points of view can actually end up strengthening existing attitudes rather than softening them.
This group sets out to study what actually happens when groups espousing conflicting worldviews interact in an online space. Instead of the usual approaches of network and macro-level analyses, the group will focus on finding patterns and commonalities in the micro-level, discussional interactions that happen in online debates. By extracting data from Twitter, not as individual tweets but as complete conversations, we will be able to study for example how group identities are formed and conveyed to other participants through particular language use, or what kinds of rhetorical and structural strategies different groups utilize to support in-group members in the conversation and to deride and push down outsiders.
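To make the idea of treating conversations, rather than individual tweets, as the unit of analysis concrete, here is a minimal sketch that groups tweets into reply trees. The field names `id` and `in_reply_to` and the function `build_threads` are assumptions made for this example and do not describe the group's actual data format. A reply whose parent was not collected is treated as the start of its own thread.

```python
from collections import defaultdict


def build_threads(tweets):
    """Group tweet ids into conversations keyed by the id of the root tweet."""
    # Map each tweet to the tweet it replies to (None for a top-level tweet).
    parent = {t["id"]: t.get("in_reply_to") for t in tweets}

    def root_of(tweet_id):
        seen = set()  # guards against malformed data containing reply cycles
        while (
            parent.get(tweet_id) is not None
            and parent[tweet_id] in parent  # stop if the parent was not collected
            and tweet_id not in seen
        ):
            seen.add(tweet_id)
            tweet_id = parent[tweet_id]
        return tweet_id

    threads = defaultdict(list)
    for t in tweets:
        threads[root_of(t["id"])].append(t["id"])
    return dict(threads)


# Invented sample data: one three-tweet conversation, one standalone tweet,
# and one reply (id 5) whose parent (id 99) is missing from the collection.
sample = [
    {"id": 1, "in_reply_to": None},
    {"id": 2, "in_reply_to": 1},
    {"id": 3, "in_reply_to": 2},
    {"id": 4, "in_reply_to": None},
    {"id": 5, "in_reply_to": 99},
]
threads = build_threads(sample)
print(threads)
```

Once conversations are grouped this way, structural measures such as thread depth or the number of participants per thread can be computed alongside close reading of the exchanges.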
In the hackathon, we’ll focus on the Canadian “freedom convoy” (started on January 28, 2022) protest that originally targeted COVID-19 vaccine mandates and later morphed into a media event and social movement. Besides actual physical protests, the phenomenon has created a lively debate on different social media platforms, where social imaginaries, worldviews and e.g. conspiratorial themes intertwine with political stances, such as nationalism, populism or religious beliefs.
Students from a wide variety of backgrounds will find things to do in the group. Students with a qualitative methods background will find work in identifying and teasing out the interactions and framings that interest us. On the computational side, there is room both for quantitative analysis of the conversation structures and for data mining, information extraction and natural language processing to extract the interactions that close reading has identified as interesting. Finally, the expertise of all sides is needed for interpreting the results.
Christopher A. Bail, Lisa P. Argyle, Taylor W. Brown, John P. Bumpus, Haohan Chen, M. B. Fallin Hunzaker, Jaemin Lee, Marcus Mann, Friedolin Merhout, Alexander Volfovsky (2018) Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences Sep 2018, 115 (37) 9216-9221; DOI: 10.1073/pnas.1804840115
Sumiala, Johanna, Minttu Tikka, Jukka Huhtamäki and Katja Valaskivi. 2016. #JeSuisCharlie: Towards a Multi-Method Study of Hybrid Media Events. Media and Communication 4 (4). https://doi.org/10.17645/mac.v4i4.593
Vaccari, C., Valeriani, A., Barberá, P., Jost, J. T., Nagler, J., & Tucker, J. A. (2016). Of Echo Chambers and Contrarian Clubs: Exposure to Political Disagreement Among German and Italian Users of Twitter. Social Media + Society. https://doi.org/10.1177/2056305116664221
Letters – words written on paper, enclosed in an envelope and transported to the recipient – were everywhere in the 19th-century world. Epistolary exchanges are also among the most important research materials when scholars study canonical persons and major historical events. However, enquiries based on quantitative analysis are absent both from studies that use letters as sources and from enquiries into epistolary cultures or letter-writing as a social practice.
This group will conduct research on different corpora of historical epistolary metadata (names and dates of senders/receivers of letters). The material comes from the aggregated collections of German (correspSearch metadataset), Dutch (the CKCC corpus), and Finnish correspondences (e.g. the corpora of the National Library of Finland, Finnish National Gallery and National Archives). Together these corpora cover epistolary exchanges from the 17th to the late 19th century.
The main humanities questions include the following. What kinds of patterns of epistolary communication can we recognise in our integrated datasets with a combination of quantitative and qualitative methods? Who could write a letter to whom in historical estate societies? Can big epistolary metadata offer a profitable way to study the communicative networks and epistolary cultures of past societies? An additional topic to address is the extent to which we need to develop specific source or data criticism for the scholarly use of such material. From the computational perspective, the datasets provide an interesting opportunity to study history by applying computational methods and technologies to the data, such as Linked Data, social network analysis, knowledge discovery, and data visualization.
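One simple quantitative angle on the question of who wrote to whom is reciprocity: how often a correspondence ran in both directions. The sketch below is only illustrative. It assumes each letter is reduced to a hypothetical `(sender, recipient, year)` tuple, and the names in the sample are invented rather than taken from the project's data.

```python
def reciprocity(letters):
    """Fraction of directed sender -> recipient pairs that are answered in reverse."""
    # Collapse repeated letters into a set of distinct directed pairs.
    pairs = {(s, r) for s, r, _year in letters if s != r}
    if not pairs:
        return 0.0
    # A pair is mutual when the reverse direction also occurs.
    mutual = sum((r, s) in pairs for s, r in pairs)
    return mutual / len(pairs)


# Invented sample metadata.
letters = [
    ("Anna", "Bertil", 1801),
    ("Bertil", "Anna", 1802),
    ("Anna", "Carl", 1803),
    ("Anna", "Bertil", 1805),
]
print(round(reciprocity(letters), 3))
```

Here two of the three distinct directed pairs are answered, so the function returns two thirds. Low reciprocity in an archive may reflect social hierarchy, but it may equally reflect which side of a correspondence happened to be preserved, which is where source criticism comes in.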
The data, tools and supervision will be provided by members of the project Constellations of Correspondence. The group can both study the already existing LOD corpora (the CKCC corpus, correspSearch) and work with the harmonizing and enrichment of the Finnish material (e.g. regarding occupations and social classes).
The letter metadata consists mainly of person and place names and temporal information, which means that specific linguistic skills are not particularly relevant.
Barton, D. & Hall, N. (2000). Letter Writing as a Social Practice. John Benjamins Publishing Company, https://doi.org/10.1075/swll.9.
Hotson, H., & Wallnig, T. (eds.). (2019). Reassembling the Republic of Letters in the Digital Age. Göttingen University Press. https://doi.org/10.17875/gup2019-1146.
Eero Hyvönen, Petri Leskinen, Jouni Tuominen: LetterSampo – Historical Letters on the Semantic Web: A Framework and Its Application to Publishing and Using Epistolary Data
Catherine D'Ignazio and Lauren F. Klein: Introduction: Why Data Science needs Feminism. In Data Feminism. The MIT Press, 2020.
Preiser-Kapeller, J. (2020). Letters and network analysis. A Companion to Byzantine Epistolography, 431–465. Brill. https://doi.org/10.1163/9789004424616_018.
Nanna Bonde Thylstrup, Daniela Agostinho, Annie Ring, Catherine D'Ignazio, Kristin Veel. Uncertain Archives: Critical Keywords for Big Data. The MIT Press, 2021.
The group will focus on a comparison of parliamentary debates from sociological, political science, and computational perspectives. The objective will be to learn how to use comparable parliamentary corpora from various European countries that are annotated with rich metadata and linguistic annotations, enabling various analytical directions. The group will take a network analysis perspective on parliamentary debates to answer questions on the influence of members, the polarisation of groups, and information spreading in parliament. The group will make use of the linguistic annotations, Named Entities, and metadata coded in the ParlaMint data. Additionally, the group will learn to utilise Google Colab (https://colab.research.google.com) and network analysis tools such as Gephi (https://gephi.org) and NetworkX (https://networkx.org) to join computer science and humanities in gaining knowledge on the Networks of Power.
National parliaments are a verified communication channel between the elected political representatives and members of society in any democracy. Political decision-making is organised in party groups, committees, and informal networks among members of parliament and civil servants. In the plenary session, we see these networks manifest themselves as speakers represent their respective groups and refer to one another. The degree to which these networks display exceptional polarisation, centralisation of parliamentary voices, or an imbalance in the dynamic between government and opposition is telling of how the principle of parliamentarism is concretely playing out in the different countries. The networks can also be studied from the perspective of gender, party affiliation, and party stability. By comparing the data synchronically and diachronically in a cross-lingual context, we can obtain important insights into transnational characteristics.
The parliamentary corpora will be provided by the CLARIN ERIC ParlaMint project (https://www.clarin.eu/content/parlamint-towards-comparable-parliamentary..., currently available in English, Dutch, Icelandic, Lithuanian, Czech, Italian, Turkish, Danish, Hungarian, French, Latvian, Romanian, Belgian Dutch/French, Estonian, Croatian, Polish, Slovenian, and Bulgarian). The project's goal is to compile a collection of comparable corpora of debates from national parliaments all over Europe in a harmonised format, covering data from the 2010s and 2020s. The corpora (available via the CLARIN.SI repository, https://www.clarin.si/repository/xmlui/handle/11356/1431) have already been processed linguistically and enriched with metadata. They are searchable online through popular concordancers and can also be downloaded from the CLARIN repository for independent handling.
Possible topics and tasks for the group are:
Computational tasks can include but are not limited to:
- Creating networks from Named Entities and metadata
- Network centrality
- Community detection
- Network visualisations & evaluation
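The first two tasks above can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not the group's actual pipeline. It assumes each speech has already been reduced to a hypothetical `(speaker, [mentioned persons])` pair, for instance from person-type Named Entities, and the speaker labels are invented. The centrality computed is in-degree divided by the number of other nodes, which is the same normalisation that NetworkX uses in `nx.in_degree_centrality`.

```python
from collections import Counter


def mention_edges(speeches):
    """Count directed speaker -> mentioned-person edges, ignoring self-mentions."""
    edges = Counter()
    for speaker, mentioned in speeches:
        for person in mentioned:
            if person != speaker:
                edges[(speaker, person)] += 1
    return edges


def in_degree_centrality(edges):
    """Unweighted in-degree of each node divided by (number of nodes - 1)."""
    nodes = {node for edge in edges for node in edge}
    if len(nodes) < 2:
        return {node: 0.0 for node in nodes}
    # Each distinct incoming edge counts once, however often it occurs.
    incoming = Counter(target for _source, target in edges)
    return {node: incoming[node] / (len(nodes) - 1) for node in nodes}


# Invented sample: four speakers and the colleagues they refer to.
speeches = [
    ("A", ["B", "C"]),
    ("B", ["C"]),
    ("C", ["A"]),
    ("D", ["C"]),
]
edges = mention_edges(speeches)
centrality = in_degree_centrality(edges)
print(sorted(centrality.items()))
```

In this toy network, speaker C is mentioned by all three colleagues and so has the maximum centrality of 1.0, while D is mentioned by nobody. The edge counts returned by `mention_edges` could also be used as weights when loading the network into Gephi or NetworkX for community detection and visualisation.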
Humanities and social science tasks can include but are not limited to:
- Identifying key turning points in the parliamentary schedule
- Carrying out small evaluation tasks
- Utilising linguistic annotation to compare different nodes in the networks
- Contextualising key actors in the data
- Analysing relevant text passages in the data
- Interpreting results from a comparative perspective
Related work / References:
- Erjavec, T., Ogrodniczuk, M., Osenova, P. et al. The ParlaMint corpora of parliamentary proceedings. Lang Resources & Evaluation (2022). https://doi.org/10.1007/s10579-021-09574-0
- Wasserman, Stanley, and Katherine Faust. "Social network analysis: Methods and applications." (1994). https://books.google.si/books?id=CAm2DpIqRUIC
- Marin, Alexandra, and Barry Wellman. "Social network analysis: An introduction." The SAGE handbook of social network analysis 11 (2011): 25
- W. Selinger, Parliamentarism: From Burke to Weber, 1 ed., Cambridge University Press, 2019. URL: https://www.cambridge.org/core/product/identifier/9781108585330/type/book. doi:10.1017/9781108585330.