Exploring patterns of self-identification in the LGBTQ+ Reddit Corpus
Computer-mediated communication has played a significant role in shaping the current discourses on gender and sexuality by bringing together often dispersed minorities and providing an anonymous space to consider questions related to identities. To explore the link between these recent social developments and language practices, we investigate linguistic constructions of self-identification among sexual and gender minorities on the discussion forum Reddit.
Through analysis of lexico-grammatical patterns, we investigate in what ways and to what extent linguistic constructions of self-identification such as identify as X, be X and as a X are employed in online discourse. Our preliminary findings suggest that such constructions are productive as rhetorical means for claiming a specific identity and positioning oneself in discourse (e.g., I identify as non-binary most days). At the same time, these constructions are often used for labelling others, together with meta-discussion on the appropriate demarcation of these categories.
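As a rough illustration of how such constructions might be retrieved from raw text, the following minimal Python sketch searches for the three patterns with regular expressions. It is not the procedure used in the study: the pattern definitions, the restriction to first-person subjects, the single-word X slot and the function name `find_constructions` are all simplifying assumptions made for this example.

```python
import re

# Hypothetical search patterns (assumptions for illustration only).
# Each pattern captures a single word (hyphens allowed) in the X slot.
PATTERNS = {
    # "I identify as (a/an) X"
    "identify as X": re.compile(r"\bI\s+identify\s+as\s+(?:an?\s+)?([\w-]+)", re.IGNORECASE),
    # "I am (a/an) X" or "I'm (a/an) X" (straight apostrophe only)
    "be X": re.compile(r"\bI(?:\s+am|'m)\s+(?:an?\s+)?([\w-]+)", re.IGNORECASE),
    # "as a/an X"
    "as a X": re.compile(r"\bas\s+an?\s+([\w-]+)", re.IGNORECASE),
}


def find_constructions(text):
    """Return (construction, X) pairs for every pattern match in the text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(1)))
    return hits


if __name__ == "__main__":
    print(find_constructions("I identify as non-binary most days."))
```

A surface search of this kind over-generates (for instance, it returns any word following "I'm"), so the hits would still need manual or part-of-speech-based filtering before they could be counted as instances of self-identification.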
Using the Pushshift repository, we have compiled The Reddit LGBTQ+ Corpus (c. 44 million words), which includes texts from various LGBTQ+ subforums on Reddit (e.g., r/lgbt, r/nonbinary). The corpus covers the period from 2010 to 2021 and contains approximately 600 submissions per month together with their comments. In the presentation, we describe the corpus and its compilation and present our preliminary analysis, discussing and contextualizing online self-identification practices within the broader discourse on gender and sexuality.
Minna Palander-Collin (PI) is Professor of English Language and currently Vice-Dean for Academic Affairs at the Faculty of Arts, University of Helsinki. Since 2009, she has been PI of several funded research projects focusing on changing language practices and societal change in the history of English. Her most recent project deals with Democratization, Mediatization and Language Practices (DEMLANG). Her main research interests include historical sociolinguistics, historical pragmatics, language change and corpus linguistics, and she is one of the compilers of the Corpus of Early English Correspondence. She is a member of the Finnish Academy of Science and Letters.
Turo Hiltunen, PhD, Docent, is University Lecturer at the Department of Languages, University of Helsinki, where he teaches corpus linguistics and other digital approaches to the study of English. His main interests are corpus linguistics, grammar, phraseology and register analysis. Hiltunen has extensive experience of corpus development and has worked on corpus projects such as Early and Late Modern English Medical Writing (Benjamins 2011, 2019). He has (co-)authored over 30 studies and edited research volumes and special issues, most recently for John Benjamins Publishing Company and the journals Language Sciences and Journal of English Linguistics.
Laura Hekanaho, PhD, is currently a Postdoctoral Researcher at the University of Jyväskylä. Her PhD dissertation Generic and Non-binary Pronouns (University of Helsinki, 2020), which was accepted with distinction, investigated attitudes towards generic and nonbinary third-person singular pronouns, including an exploration of how nonbinary pronouns are employed in identity building. She received a PhD award for feminist research. Her main research interests include language and gender research, identity and language, mixed methods research, statistical modelling, corpus linguistics and qualitative analysis.
Helmiina Hotti has a BA in Linguistics (Language Technology) from the University of Helsinki. She is currently working on her MA thesis in Language Technology in the MA programme in Linguistic Diversity and Digital Humanities, University of Helsinki.