Please register by Sunday 10th of February, so we can order coffee and refreshments: https://goo.gl/forms/llnsYZIbxfybYSZi2. Attendance is free.
The workshop will take place on 18 February 2019 from 13:00 to 17:00 in Sali 6 (3rd floor) of Metsätalo, Unioninkatu 40. Map: https://goo.gl/maps/nrrb2t3XK582
Contact: Simon Hengchen, email@example.com
Moderators: Jani Marjanen, Antti Kanner and Simon Hengchen
Languages change with time. While the study of meaning is not new, the past few years have seen considerable attention given to computational approaches to particular tasks within (historical) semantics. Advances in NLP and the availability of massive textual corpora have made possible new research methods focused on lexical semantic change, using approaches that range from topic models to neural word embeddings. Knowing what a word means at a particular moment in time is crucial for text-based research in the humanities. While current computational methods present solutions for some contexts (typically recent English, with clean data), this growing community lacks an extensive overview of existing work on the one hand, and an interdisciplinary discussion between the major parties and fields interested in the topic on the other. Across existing work, evaluating results and comparing approaches is close to impossible: the lack of a standardised definition of lexical semantic change, both in general terms and with regard to different corpora and time spans, together with the inadequacy (or absence) of current evaluation frameworks, makes reproducing methods in other contexts very difficult.
In this workshop, which takes place in the context of the DH research seminar series at the University of Helsinki, we wish to foster discussions between NLP researchers, (digital) historians, and historical linguists. In that sense, it echoes the workshop on automatic detection of language change 2018, co-located with SLTC. A particular focus of these talks will be the need for proper evaluation frameworks for the study of semantic change.
Tahmasebi, N., Borin, L., Jatowt, A. (2018). Survey of Computational Approaches to Diachronic Conceptual Change. Under review for Computational Linguistics. https://arxiv.org/abs/1811.06278
Dr Nina Tahmasebi is a researcher in Natural Language Processing at the University of Gothenburg, Sweden. She obtained a Ph.D. in Computer Science from L3S Research Center at the University of Hanover, Germany, in 2013. Her main research interest lies in automatic detection of diachronic language change, in particular word sense change, but she is interested in information extraction and change detection in general. She has done work in social media analysis, sentiment mining, summarization, and text mining for the digital humanities.
In this talk I will give an overview of the work done in computational detection of semantic change over the past decade. I will present both lexical replacements and semantic change, and the impact these have on research in e.g., digital humanities. I will talk about the challenges of detecting as well as evaluating lexical semantic change, and our new project connecting computational work with high-quality studies in historical linguistics.
Dominik Schlechtweg is a PhD student at the University of Stuttgart. He holds a BA in Linguistics and English as well as an MSc in Computational Linguistics from the same university. He also spent half a year at the ILLC (University of Amsterdam), where he studied logic. Before entering the PhD programme in 2017, Dominik taught formal semantics at the Institute of Linguistics at the University of Stuttgart. His PhD topic is titled 'Distributional Models of Semantic Change' and is supervised by Prof. Sabine Schulte im Walde.
A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains
We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. Our work addresses the superficiality and lack of comparison in evaluation of models of diachronic lexical change by bringing together and extending benchmark models on a common state-of-the-art evaluation task. In addition, we demonstrate that the same evaluation task and modelling approaches can successfully be utilised for the synchronic detection of sense divergences in the field of term extraction.
Dr Haim Dubossarsky completed his PhD at the Hebrew University of Jerusalem under the supervision of Prof. Daphna Weinshall (CS department) and Dr Eitan Grossman (Linguistics department). Though he obtained training in psycholinguistics and computational neuroscience, Dr Dubossarsky devoted his doctoral training to the study of computational linguistics, and particularly the field of semantic change. Building on his multidisciplinary skills, his work made both scientific and methodological contributions to the field, which were published in top-tier venues. Dr Dubossarsky is currently supported by the Blavatnik foundation to carry out his post-doctoral research at the Language & Technology Lab in Cambridge, headed by Prof. Anna Korhonen, where he studies the intricacies between linguistic typology and NLP models, and their potential for improving both the quality of the models and the understanding of linguistic typology.
What are word embeddings hiding up in their sleeves?
Distributional models of word embeddings (e.g., predictive models like word2vec or co-occurrence count models like PPMI) have become prevalent in NLP studies and related research. The use of these models has also been established in usage-based linguistics, ranging from studies of polysemy (Schütze 1998; Heylen et al. 2015) and language variation (Jenset et al. 2018) to semantic change (Dubossarsky et al. 2015; Perek 2015).
In light of the popularity of these models within NLP and their diffusion beyond NLP, it is noteworthy that recent studies have reported deficiencies that make the word embeddings created by these models noisy (Hellrich & Hahn 2016; Antoniak & Mimno 2017; Dubossarsky, Grossman & Weinshall 2017; Karjus et al. 2018; Dubossarsky, Grossman & Weinshall 2018). These studies might seem of little relevance for linguists who want to use word embeddings as off-the-shelf models to study linguistic phenomena. However, contrary to this naïve impression, it has been shown that these deficiencies may drastically bias the analysis of the linguistic phenomena studied, and as a consequence, may lead researchers to unsound conclusions (Dubossarsky, Grossman & Weinshall 2017).
Our studies focus on the role word frequency and sampling play in the accuracy of word embeddings, and show how these two factors have far-reaching consequences for the study of semantic change (Dubossarsky, Grossman & Weinshall 2017) and polysemy research (Dubossarsky, Grossman & Weinshall 2018). In addition to both empirical and theoretical analyses, we propose a general method that allows the continued use of word embeddings by mitigating their deficiencies through carefully crafted control conditions.
Antoniak, M. & D. Mimno. 2017. Evaluating the Stability of Embedding-based Word Similarities. TACL 6, 107–119.
Dubossarsky, H., E. Grossman & D. Weinshall. 2017. Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models. Proceedings of EMNLP, 1147–1156.
Dubossarsky, H., E. Grossman & D. Weinshall. 2018. Coming to Your Senses: on Controls and Evaluation Sets in Polysemy Research. Proceedings of EMNLP, 1732–1740.
Dubossarsky, H., Y. Tsvetkov, C. Dyer & E. Grossman. 2015. A bottom up approach to category mapping and meaning change. Proceedings of the NetWordS, 66–70.
Hellrich, J. & U. Hahn. 2016. Bad Company — Neighborhoods in Neural Embedding Spaces Considered Harmful. Proceedings of COLING-16, 2785–2796.
Heylen, K., T. Wielfaert, D. Speelman & D. Geeraerts. 2015. Monitoring polysemy: Word space models as a tool for large-scale lexical semantic analysis. Lingua 157: 153–172.
Jenset, G. B., J. Barðdal, L. Bruno, E. Le Mair, P. A. Kerkhof, S. Kleyner, L. Kulikov & R. Pooth. 2018. Continuous vector space models for variation and change in sparse, richly annotated Indo-European argument structure data. Presentation at SLE 2018, Tallinn.
Karjus, A., R. A. Blythe, S. Kirby & K. Smith. 2018. Two problems and solutions in evolutionary corpus-based language dynamics research. Presentation at SLE 2018, Tallinn.
Perek, F. 2015. Using distributional semantics to study syntactic productivity in diachrony: A case study. Linguistics 54(1): 149–188.
Schütze, H. 1998. Automatic Word Sense Discrimination. Computational Linguistics 24(1): 97–123.