On ground truths and biases: A pragmatist take on the morality of machine learning design and application

When one documents the manufacture of machine learning (or artificial intelligence) algorithms using the analytical genre of laboratory ethnography – among other possible ones – one notices that many of them rely upon referential databases called “ground truths” that gather sets of input data and their manually designed output-target counterparts. One also quickly realizes that the collective processes leading to the definition of these ground truths heavily shape the nature of the algorithms they help constitute, evaluate, and compare. In this talk, I will first discuss some of the whys and wherefores of these ground-truthing processes, with an emphasis on supervised and unsupervised learning for computer vision. Then, building upon the presented elements and the concept of “genuine option” developed by pragmatist philosopher William James, I will critically discuss the notion of bias and propose an alternative way to consider the morality of machine learning algorithms.

Florian Jaton is a sociologist of science, technology, and computing. He is the author of the book The Constitution of Algorithms: Ground-Truthing, Programming, Formulating, published by MIT Press.

The politics of data reuse: entangled ecologies in machine learning infrastructures

Data reuse has become a crucial infrastructural arrangement for deep learning algorithms. At the same time, however, data reuse architectures have been shown to be problematic, entailing, for instance, algorithmic bias and exploitation of data subjects.

This talk argues for a broader engagement with the politics of reuse in machine learning systems. Existing public and scholarly debate surrounding reuse predominantly focuses on the data itself: which data are used, how they are reused, and for what purposes. In contrast, this paper examines reuse as a relational unfolding between data, models, and more-than-human environments. Developing the notion of reuse entanglements and drawing on empirical examples from infrastructural arrangements for deep learning algorithms in the Danish and British public sectors, this talk brings the concept and practice of data reuse into conversation with Karen Barad’s work on entanglement and Sara Ahmed’s work on the power relations of use and reuse. In doing so, the talk shifts focus from linear discussions about data input and output (and the attendant distinction between use and reuse) toward a critical examination of the non-linear relationality of (re)use. First, this approach offers crucial insights into why oft-used ethical gestures, such as data deletion, tend to fail as stand-alone solutions for data violence. Second, it opens broader theoretical approaches to the emergent and non-linear nature of deep learning arrangements.


Nanna Bonde Thylstrup is Associate Professor of Communication and Digital Media at Copenhagen Business School, with a focus on knowledge infrastructures, infrastructures of ignorance, environmental media, and digital epistemologies. She is the author of the book The Politics of Mass Digitization, published by MIT Press.