Sylvie Delacroix: Taking Turing by Surprise? Designing Autonomous Systems for Morally-Loaded Contexts

There is much to learn from Turing’s translation of Lady Lovelace’s ‘objection’ (‘computers cannot originate anything new’) into a question about surprises. Surprises, and the model changes they often generate, are the object of renewed interest in the Machine Learning literature: given the tendency for Bayesian model uncertainty to come close to zero as the number of data samples increases, optimizing the learning performance of systems within dynamic environments requires that systems be made to look for so-called ‘black-swan events’ in a bid to preserve an adequate degree of model plasticity. For humans this plasticity is at least as desirable, but it is typically compromised by the weight of habit rather than by statistical certainty. Shaking off the former is a more painful experience than questioning the latter. This paper aims to throw light on the asymmetry in the mechanisms underlying model change in humans versus machines for two reasons. First, because this asymmetry has important implications when it comes to the challenges inherent in designing autonomous systems meant for morally loaded contexts. Current efforts to ethically ‘train’ such systems pay little attention, if any, to the difficulties that stem from the unavoidable need for change in those systems’ moral stances. To appreciate the significance of these difficulties, one needs to understand both the distinct role of habit reversal in the mechanisms underlying moral change in humans and the fact that machines are unlikely ever to experience habit reversal in that way. Once one takes on board the implications of this asymmetry, it becomes clear that, no matter how ‘ethically aligned’ they may have been initially, such systems are likely to ‘leap morally away’, most likely past a point of mutual intelligibility, thus calling into question their desirability.

Keywords: Machine Learning, Turing, autonomous systems, decision-support systems, ethics, surprise, habit, value-alignment problem, Lady Lovelace

