We have seen huge growth in the use of voice assistants like Amazon Alexa and Google Home in the past decade. But these devices have a major flaw (beyond their inability to recognise Irish accents). To engage with these devices, we have to use a wake word, a command that lets the agent know that we are looking to start conversing. After this, the types of interactions we have with them are limited to a few turns of dialogue and a request being fulfilled. But what if these agents could start conversations with us? This type of proactive agent opens up a wealth of opportunities, from an agent being able to collaborate with you and your team in a meeting, to one that informs you about the status of your automated drive and then seamlessly transitions to asking you about the in-car entertainment. What would this be like? And what do we need to do to stop this being a voice version of the infamous Clippy?
Our research focuses on the need to get the initiation of these proactive agent interactions right, so that they minimise distraction and user annoyance. Our recent work takes inspiration from two major concepts in social science. The first is from cognitive science and focuses on identifying the best time to interrupt a task. Research tells us that there are opportune moments to interrupt, termed breakpoints, that make the interruption less distracting. These often occur naturally when you finish a part of a task, such as when you have just finished reading this sentence. Interrupting people at these breakpoints is thought to make it easier for them to return to what they were doing, and is thus more likely to suit the person who was interrupted. For agents to be more proactive, they need to know where these breakpoints are in our everyday routines and how to identify them.
Secondly, we take inspiration from human dialogue interaction, investigating how people interrupt other busy people. Imagine you are in a car and you are looking to get the attention of the driver. How would you do it? What would you say? How would you say it? If what you had to say was urgent, what would you do differently? Questions like these can inform how we should design agents to do the same thing. In social science, the patterns of behaviour we use to engage others in conversation are called access rituals. These are the regular “hellos” or “do you have a moment?”, hand waves and facial expressions we use to initiate conversations throughout the day. Our research aims to better understand these behaviours so as to determine what access rituals might be appropriate for voice-based agents to use to proactively engage with us while we are doing another task.
To make a proactive agent something that a user would actually want to engage with, it is crucial that these agents know not only when to initiate, but also how to initiate and what to say to get our attention without annoying us. Only then will we be able to shift towards agents being a proactive partner, rather than the sleepy helper on the desk that we need to wake up.
Dr Benjamin R Cowan is Associate Professor at UCD’s School of Information & Communication Studies. He completed his undergraduate studies in Psychology & Business Studies (2006) as well as his PhD in Usability Engineering (2011) at the University of Edinburgh. His research lies at the juncture between psychology, human-computer interaction and communication systems, investigating how design impacts aspects of user behaviour in social, collaborative and communicative technology interactions. His recent research has focused specifically on how theory from social science can be applied to understand and design speech and language technologies. Prof. Cowan is the co-founder and co-director of the HCI@UCD group, one of the largest HCI groups in Ireland. He is also the co-founder of the ACM In-Cooperation Conversational User Interfaces conference series (CUI, https://www.conversationaluserinterfaces.org/), which attracts leading academic and industry research in the field of conversational interfaces. He is also Co-Principal Investigator in the SFI-funded ADAPT Centre.
Justin Edwards is a PhD candidate at University College Dublin and a member of the ADAPT Centre. He completed his undergraduate studies in Psychology & Cognitive Science (2017) at Williams College. His research examines speech in multitasking environments, studying how people speak to others who are busy with other tasks and applying these insights to the design of system-initiated dialogue with conversational agents. His work has been presented at CHI, CUI and MobileHCI and published in Interacting with Computers. He also hosts a podcast about computational creativity called Robots on Typewriters, and his work on this topic has been published twice at CUI.
Links to our recent research:
Multitasking with Alexa, published in Proceedings of CUI 2019: https://arxiv.org/abs/1907.01925
Eliciting Spoken Interruptions to Inform Proactive Speech Agent Design, CUI 2021: https://arxiv.org/abs/2106.02077