Voice UX, the intelligence behind Machine Learning

In the world of Artificial Intelligence (AI), robotics and voice-activated devices, one of the challenges we face is how to encourage brand engagement, or facilitate a valuable exchange of information, with a Wi-Fi-enabled speaker in the corner of your living room or the built-in voicebox of a lifelike robot.

Don’t get confused by all the terminology. Let’s strip this back to basics: it’s still just input and output.

Input, traditionally handled by a keyboard or mouse, is changing. Touchscreens (iPhones, iPads and the like) were the first revolution, combining input and output in a single device. Now voice has arrived with the likes of Alexa, Google Home, Siri and Cortana. At present a robot is nothing more than a voice/touchscreen-activated input device with a voice/display output. Output will inevitably progress as dexterity, tangibility and tactility become more mainstream, but right now RoboCop and Johnny 5 remain fictional, if only just.

The real change happening right now is around input: voice. Voice doesn’t use a structured tool such as a keyboard, or a trackable click/press event from a mouse or touchscreen. This means designers and developers can’t control the input using traditional validation (an easy example: try typing letters instead of numbers into a Date of Birth field on an online form). Completing a ‘what’s your postal address’ form with a keyboard or touchscreen is easy and takes seconds, even if a colleague interrupts you. With voice, if your phone rings mid-form, you could have a whole conversation between the Town and Postcode fields, and the voice input device (Alexa, for example) has no way of knowing whether you are talking to it or to your colleague on the phone. Everything could be recorded and submitted as if it were valid input.
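The contrast can be sketched in a few lines of Python. The field, formats and transcripts below are purely illustrative: traditional validation happily rejects malformed keyboard input, but a raw voice transcript that mixes a side conversation into the answer is rejected wholesale too, losing the valid fragment inside it.

```python
from datetime import datetime

def validate_dob(raw: str) -> bool:
    """Traditional form validation: accept only DD/MM/YYYY dates."""
    try:
        datetime.strptime(raw.strip(), "%d/%m/%Y")
        return True
    except ValueError:
        return False

# Keyboard input is easy to constrain and re-prompt on:
print(validate_dob("14/02/1985"))        # True
print(validate_dob("fourteenth of feb")) # False, user is asked again

# A raw voice transcript may mix the answer with an unrelated phone
# conversation; naive field validation simply rejects the whole thing:
transcript = "Springfield -- sorry, I'll call you back -- SP1 2AB"
print(validate_dob(transcript))          # False
```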

Solving this deceptively simple problem is where Voice UX comes in. You could teach your voice application to accept only certain criteria when completing forms, or even redesign forms to be much more interactive (why not use your Amazon address book when using Alexa?). However, the user needs to know what the limitations are and how to interact with the application using only their voice. To make filling out the form as effortless as possible, we have to apply intuition, understanding, familiarity and empathy. That will only come with user research-led design.
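One way of ‘accepting only certain criteria’ is a slot filter that keeps only the fragment of an utterance matching the expected shape of the field. A minimal sketch, assuming a UK-postcode-shaped slot; the regex and transcripts are illustrative, not a production address parser:

```python
import re

# Hypothetical slot filter: keep only fragments shaped like a UK
# postcode and discard everything else in the transcript.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", re.IGNORECASE)

def fill_postcode_slot(transcript: str):
    """Return the first postcode-shaped fragment, or None to re-prompt."""
    match = POSTCODE.search(transcript)
    return match.group(0).upper() if match else None

# Noisy transcript containing a side conversation with a colleague:
noisy = "hang on, I'll call you back -- it's SW1A 1AA, yes"
print(fill_postcode_slot(noisy))                 # 'SW1A 1AA'
print(fill_postcode_slot("talking to someone"))  # None, so re-prompt
```

The design point is the `None` branch: when nothing matches the criteria, the application re-prompts rather than inserting noise into the form.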

But what about brand engagement? Brands want minimal input returning maximum output, and will accept a wide range of input to get the customer to the content. But a user does not ask to become a customer or brand champion; their positive experiences create the advocate, and experience with voice is no different. Getting to this point and designing an experience of value will only be achieved, perhaps more so in this channel than any other, by applying empathy and intuition established through research-led, user-centred design (UCD).

Don’t get fooled by terminology like Machine Learning; that’s not what people outside of MIT, Google, Apple, Microsoft and the like get to do. Machine Learning (or teaching) for everyone else will be limited to commercially-biased supervised learning: effectively feeding your AI application the criteria that your Voice UX project has established.
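At this scale, supervised ‘teaching’ can be as modest as matching utterances against labelled examples your research has established. A toy sketch; the intents, phrases and bag-of-words scoring are all illustrative assumptions, not any platform’s real API:

```python
from collections import Counter

# Labelled example utterances: the criteria a Voice UX project
# has established through research (hypothetical intents).
training = {
    "set_address": ["use my home address", "deliver to my saved address"],
    "give_postcode": ["my postcode is", "the postcode here is"],
}

def build_model(examples):
    """'Train' by counting the words seen under each intent."""
    return {intent: Counter(w for ex in phrases for w in ex.lower().split())
            for intent, phrases in examples.items()}

def classify(model, utterance):
    """Score each intent by word overlap; None means fall back and re-prompt."""
    words = utterance.lower().split()
    scores = {intent: sum(counts[w] for w in words)
              for intent, counts in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

model = build_model(training)
print(classify(model, "please use my saved address"))  # 'set_address'
```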

Establishing these criteria is where UX time and budget are spent. It needs strategic thinking, it needs research and design, and, just like UX design now, it underpins the end product or service.