Rohit Prasad, head scientist of the Alexa artificial intelligence group at Amazon, caused a stir this week when he suggested that, for Alexa to be truly smart, it needs eyes and a body.
Speaking at EmTech Digital, an artificial intelligence conference organized by MIT Technology Review, Prasad spoke about the difficulty of creating an assistant like Alexa that can understand the context of language, something that is simple for humans but tough for even the smartest computers.
"The only way to make smart assistants really smart is to give it eyes and let it explore the world," Prasad said, without going into detail about what he and the Alexa division at Amazon are working on.
Even if the details are missing, the need to make Alexa smarter is a primary concern for Amazon. Not that Alexa is particularly stupid: it is broadly considered the equal of Google Assistant, while Apple's Siri, Microsoft's Cortana, and Samsung's Bixby trail in their wake.
With so many of Amazon's Echo devices sold — some 100 million worldwide since 2014 — the company must ensure the Alexa assistant in these devices keeps learning. But, if Prasad's prediction is correct, Alexa will need physical improvements to become truly smart.
What exactly could this entail? We'll take Prasad's mention of eyes to mean cameras. Now, the Echo Spot and Echo Show (plus the lesser-seen Echo Look) all have cameras, but these are primarily used for video chatting with other Echo owners, or taking photos and analyzing your outfit in the case of the Look.
The next step would be for Alexa to use this video input to better understand its surroundings. That could mean speaking more loudly when it sees a room full of people, or whispering when it sees a sleeping child or a nearby crib. Alexa could also use cameras to read the emotions of the people it interacts with.
Amazon is already working on this to some extent. Back in 2016, company insiders reported that Amazon wanted Alexa to understand the emotional tenor of human voices, which could lead to a better comprehension of intent when it is asked a question or told to do something.
An Amazon patent filed in 2017 and made public the following year describes a way for Alexa to monitor people's emotions, then respond in a manner best suited to how they feel. The patent discusses a future where Amazon would recognize "happiness, joy, anger, sorrow, sadness, fear, disgust, boredom, stress" and respond accordingly. Amazon being a retailer, this could mean offering to buy something to help alleviate these mostly negative emotions. On a similar note, Amazon has also patented a system where Alexa detects when a person sounds unwell, then offers to buy them medication to help. Being able to see, as well as hear, would surely help every aspect of this feature set.
At a simpler level, Alexa with eyes could help users identify objects, or buy replacements when something has just run out. Perhaps one could hold up an empty cereal box in sight of an Amazon Echo, then ask Alexa to buy some more, or recommend an alternative based on the shopping habits of other Amazon customers.
But what about Prasad's vision of an Alexa able to "explore the world"? It's easy to jump to conclusions by way of the 2013 "Black Mirror" episode Be Right Back, in which a grieving girlfriend orders a robot that looks and acts like her partner, who died in a road accident. She begins by uploading their text message history to a service that turns the conversations into a simple chatbot, not dissimilar to what James Vlahos did with his father's conversations in 2017. Craving more of her boyfriend, she upgrades to an AI that can speak out loud, before finally the robot is delivered to her home.
The final stage is, of course, a very long way away — despite what those behind Sophia the robot might claim — but rewinding a step and taking a different route isn't beyond the realm of possibility. Alexa already lives inside the Anki Vector robot, complete with cute Pixar-inspired animations, motorized wheels, intelligence, and a keen desire to explore its environment.
The Vector doesn't quite deliver the full picture, as its Alexa abilities are not fully woven into the Anki-built Vector persona. But if they were, the result would be a robot that senses when you arrive home, turns to you, recognizes you with its camera, then adjusts the lighting and heating to your personal preferences, switches on your preferred radio station, and interacts with other smart home devices. It may even notice you look tired, or stressed, or happy, and act accordingly. Or it may notice you have company, and opt to keep quiet until your guest has departed.
Scale this up a little and you approach the customer service robots LG has been building for the past couple of years, which drive along the floor instead of a tabletop like Vector, and can carry objects. A couple more logical steps from there, and we have robots capable of bringing you a beer from the fridge, or helping carry heavy objects across the garden. Switch to the enterprise, and the robot could cart materials around a construction site, all the while using cameras and Alexa's intelligence to make sense of everything around it and interact with human colleagues.
So yes, suggesting Alexa will one day have its own humanoid body may sound improbable — not to mention creepy — but Prasad's dream of an Alexa with the ability to explore isn't that far away at all. This doesn't mean the next Echo devices will have wheels and unblinking, all-seeing eyes, but the means to understand and explore its environment feel like the logical next step for Amazon to take.