Today during its annual September hardware event, which was held virtually for the first time, Amazon announced updates across its portfolio of Alexa developer tools and frameworks. Those arrived alongside a slew of new Alexa features, including Reading Sidekick, which lets Alexa read books with kids. Also announced were Alexa Voice Profiles for Kids, which automatically recognizes a kid’s voice and switches to a kid-friendly mode, and improvements to Alexa’s conversational and home monitoring capabilities.
The pandemic has supercharged voice platform usage, which was already on an upswing. According to a study by NPR and Edison Research, the percentage of voice-enabled device owners who use commands at least once a day rose between the beginning of 2020 and the start of April. Just over a third of smart speaker owners say they listen to more music, entertainment, and news from their devices than they did before, and owners report requesting an average of 10.8 tasks per week from their assistant this year compared with 9.4 different tasks in 2019.
Starting in the coming weeks, Amazon says that Alexa will ask questions of users to help the assistant better understand what they mean. Alexa will be able to remember, for instance, that “Dad’s reading mode” means to set the living room lights to 60% brightness and switch on the air conditioning. It’s personalized to individual customers, and Amazon says that it’ll work for smart home concepts and actions to begin with before expanding to other domains.
Alexa will also soon be able to change intonation depending on the context of back-and-forth conversations, building on Amazon’s advances in neural text-to-speech technology. Beginning in the coming months, the assistant will stress certain words and even insert pauses and breaths, according to Alexa VP and head scientist Rohit Prasad.
Natural Turn Taking
Meanwhile, a forthcoming enhancement to Follow-Up Mode, which was introduced back in 2018, will let multiple people join conversations with Alexa without having to use a wake word for every utterance. It’s called Natural Turn Taking — Alexa will leverage acoustic, linguistic, and even visual cues to determine whether a request is directed towards it, Prasad says.
Three AI models run in parallel to power Natural Turn Taking, which will initially only be available in English when it launches sometime next year. One distinguishes background speech and noise from commands intended for Alexa. The second converts speech into text using speech recognition, so that it can be analyzed at the sub-word level. As for the third, it uses the signal from a device camera (if available) to make a decision about whether what’s being spoken is being directed toward the device.
“In the case of [Echo] devices with a camera, the camera can be used to detect the pose as to where you’re looking — whether you’re looking at another person or you’re looking toward the Alexa device,” Prasad told VentureBeat during a phone interview. He noted that Natural Turn Taking builds on Alexa Conversations, a feature that launched in beta earlier this year to provide developers a deep learning-based way to create natural-feeling apps. “The video and speech is processed locally, and then [neural networks] are used to fuse and decide whether or not your speech is intended for Alexa.”
To be clear, Natural Turn Taking doesn’t require devices with a camera — it’ll work on devices without one, too. But it might not support older devices without Amazon’s AZ1 neural edge chip, and Prasad says it’ll be more accurate on devices with cameras.
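Amazon hasn't published how the three models' outputs are combined beyond Prasad's remark that neural networks fuse them. As a rough illustration only, the Python sketch below substitutes a simple weighted average for that learned fusion. The `TurnSignals` class, the `is_device_directed` function, the weights, and the threshold are all hypothetical, chosen to show how a decision could still be reached when no camera signal is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnSignals:
    """Hypothetical per-utterance scores in [0, 1]; these names are not Amazon's."""
    acoustic: float                 # model 1: device-directed speech vs. background speech/noise
    linguistic: float               # model 2: does the recognized text read like a request?
    visual: Optional[float] = None  # model 3: is the speaker facing the device? None if no camera


def is_device_directed(signals: TurnSignals, threshold: float = 0.5) -> bool:
    """Late fusion by weighted average (a stand-in for the neural fusion Prasad describes).

    The weights and threshold are illustrative guesses, not published values.
    """
    weighted = [(signals.acoustic, 0.4), (signals.linguistic, 0.4)]
    if signals.visual is not None:
        # Only include the camera cue when the device actually has one.
        weighted.append((signals.visual, 0.2))
    total_weight = sum(weight for _, weight in weighted)
    score = sum(value * weight for value, weight in weighted) / total_weight
    return score >= threshold
```

In this toy version, `TurnSignals(0.6, 0.6)` is accepted on a camera-less device, while the same audio with `visual=0.0` (the speaker is looking at someone else) is rejected, which mirrors the claim that a camera gives the system more to go on.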
Sound detection and Alexa Guard
Alexa’s sound detection is expanding as well, with recognition of things like a baby crying, barking dogs, and the sound of snoring. Later this year, customers will be able to choose to set up Routines that can kick off when Alexa detects one of those sounds.
More than 2 million customers have opted into Alexa Guard since launch, Amazon says, and the company expects at least a portion will enroll in Alexa Guard Plus, a new premium offering. For $4.99 a month, Alexa Guard Plus adds detection for the sound of footfalls, doors closing and opening, and more, as well as 24/7 monitoring with access to an emergency hotline.
A complimentary feature called Alexa Care Hub lets customers add “high-level” relationships with family members to get an activity feed that shows when they interact with smart home devices. Amazon pitches it as a way to check in on those with mobility and health issues. Separately, Amazon VP of smart home Daniel Rausch says that Alexa is now compatible with 140,000 products and that customers have set up over 100 million devices to work with Alexa.
A new Alexa command lets users quickly delete everything Alexa ever recorded. Saying “Alexa, delete everything I ever said” will remove all voice snippets associated with an Amazon account, which Amazon typically retains to improve the performance of Alexa’s various systems. Beyond that, Alexa now supports group audio and video calling with up to eight friends or family members; Zoom and Amazon Chime calls; and music sharing via Echo devices with the command “Alexa, share this song.”
The new tools and features come on the heels of others launched at Amazon’s Alexa Live event in July. There, the company rolled out deep neural networks aimed at making Alexa natural language understanding more accurate for custom apps, as well as an API that allows the use of web technologies to build gaming apps for select Alexa devices. Amazon also launched Alexa Conversations in beta, a deep learning-based way to help developers create more natural-feeling apps with fewer lines of code. And it debuted a new service in preview — Alexa for Apps — that lets Alexa apps trigger actions like searches within smartphone apps.