Amazon has been issued a patent on an Alexa technology that can determine certain physical and emotional characteristics of users from their voice input and offer help in response, sometimes in the form of products for sale. While the patent doesn’t mean Amazon plans to launch products with this technology, it does reflect Amazon’s thinking on Alexa’s potential.
The patent, titled “Voice-based determination of physical and emotional characteristics of users,” is number 10,096,319. The patent covers Alexa’s ability to infer certain traits about Alexa users from their voice when determining how to respond:
Traits may include physical characteristics of a user (e.g., gender, age, ethnic origin, etc.), a physical condition or state of a user (e.g., sore throat, sickness, etc.), an emotional condition or state of a user (e.g., happy, sad, tired, sleepy, excited, etc.), and other traits.
Amazon provides an illustration in the application showing what it intends to cover. In the illustration, a woman is shown to be coughing and sniffling and tells Alexa she’s hungry. The Alexa system is able to determine her “abnormal physical or emotional condition,” and asks if she would like a chicken soup recipe, which she declines. At that point, Alexa takes the initiative to offer her another remedy and says, “By the way, would you like to order cough drops with 1 hour delivery?” When the woman accepts Alexa’s offer, Alexa confirms and concludes by saying, “Feel better!”
Digging deeper into the patent reveals some interesting details. Here are some highlights based on our reading of the patent:
1. Amazon has patented the technology to determine demographic characteristics of users from their voice — including gender, age, and ethnic origin:
For example, voice features may include a gender of the user, an age or age range of the user, an ethnic origin or language accent of the user, an emotion of the user, a background noise of the environment in which the user is located, and other voice features. As a result, content presented at a device may be specific to the user that is using the device (e.g., providing a voice input, etc.), as opposed to a user associated with the device, such as an owner of the device.
2. Amazon has patented the technology to determine physical characteristics of users from their voice — including certain health conditions:
In another example, a second voice processing or signal processing algorithm may be used to process or analyze the voice data to determine a health condition or status of the user. Detectable or determinable health conditions may include, among others, default or normal, sore throat, cold, thyroid issues, sleepiness, and other health conditions. Example algorithms may analyze breath sounds of the user based at least in part on the voice data and may use a cepstral feature set using SVMs and/or neural networks.
3. Amazon has patented the technology to determine emotional status of users from their voice — including joy, fear, and stress:
The first voice processing algorithm may be used to determine an emotional state of the user. Detectable or determinable emotions may include, among others, default or normal, happiness, joy, anger, sorrow, sadness, fear, disgust, boredom, stress, and other emotional states. Emotional states or conditions may be determined based at least in part on an analysis of pitch, pulse, voicing, jittering, and/or harmonicity of a user’s voice, as determined from processing of the voice data…
If it is determined that the user has an abnormal emotional state, the device or a connected computer may select a real-time emotional state of the user. The real-time emotional state of the user may be, for example, at least one of the happiness, joy, anger, sorrow, sadness, fear, disgust, boredom, stress, or other emotional states.
4. Amazon has patented the technology to target ads to users based on what it determines is their current physical and/or emotional condition:
A current physical and/or emotional condition of the user may facilitate the ability to provide highly targeted audio content, such as audio advertisements or promotions, to the user… For example, certain content, such as content related to cough drops or flu medicine, may be targeted towards users who have sore throats… In the example of FIG. 1, the cough drops manufacturer may have targeted users with sore throats for the promotional offer that was presented to the user 130. The targeting criteria for the promotional offer, or the offer generally, may include users with sore throats or users likely to have sore throats…
Audio content targeted to sleepy and bored users may be determined based at least in part on a data tag that identifies the voice data as a sleepy and bored user. For example, a musician may want to target an audio ad for his new album to users with “boredom” and “sleepy” conditions. Audio content for presentation may be selected from the candidate content and presented to the user. For example, the voice interaction device may audibly present “here’s a joke [ . . . ] By the way, this singer just released his new album for just $1.99. Do you want to preview it?” The user may respond affirmatively or negatively as desired.
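To make the targeting idea in the excerpts above concrete, here is a minimal sketch of how tag-based content selection could work. It is our own illustration, not code from the patent or from any Alexa API: the `Offer` class, the `select_offers` function, and tag names such as `sore_throat` are all hypothetical, and we assume a simple rule in which an offer qualifies only when every one of its targeting tags appears on the voice data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical candidate audio content plus its targeting criteria."""
    name: str
    required_tags: frozenset  # assumption: all of these tags must be present


def select_offers(voice_tags, candidates):
    """Return the candidates whose required tags are all in voice_tags."""
    tags = set(voice_tags)
    # "<=" on sets is the subset test: every required tag must be detected.
    return [offer for offer in candidates if offer.required_tags <= tags]


# Illustrative candidates modeled on the two examples quoted from the patent.
candidates = [
    Offer("cough drops promotion", frozenset({"sore_throat"})),
    Offer("new album preview", frozenset({"boredom", "sleepy"})),
]

# Tags that a (hypothetical) voice-analysis step attached to one voice input.
matches = select_offers({"sore_throat", "hungry"}, candidates)
print([offer.name for offer in matches])
```

In this sketch, a voice input tagged only with "boredom" would not match the album offer, because the assumed rule requires both "boredom" and "sleepy". The patent text does not say whether targeting criteria are combined this strictly, so treat the all-tags rule as one possible design choice.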
Again, technology companies often file patents and never launch products with the patented technology. However, the fact that Amazon filed this patent reflects the potential use cases Amazon sees in Alexa’s future.
In our view, these patented technologies raise some important ethical and philosophical questions that Amazon will likely need to take a clear stance on if it intends to launch products including these features. For example, what bounds should be placed on advertising based on these factors? In which scenarios would it be unethical to take these factors into account at all when determining a response? Would Amazon ever make such data available to Skill developers through Alexa APIs? What are the privacy implications of these systems knowing what a user’s home life is like?
At the same time, these technologies reflect the potential for “smart assistants” like Alexa to become much more emotionally intelligent and empathetic, creating better user experiences. For example, if Alexa could determine that a user was in a happy mood when asking Alexa to “play music,” Alexa could automatically take that emotional state into account when choosing which songs to play.
Voice interfaces create opportunities for ambient computing to become deeply integrated in our physical spaces, and this patent addresses some important aspects of what that could mean for the future of human-computer interaction and commerce.
Follow TJI as we continue to track the development of the Alexa ecosystem and what it means for hardware developers, software developers, and customers.