Voice-first devices and services differ from conventional mobile and desktop apps, so they come with their own vocabulary. Here we cover the terms introduced by the current voice-first leader, Amazon’s Alexa.
An Alexa device is nothing without apps. Except, in Alexa terms, we call these skills, not apps. The main reason for that is that there also exists a single Alexa App, which is the control app for the Alexa device and its skills. Calling those apps too would clearly lead to confusion.
An Alexa device, like Echo or Echo Dot, can have as many skills enabled as you want. Over ninety percent of these are custom skills. Smart home and flash briefing skills also exist, but those are very specialized, so we won’t be getting into them in this article.
There’s one more component in the Alexa platform: the Alexa service. It is often implemented as an AWS Lambda function, so you will sometimes hear it called that. Whatever you call it, it is the back-end code, running in the cloud, that supports an Alexa skill.
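To make the idea of back-end code concrete, here is a minimal sketch of a skill service written as a Lambda-style handler in Python. It assumes the standard Alexa request and response JSON envelopes; the intent name GetAffirmationIntent and the spoken texts are hypothetical, invented only for illustration.

```python
# Minimal sketch of a skill back end written as an AWS Lambda handler.
# The intent name "GetAffirmationIntent" and the spoken texts are hypothetical.


def build_response(speech_text, end_session=True):
    """Wrap plain text in the JSON envelope that Alexa expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that Lambda calls with the request Alexa sends."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_response("Welcome. Ask me for an affirmation.", end_session=False)
    if request["type"] == "IntentRequest":
        if request["intent"]["name"] == "GetAffirmationIntent":
            return build_response("You are doing better than you think.")
    # Anything this sketch does not recognize gets a generic reply.
    return build_response("Sorry, I did not understand that.")
```

A real service would also handle built-in intents such as help and stop, but the shape stays the same: read the request type, pick an action, and return speech.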
When someone uses an Alexa skill, the requests to the device and its responses make up what’s referred to as an Alexa interaction. Skill interactions are governed by an interaction model that maps out all the requests, responses, actions and data that the skill uses to serve its user.
The way you start a skill is through its invocation name. This is simply the skill’s name, such as “Daily Affirmation.” Users can say just the invocation name to get things rolling, or they can combine the name with an intent so that an action is also performed: “Alexa, tell me my Daily Affirmation.”
What are intents? Each intent corresponds to a specific action in the Alexa skill. Think of them as equivalent to menu items in a computer program such as “Copy,” “Open” or “Exit.”
Each intent in the skill contains the following components:
- The name of the intent
- A set of utterances
- Optional slots
Utterances are the core of any skill. These are all the ways a user might phrase a request for a particular action or intent. Since the same requests could be spoken in English in many different ways, there are typically a large number of utterances per intent. Slots help extract specific data types from an utterance such as the time, the date, a location, a name and so on.
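The pieces of an intent (name, utterances and slots) can be sketched as a small data structure. The example below mirrors the general shape of an Alexa interaction model as a Python dictionary; the skill, the intent name GetAffirmationIntent, the slot name day and the sample utterances are all hypothetical, and the helper utterances_for is ours, not part of any Alexa library.

```python
import json

# Hypothetical interaction model for a "Daily Affirmation" skill.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "daily affirmation",
            "intents": [
                {
                    "name": "GetAffirmationIntent",
                    # Slots name the pieces of data to pull out of an utterance.
                    "slots": [{"name": "day", "type": "AMAZON.DATE"}],
                    # Utterances: different ways of asking for the same action.
                    "samples": [
                        "give me an affirmation",
                        "give me an affirmation for {day}",
                        "what is my affirmation for {day}",
                    ],
                }
            ],
        }
    }
}


def utterances_for(model, intent_name):
    """Return the sample utterances registered for one intent."""
    for intent in model["interactionModel"]["languageModel"]["intents"]:
        if intent["name"] == intent_name:
            return intent["samples"]
    return []


if __name__ == "__main__":
    print(json.dumps(interaction_model, indent=2))
```

Note how the slot appears in curly braces inside an utterance: that marks the spot where the date is expected to be spoken.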
There are just a couple more nuances to intents to cover: full and partial intents. A full intent means the request contains everything needed to complete the action, whereas a partial intent requires more information to fulfill the task. The skill gets this additional information by issuing a prompt to the user, which is usually another question or a list of options.
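The difference between a full and a partial intent can be sketched in a few lines of Python. This assumes the intent arrives as a dictionary whose slots carry a "value" key only when the user supplied one; the intent name GetBirthdayRemindersIntent, the slot name month and the prompt text are hypothetical.

```python
# Hypothetical prompts, keyed by the slots this intent needs to be complete.
REQUIRED_SLOT_PROMPTS = {
    "month": "Your birthday reminders for which month?",
}


def missing_slots(intent, required):
    """List the required slot names that have no value yet."""
    slots = intent.get("slots", {})
    return [name for name in required if not slots.get(name, {}).get("value")]


def next_prompt(intent):
    """Return a prompt for a partial intent, or None for a full intent."""
    missing = missing_slots(intent, REQUIRED_SLOT_PROMPTS)
    if missing:
        return REQUIRED_SLOT_PROMPTS[missing[0]]
    return None


# A partial intent: the user never said which month.
partial = {"name": "GetBirthdayRemindersIntent", "slots": {"month": {"name": "month"}}}
# A full intent: everything needed is present.
full = {
    "name": "GetBirthdayRemindersIntent",
    "slots": {"month": {"name": "month", "value": "June"}},
}
```

Calling next_prompt(partial) yields the question to ask the user, while next_prompt(full) yields None, meaning the skill can go ahead and act.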
Most skill conversations are short and directed, but Alexa also supports dialogs. These are useful for skills requiring a lot of information such as one that diagnoses what’s wrong with your car.
The Alexa SDK supports dialogs through a dialog model and through Alexa confirmations, which come in two flavors: explicit confirmations and implicit confirmations.
Explicit confirmations simply echo back the user’s last request as a question. Implicit confirmations are more subtle. They use a landmark word, taken from the user’s request, to indicate that the request was understood:
“Alexa, what are my birthday reminders?”
“Your birthday reminders for which month?”
Implicit confirmations contribute a more natural feel to Alexa conversations.
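As a rough illustration of the two flavors, the sketch below builds each kind of confirmation as a plain string. These are ordinary Python templates of our own, not Alexa SDK calls, and the function names and wording are assumptions made for the example.

```python
def explicit_confirmation(request_summary):
    """Echo the whole request back as a yes-or-no question."""
    return f"You asked for {request_summary}. Is that right?"


def implicit_confirmation(landmark, follow_up):
    """Reuse a landmark phrase from the request inside the next question."""
    return f"Your {landmark} {follow_up}"


explicit = explicit_confirmation("your birthday reminders")
implicit = implicit_confirmation("birthday reminders", "for which month?")
```

Here explicit reads “You asked for your birthday reminders. Is that right?” while implicit reproduces the reply in the exchange above, moving the conversation forward in the same breath.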
Voice-first apps or skills have limitations, of course. For one thing, the conversation is entirely auditory, which is pretty limiting if you want to, say, view a short film via a skill. Fortunately, Amazon supplements skills with the Alexa App and skill-specific cards.
These are holders for data that is impossible or difficult to impart through voice only. For instance, a details card gives the user instructions on using the Alexa skill. Home cards let users see non-voice data such as images, video or long passages of text.
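A card travels alongside the spoken reply in the skill’s response. The sketch below, in Python, builds a response that carries a simple text card in addition to speech; it assumes the standard Alexa response envelope with a card of type Simple, and the helper name build_card_response plus the title and text are hypothetical.

```python
def build_card_response(speech_text, card_title, card_content):
    """Build a reply that is spoken aloud and also shown as a card in the Alexa App."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # The card holds what is awkward to say aloud, such as longer text.
            "card": {"type": "Simple", "title": card_title, "content": card_content},
            "shouldEndSession": True,
        },
    }


reply = build_card_response(
    "I sent today's affirmation to your Alexa App.",
    "Daily Affirmation",
    "You are doing better than you think.\nCome back tomorrow for a new one.",
)
```

The speech stays short while the card carries the detail, which is the division of labor cards are meant for.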
Alexa can also issue notifications to users. These are usually reminders the user set for upcoming events, or timers. Amazon has also added opt-in push notifications from skills. Their behavior is entirely under the user’s control, so they do not interrupt skills that are currently running.
Stay Tuned for Changes
Though some of the terms Amazon chose might sound a bit techie, overall they are pretty accessible to everyday people. Of course, terms will change with continued advances in voice-first devices by both Amazon and competitors, but for now you are well-equipped to discuss Alexa’s version of voice-first technology with confidence.