Alexa, add pineapple to my basket! by Anna Terés

Automatic Summary

Welcome to Hands-Free Shopping Conversations with Alexa

Thank you for joining us for this exploration of hands-free shopping with Alexa, brought to you by Ocado. Today we will look at how this major UK online supermarket uses voice technology to deliver a smoother ecommerce experience, and why voice assistants like Alexa are so valuable for online shopping.

Meet Anna Terés: The Voice Behind Voice Integration

First, let's meet our guide on this journey. Anna Terés has been at Ocado Technology Barcelona for almost six years. She began as a software engineer and now leads the voice team, which is responsible for the voice integration of our ecommerce platform. A heavy metal lover and Spartan Race evangelist, Anna brings her passion for technology to everything she does.

Ocado: Harnessing the Power of Technology for Enhanced Customer Experience

Born more than 20 years ago, Ocado emerged as an online supermarket with no physical stores. Remarkably, all of the technology, both hardware and software, was created from scratch in-house. That work laid the groundwork for what we know today as the Ocado Smart Platform (OSP): an integrated platform for grocery ecommerce, sold to our partners, that covers the complete shopping experience, from browsing products to fulfillment and delivery.

How Voice Assistants Enhance Ecommerce Experience

If you've ever been cooking and suddenly realized you're out of an ingredient, you'll understand the hassle of stopping everything to add that item to your online shopping list. Enter voice assistants like Alexa. With a simple command, given while you carry on cooking, Alexa adds the item to your basket for you, so you never have to interrupt what you're doing. Even while you are driving, if you remember you need to shop for something, you can simply ask Alexa to add it on the go!

The Inner Workings of the Alexa-Ocado Integration

Understanding Alexa’s Basic Concepts

Let's dive into some basic Alexa concepts that make this seamless integration possible. When a customer interacts with Alexa, they use a specific phrase known as an utterance. From this utterance, two critical items can be extracted: the intent and the slots. The intent expresses what the customer wants to do, for instance adding something to their basket. Slots are variables within the utterance, such as the search term (the specific product the customer wants) and the container (their Ocado basket or order, the destination for that product). Consequently, when a customer says "add pineapple to my basket", Alexa translates that into the intent "add to container", with the search term "pineapple" and the container "the basket".
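To make these concepts concrete, here is a minimal Python sketch of how an utterance resolves into an intent plus slots. The intent and slot names are illustrative, not Ocado's actual ones:

```python
from dataclasses import dataclass

@dataclass
class ParsedUtterance:
    intent: str   # the action the customer wants to perform
    slots: dict   # variables extracted from the utterance

# Hypothetical output of Alexa's NLU for "add pineapple to my basket":
parsed = ParsedUtterance(
    intent="AddToContainerIntent",
    slots={"searchTerm": "pineapple", "container": "basket"},
)

print(parsed.intent)                # AddToContainerIntent
print(parsed.slots["searchTerm"])   # pineapple
```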

Behind the Scenes

Amazon's Alexa service employs Automatic Speech Recognition (ASR) to convert the customer's speech into text, and Natural Language Understanding (NLU) to work out what they mean. Amazon then sends us the intent, the search term and the container, along with an access token that identifies the customer. This information is fed into our System Request Handler, which navigates the conversational nodes until it sends back the appropriate response.
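A rough sketch of that dispatch step might look like the following. The handler registry and function names are assumptions for illustration; the real System Request Handler is certainly more involved:

```python
# Hypothetical handler registry keyed by intent name.

def handle_add_to_container(request: dict) -> str:
    term = request["slots"]["searchTerm"]
    container = request["slots"]["container"]
    # A real handler would use request["accessToken"] to identify the
    # customer, search the catalog, and walk the conversational nodes.
    return f"Do you fancy a {term}? I can add it to your {container}."

HANDLERS = {"AddToContainerIntent": handle_add_to_container}

def system_request_handler(request: dict) -> str:
    handler = HANDLERS.get(request["intent"])
    if handler is None:
        return "Sorry, I can't do that yet."  # fixed "educative" fallback
    return handler(request)
```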

The Detailed Process

The Ocado System Request Handler executes the identified add-to-container intent in the Graph Runner. The Graph Runner chooses between different responses based on how confident we are that the customer wants to add a given product. If we're not too sure, we suggest a product and wait for the customer's confirmation. If the confidence is high, we add the product to the basket directly and simply inform the customer. The aim is always to keep customer interactions as short as possible.
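A simplified sketch of that confidence-driven branching; the threshold and function names are invented for illustration:

```python
CONFIRM_THRESHOLD = 0.9  # made-up cut-off, not Ocado's real value

def respond(product: str, confidence: float, basket: list) -> str:
    if confidence >= CONFIRM_THRESHOLD:
        basket.append(product)            # high confidence: add directly
        return f"I've added {product} to your basket."
    return f"How about {product}?"        # low confidence: suggest and wait
```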

This seamless integration between Alexa and Ocado enriches the customer experience, offering a platform that lets customers effortlessly build and modify their grocery orders, check order status and totals, or even enquire about available products, among other services.

Customer-Centric, Innovative Solution

This Alexa-Ocado integration is much more than a helpful tool to make shopping easier. It serves as a beacon of technological excellence that aims to redefine convenience, eliminate boundaries, and make life easier for the modern consumer. It's the perfect blend of innovation and customer satisfaction – a futuristic approach that reimagines ecommerce.

If you are interested in more of what we do at Ocado, see our career offerings [here].


Video Transcription

Thanks for coming, and welcome to "Alexa, add pineapple to my basket", or how we at Ocado handle hands-free shopping conversations with Alexa. First of all, let me introduce myself. I am Anna Terés and I've been at Ocado Technology Barcelona for almost six years now. I started as a software engineer and I'm currently leading the voice team, which takes care of the voice integration for our ecommerce platform. I love heavy metal and I am a Spartan Race evangelist. Ocado was born more than 20 years ago as an online supermarket without physical stores. All the technology, both hardware and software, was developed internally and was the seed of the Ocado Smart Platform (OSP) that today we sell to our partners as an integrated platform for grocery ecommerce. The OSP platform covers the complete shopping experience, from ecommerce to the fulfillment process and delivery. Today, I'm going to talk about how voice assistants can be useful for ecommerce. I'll introduce some basic Alexa concepts and then I will ask Alexa to add something to my basket.

So you can see how the conversation is handled both from the Alexa side and from the Ocado side. Twenty years ago, when the first mobiles came out, I was one of those haters who said they would never use one of those things. I didn't like it, I didn't need it, and I was never ever going to play games on it. Today, I cannot do anything without it. All the aspects of my life are somehow related to the mobile: calendar, email, social and work activities, travel and day-to-day pictures. Sometimes I even use it for making calls. Today, with voice assistants, I started having the same reaction: I don't want to use it. But my husband is obsessed with them, and my home is dominated by Alexa. Music, lights, temperature, TV, blinds, animal sounds. Who does not need animal sounds at home, right? But Anna, why is this useful for ecommerce? Imagine that you are preparing your Friday night pineapple pizza. Imagine that your hands are covered in flour and you've finished all the pineapple. Oh my goodness, I need to add pineapple to my next order before I forget, or next week I won't be able to make my infamous pineapple pizza again. But my hands are dirty and I don't want to stop doing what I'm doing. Pizza is an art that shouldn't be interrupted. Alexa is the solution. Ask Alexa to add whatever you've run out of, and then you can forget about it and continue with your life. This can also happen when you are driving: your hands are busy and then you remember.

Oh, I need to buy some beers for my next Friday pizza night. Don't risk forgetting it: just ask Alexa to add some beers to your order. Think about the countless moments during your day when your hands are busy and you would benefit from someone helping you out: changing nappies, cleaning the kitchen, building a Lego star fighter. You can use it also when your hands are available but you prefer a more natural shopping experience, as when you go to the market and ask a real person for what you need. This is especially useful for older people, who often see technology as a barrier: with Alexa, they just ask for what they need and Alexa does the rest. This also applies to people with visual impairments; they really appreciate being able to do the shopping without having to interact with screen devices. Today, I'm going to show you how to add pineapple to your basket using the Ocado Alexa skill. But before that, let's learn some basic concepts of Alexa. When a customer asks something of Alexa, they may say something like "Alexa, add pineapple to my basket". This is what is known as an utterance: a specific phrase that people use to make a request to Alexa.

From that utterance, we can extract the intent, which represents the action the customer wants to perform, for example adding something to my basket. And within that utterance we may have different slots, which are variables that can represent different things, like the search term or the container. The search term in our case represents the product the customer wants to buy, and the container represents their Ocado basket or order, which will be the destination of that product. So from our example, when our customer says "add pineapple to my basket", what Alexa needs to determine is that the search term is pineapple, that the container is the basket, and that what the customer wants to do is add to container. So Alexa, why don't you help me show them how this is done in Ocado? Let me change my screen so I can show you. I won't do it with an Alexa device, but with the Alexa console, which is a simulator for Alexa. Can you see the screen now? Let's simulate that I am opening the Ocado skill: "Open Ocado."

Good afternoon. How can Ocado help?

Add pineapple to my basket. (We are simulating the conversation.)

Do you fancy a pineapple at £1? Please. Sure thing, I've added it to your Ocado trolley. Do you need help with anything else? No. OK. Have a good day.

So now I'm going back to my presentation, which is here. Yes, perfect. Thank you, Alexa, for helping me. So after this interaction, what happened behind the scenes? Taking a look at how the flow works, you will see that after the customer says "add pineapple to my basket", it is the Alexa service that performs the Automatic Speech Recognition and the Natural Language Understanding. Automatic Speech Recognition is a technology that converts spoken words into text: Alexa detects those sounds and recognizes them as words. The Natural Language Understanding also happens on the Amazon side, using the words coming from the Automatic Speech Recognition; from them, Alexa deduces what the speaker actually means.

The model that Amazon uses to perform the NLU is created by us using tons of sample utterances and slot values. We have a different model per retailer, because each retailer has a different catalog and we want to be very accurate in our suggestions. Also, different languages imply different models. So once Automatic Speech Recognition and Natural Language Understanding have been performed, Amazon sends us an intent, a search term and a container, together with an access token so we can identify the customer. This information is sent to our service called the System Request Handler, which handles the intent and calls the OSP services to find out what we need to know about the users and their orders, and also performs the search for the product in the catalog. All that information is then stored in what we call the dialogue data, which is basically the context of the conversation. We send that, together with the intent, to our Graph Runner, which is where all the magic happens, navigating our conversational nodes until it gets a response.

Now, let's take a closer look at how the conversation is handled on our side, the Ocado side. When the customer asks Alexa to "add pineapple to my basket", the Alexa service performs the ASR and the NLU and sends us a JSON file, the one you are seeing on the screen, from which we can get the access token, the intent and the slots, the container and the search term, among other things. The intent has been identified to be add to container, so the System Request Handler will execute the add-to-container intent in the Graph Runner.
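The JSON request shown on screen during the talk is not captured in this transcript; as a rough illustration only, with assumed field names, it might look like this:

```python
# Illustrative request shape only; field names are assumptions, since the
# actual JSON from the talk's slide is not reproduced in the transcript.
alexa_request = {
    "accessToken": "<opaque token identifying the customer>",
    "intent": "AddToContainerIntent",
    "slots": {
        "searchTerm": "pineapple",
        "container": "basket",
    },
}
```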

Now we are going to explore, step by step, what happens when we feed our graph with an intent, a search term and a container to get an appropriate answer. You need to bear in mind that we don't have any visual support to help customers navigate through the multiple features a standard website or app can offer. We need to rely on the conversational creativity of the customers: they can ask anything in multiple ways, and sometimes they will try to do things that we are not able to do. So one of the traits of our conversational engine is to be as educative as possible. For example, when we identify something that we cannot do, we return a fixed response to explain that we are not able to do it. But when there is another way of doing what they ask, we respond with an educative answer to help the customer understand how to do what they want to do. Let's see an example. Let me go back to the development console. Perfect. Now,

How can Ocado help?

This is an intent that we do not support at the moment, because we haven't developed it yet.

Sorry, I can't add multiple items yet. Try saying "add two more" after adding an item to your basket. Do you need help with anything else? OK.

Perfect. So let's go back to the presentation. Nice. As you can see, the first step in our conversational graph is to check whether the customer has asked for more than one item, so we can give the appropriate answer in case they wanted to add more than one of the same product. But in our current example we add just one pineapple to the basket, so we can skip that tutorial response and continue with the normal flow. So we know that the customer wants a pineapple, and we now perform the search for pineapple in our OSP backend and get the search results back. We also need to get the container: whether the customer wanted to add their product to the basket or to the order, which will have different consequences later on. Since there can be hundreds of pineapples, we need to suggest just the one the customer wants. Not another one, not the one on promotion, not the one we would like them to buy. Just the one they really want. How can we do that?

We use machine learning to classify each search result into a confidence level, and then we order the results based on the confidence we have that the customer wants to add that product. The level of confidence determines the type of response we are going to give. If we're not very confident, we suggest a product and wait for the customer's confirmation. On the other hand, if the confidence is high, we add the product directly to the container and just inform the customer of what we have done. The aim here is to have the shortest conversation possible, as we have data supporting the fact that customers prefer shorter interactions with the voice assistant. So now we continue with our flow assuming that we are not extremely sure about the pineapple the customer wants.
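As a rough sketch of that classify-and-rank step (the scoring model is stubbed with made-up numbers standing in for a trained classifier's output):

```python
# Made-up confidence scores; a real classifier would produce these.
SCORES = {"pineapple": 0.95, "pineapple chunks": 0.40, "pineapple juice": 0.20}

def rank(results: list) -> list:
    # Highest confidence first: the product we believe the customer wants.
    return sorted(results, key=lambda p: SCORES.get(p, 0.0), reverse=True)

print(rank(["pineapple juice", "pineapple", "pineapple chunks"]))
# ['pineapple', 'pineapple chunks', 'pineapple juice']
```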

In that case, our response will be of the type "How about...?". Then, before really adding the pineapple to the container, we need to check whether there is another pineapple of the same type already in the container, so we can warn the customer in case they forgot they had already added the product to their order, or maybe they truly want another of the same. We also need to verify whether the product is out of stock, in which case we cannot add it to the container. And finally, we can add the pineapple to the container, and the response goes back to the customer. And remember, all that was only for answering "add pineapple to my basket". There are other interactions that can happen while adding the pineapple or just after adding it. If the pineapple suggested or added is not the one the customer wants, they can say "suggest another". After adding the pineapple, we can add two more, we can remove a product from the container, and we can even, right after adding the pineapple, remove it by saying only "remove it", because we have the context. Alexa comes with screen devices too, so we also need to support them and expand the voice experience to screens, where customers are able to choose by touching a product on the screen or just by saying "add option one". We aim to do fewer things than a mobile app or a website can do, but we aim to do them very accurately.
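Those pre-add checks could be sketched as follows; this is a minimal illustration with invented names, not Ocado's code:

```python
def add_to_container(product: str, container: list, in_stock: bool) -> str:
    if not in_stock:
        return f"Sorry, {product} is out of stock."   # cannot add at all
    if product in container:
        # Warn in case the customer forgot it's already there.
        return f"You already have {product}. Do you want another one?"
    container.append(product)
    return f"I've added {product} to your basket."
```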

There are many other things the skill can do at the moment: Where is my order? What's in my order? Do I have pineapple in my order? What's my order total? Until when can I edit my order? Do you sell pineapple? Show me the pineapple. That last one is only for screen devices. There are still many things we can do, and we have a roadmap full of challenges, supported by usage data that helps us find the right features to implement next. So thank you for listening, and if you have any questions, I'm more than happy to try to answer them. If you are interested in Ocado, I have just shared the careers link. What do you mean by privacy issues, Patrice? Ah, I know what you mean. Yes, that's something that Amazon deals with. Sometimes we receive utterances that say "fuck off, Alexa" or something like that, but that's something we cannot relate to the customer, so we just skip those utterances and we don't store anything. How do you secure the data between the customer's device and your servers? Do you mean in the Alexa skill? Is Ocado the one doing that? Sorry: Amazon is the one doing the NLP, all the natural language processing is done on the Amazon side, so it's their responsibility.

Bear in mind that we only receive the intent, the search term and the container; we don't receive the whole utterance, only the pieces that Amazon extracts and deems meaningful, not the whole sentence. Thank you. Thank you for attending. Bye bye.