Traditional Software vs Machine Learning Software by Filipa Castro

Automatic Summary

Filipa the Data Scientist: Building Sophisticated and AI-Driven Products at Continental

Hello everybody! This is Filipa, your friendly data scientist from Continental, currently based in Porto, Portugal. Today, I want to share my experience and insights on creating innovative and intelligent digital products, and discuss the challenges we face as we transition from traditional to machine learning-based software.

Continental: Not Just About Tires

When you think about Continental, the first thing that probably comes to mind is the tire production. However, our activities extend far beyond that. We apply our expertise to a variety of areas including agriculture, sports, and of course, the automotive sector.

Addressing the Difference: Traditional vs. AI-Driven Products

Developing traditional web or mobile applications is relatively easy to plan and execute because we, as developers, provide explicit instructions to the computer on how it should behave under specific circumstances. Creating AI-driven products, however, enters another realm: what we call 'software 2.0'.

In this new world, we don't provide explicit instructions. Instead, we supply a large amount of data, or examples, from which the computer learns. It works much like educating a child: after seeing enough examples, the child begins to learn, and the same concept applies to our software 2.0.
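To make the contrast concrete, here is a minimal sketch in Python (my own illustrative example, not code from the talk): software 1.0 encodes the rule by hand, while software 2.0 learns the rule from labeled examples, here with a small scikit-learn classifier and made-up features.

```python
from sklearn.tree import DecisionTreeClassifier

# Software 1.0: the developer writes the rule explicitly.
def is_cat_1_0(ear_length_cm: float, weight_kg: float) -> bool:
    return ear_length_cm < 7 and weight_kg < 8  # hand-tuned thresholds

# Software 2.0: the developer supplies labeled examples and the model
# infers the rule from the data.
examples = [[5, 4], [6, 5], [12, 30], [10, 25]]   # [ear_length_cm, weight_kg]
labels = ["cat", "cat", "dog", "dog"]

model = DecisionTreeClassifier().fit(examples, labels)
print(model.predict([[5.5, 4.5]]))  # behavior learned from examples
```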

Data Analysis: The Beginning of A Unique Journey

Before diving into the development phase, it's imperative to understand the business and the data. In this grey area of understanding, we often encounter very different scenarios across teams within the company, and we need to navigate them to keep the project on track.

Overcoming Obstacles

Different obstacles often arise, especially in the data preparation phase. At times, we first need to set up a data collection project. Other times, the data quality is not up to par, or the database itself can be challenging to work with.

Despite such uncertainty and diversity, our team of data scientists acts as a pool of data experts and consultants, eager to foster data literacy and to support any team that wants to develop AI-based solutions.

Case Study 1: Tire Inspection System

To provide a solid understanding, let's examine a real-life use case where we were tasked with creating a tire inspection system. Our goal was to use images of tires and detect any anomalies. For each image, we wanted to determine whether the tire was in a good state or had sustained damage. We also aimed to reduce the false positive rate, that is, anomalies flagged on tires that were actually fine.

The key here was to thoroughly understand the data and business. We broke the problem down and focused on the most critical areas. We also used human labels as a baseline for performance. The outcome? A highly-functioning AI-driven tire inspection system.
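As an illustration of the kind of evaluation involved (a hypothetical sketch, not Continental's actual pipeline, with made-up labels), the false positive rate the team wanted to reduce can be computed from a confusion matrix over image-level predictions:

```python
from sklearn.metrics import confusion_matrix

# 1 = anomaly present/detected, 0 = tire OK. Values are invented for illustration.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
false_positive_rate = fp / (fp + tn)  # good tires wrongly flagged as damaged
print(f"False positive rate: {false_positive_rate:.2f}")
```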

Case Study 2: Detecting Human Activities

Next up, we worked on an app capable of detecting human activities from videos. This task threw up a fresh set of challenges, particularly regarding making errors transparent to ensure a good user experience.

To make the process more user-friendly, we introduced a post-processing mode, which allowed the user to record an entire video before detecting activities. The user could then verify and, if necessary, correct the model's findings, leading to an improved user experience and model performance.
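Below is a rough sketch of that post-processing flow, under my own assumptions: `detect_jumps` stands in for a hypothetical model call, the user confirms or rejects each detection, and rejections are kept as feedback that could later improve the model.

```python
from typing import Callable, List, Tuple

def detect_jumps(video_frames: List[float]) -> List[Tuple[int, int]]:
    """Hypothetical model call: returns (start_frame, end_frame) per detected jump."""
    return [(10, 25), (40, 55)]  # placeholder output

def post_process(video_frames: List[float],
                 confirm: Callable[[Tuple[int, int]], bool]):
    confirmed, feedback = [], []
    for segment in detect_jumps(video_frames):
        if confirm(segment):          # ask the user: is this really a jump?
            confirmed.append(segment)
        else:
            feedback.append(segment)  # reversible: keep as training feedback
    return confirmed, feedback

# Example: the user rejects the second detection.
kept, rejected = post_process([0.0] * 100, confirm=lambda seg: seg == (10, 25))
print(kept, rejected)
```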

Conclusion

All in all, it's crucial to plan for the real scenario and choose wisely between different modes of operation (for example, streaming versus post-processing) based on the project. Moreover, it's beneficial to give users the power to verify the model's findings. This way, the impact of errors can be minimized, leading to a smoother user experience.

Interested in learning more, or want to share your own experiences with machine learning challenges? Feel free to connect with me on LinkedIn or share your thoughts in the comments section below.

Thank you for your time, and happy coding!


Video Transcription

Good morning or good afternoon, everyone, I'm not sure which. My name is Filipa. I work as a data scientist at Continental, and I currently sit in Portugal, more specifically in Porto. I think most of you may already know Continental. It is mainly known for its production of tires, but the business is much broader than that. Specifically in the field of building smart, new digital products, we are building a lot of expertise: we build products for different fields such as agriculture, sports, and of course automotive as well. Today, I would like to share my view and my experience in building these kinds of products. I will mainly focus on the big challenges that arise when we are building machine learning based software instead of traditional software, and I also want to present some of the solutions we are pushing forward at Continental. So let me start by explaining the main difference between building the traditional mobile apps or web apps that we all know, in contrast to building smart products based on what we call artificial intelligence.

If you look at this comic that I really like: for a common, traditional software mobile app, the designer gives the developers a wireframe explaining the expected behavior of a certain button or a certain layer.

Then it's quite easy, I'd say, to understand what the steps are and to plan for them. For an AI product, this can be really different. Let's say you want to build a self-driving car: you know what a self-driving car is supposed to do, but it's difficult to plan and to foresee the steps that are needed to build this kind of feature. Why is that? Basically because in traditional software, developers use what we call software 1.0. This means the developer gives explicit instructions to the computer: with this button, this is the behavior; if you take a picture, this is the behavior. For building machine learning based products, we are in the era of software 2.0. This means that normally the developer, who can be a data scientist like me, won't give explicit instructions to the computer or to the mobile app. Instead we give a lot of data, a lot of examples, and the computer learns from these examples. To make it simpler:

If you are trying to teach a kid the difference between, for instance, a dog and a cat, you show the kid a lot of examples, right? You can show pictures, you can go to the street and show what a dog looks like and what a cat looks like, and then the kid's brain starts to learn. For software 2.0 it's exactly the same: you give a lot of examples and you build a model that is based on mathematics. Because of this, it's a probabilistic model. While in software 1.0 the behavior of the product is deterministic, here it's probabilistic, because it's a model that is producing the output. And how does this affect the development cycle when you are an owner or a manager of the product? Why is it so challenging to base the product on this software 2.0? If we look at the traditional software cycle, I would say that if you follow the good practices that have been studied for years now, if you plan well and do a good analysis of requirements and a good design, then this cycle runs quite smoothly.

For machine learning software, it is also a cycle, but with a lot of back and forth. We start by understanding the business, like we do here, but then we also need to understand the data, prepare the data, and create the model. So there are a lot of additional steps, and at each one of these steps you might need to go back to the previous ones. Regarding our experience at Continental, what we see when we are in this region, this grey area of understanding the business and the data, is that we find a lot of different scenarios within the same company. Sometimes teams don't have any kind of data, they don't have the knowledge, they don't know exactly what they want to do, but they know they want to do something intelligent. Sometimes they already have the data but they don't have the knowledge, so we need to help them with this. And sometimes they have the data, they know exactly what they want, but they don't have the know-how yet. This is quite difficult because you have such different scenarios. And at each of these steps, you can also find different obstacles. For instance, if you go to the data preparation section, sometimes you need to collect data.

So you need to set up a new project, which is data collection, and give the team the best practices for collecting this data. Sometimes they have the data, but there's no quality in this data, so you may need to set up a data engineering project before the actual data science project. And sometimes it's really difficult to handle the data if it's stored in a very strange, unknown database, for instance. When we move to the modeling part, it also depends a lot on whether it's a research-based project or something that needs to go live, where you then have speed requirements; sometimes you need to come up with new metrics because it's not all about accuracy and reliability.

It's also about the efficiency of these models, how fast they can run. And when you go to deployment, you can also find different scenarios: sometimes they want to build a web app or a mobile app, sometimes it's only a dashboard, and sometimes it's only an endpoint. An endpoint is something where you can call a model and ask for a prediction; for example, every day I want to predict how many tires the company will sell. This is what we call an endpoint (see the sketch after this paragraph). With such uncertainty, such diversity of scenarios, how are we dealing with this in our company? Basically, we data scientists don't see ourselves as a team that is only focused on delivering AI products. We see ourselves as a pool of data experts who want to help the company foster data science, improve data literacy, and give support to any team that wants to develop AI-based solutions. Sometimes we might be acting as consultants: if a team just wants to evaluate potential use cases, we can advise them to follow our standards, our processes, our good practices. They are the owners of the project; we just go there to give some support.
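As a minimal sketch of what "an endpoint" means here, assuming a FastAPI service (the framework and field names are my own choices, not something named in the talk), a caller sends features and gets back a prediction:

```python
# Hypothetical prediction endpoint; served with e.g. `uvicorn my_module:app`
# (module name is illustrative).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    day_of_week: int
    recent_sales: float

@app.post("/predict")
def predict(features: Features) -> dict:
    # In a real service this would call a trained model;
    # a placeholder rule stands in for model.predict() here.
    estimate = features.recent_sales * 1.02
    return {"predicted_tires_sold": estimate}
```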

Sometimes we can move forward, and it makes sense to explore some of the data to see the actual potential, when there is a clear goal, a clear business problem, and we want to check if AI can solve it. If yes, if there's value, we can move to a prototyping phase where we build an MVP, and we can start using more resources and maybe some methodologies such as Scrum. And if, finally, there is interest to invest more in this kind of solution, we can implement an end-to-end product and be the ones responsible for bringing it into production. So this is our main approach, our strategy, let's say. Now I would like to give you a concrete example of a project where we faced some challenges and how we solved them, how we dealt with this kind of problem. Let's say someone asks us to design a tire inspection system. This is something like a self-driving car for inspecting tires, and we are not very sure where to start. In this case, it was a real use case where the team already had images of tires and they already had the labels. The labels mean that they already know: this tire has a problem, this tire doesn't have a problem. They had some knowledge.

They were using traditional methods, not deep learning or machine learning yet, and they had a very targeted solution already defined. For each image of a tire, they wanted to detect if it was in a good state or if it was damaged somehow, if there was some kind of anomaly. They also knew the metric: they wanted to reduce the false positive rate. With these traditional methods they were detecting a lot of false anomalies, and they wanted to reduce this. After understanding the business, we went forward to understand the data, and you will see how important this is. The classes we already know: we have tires that are OK and tires that are not OK, so this is a binary problem. Then it's also important to dive into the data and understand that we have several tires, but for each tire we might have several images. So how do we decide, if for one image we have OK and for three images we have not OK? And for each image we had, in this case, several patches, several regions of the image where we wanted to focus our attention and detect whether there is an anomaly or not (see the aggregation sketch after this paragraph). In terms of categories, this is still data understanding, I would say, but also business understanding.
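Here is a minimal sketch, under my own assumptions, of aggregating patch-level predictions up to a tire-level decision. The talk describes the hierarchy (tire, then images, then patches) but not the exact aggregation rule; flagging a tire if any patch is confidently anomalous is just one plausible choice.

```python
from typing import Dict, List

def tire_is_anomalous(patch_scores: Dict[str, List[float]],
                      threshold: float = 0.5) -> bool:
    """patch_scores maps image_id -> list of anomaly scores, one per patch."""
    for image_id, scores in patch_scores.items():
        if any(score >= threshold for score in scores):
            return True  # one confident anomalous patch flags the whole tire
    return False

# Example: two images of the same tire, three patches each (made-up scores).
print(tire_is_anomalous({"img_1": [0.1, 0.2, 0.05], "img_2": [0.7, 0.1, 0.3]}))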

Among all the images, we could have different tires, small tires and big tires; we could have different mold types, meaning the process by which the tire was created was different; and we can also have different types of defects. It's not only one type of defect, there are many, as you can imagine, and they might also have different levels of severity. Why is this important? One of the first steps we did was to visualize this distribution. So let's look at the data and see, for instance, how many defects of each type we have. Why is this important? Mainly, I would say, to set expectations with the stakeholder, so with this team or client. Here you can see that there are some defects that are really common, but you also have some defects that are not even represented in the data set, so you don't get to see a single example of that defect. This is important because then you can go back to the stakeholder and explain: OK, we are building this model, but once you start to test it on a new defect that we never had an example of, we cannot expect the model to accurately detect the kinds of defects that were not represented in the training set.
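A minimal sketch of this kind of distribution check, assuming the labels are available as a simple list of defect-type strings (the actual data format is not described in the talk):

```python
from collections import Counter

# Hypothetical defect-type labels, one per annotated patch or image.
defect_labels = ["bubble", "crack", "bubble", "cut", "bubble", "crack"]

counts = Counter(defect_labels)
total = sum(counts.values())
for defect_type, n in counts.most_common():
    print(f"{defect_type:>8}: {n:3d} examples ({n / total:.0%})")
# Defect types that never appear in this count cannot be learned by the model,
# which is exactly the expectation to set with stakeholders.
```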

As I said with the kid: if the kid didn't see any examples of these defects, then it doesn't know what they are. The same applies to the article distribution: if there is some article that is not represented, then you cannot accurately detect defects on those articles. For the severity, I think what is more important and more interesting is that you can try to convince the stakeholder to break down the problem, which is very complex, and solve it in parts. Here you could say: let's focus on level one severity, because these are the most common ones, and if you solve this problem it's already very good progress. Or maybe let's focus on level four, because there are fewer cases but they are the most critical ones, so if you find these with high reliability, that is already good progress. Apart from that, you may have human labels.

In this case, we have humans labeling images, and these humans are the ones who decide: this is a not-OK tire, this is a tire that is OK. What we saw is that for very similar images there is not always agreement between two labelers; if you give the same image to two humans, they may disagree. This is important because if you are able to show that the labels are not always consistent, then you can set expectations and say: OK, if the humans cannot agree on a label more than 80% of the time, then we shouldn't expect our model to perform better than 80%.
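A minimal sketch of that agreement check: compare two labelers on the same images and use their agreement rate as a rough ceiling for model performance. The label values below are hypothetical.

```python
labeler_a = ["ok", "not_ok", "ok", "ok", "not_ok", "ok"]
labeler_b = ["ok", "not_ok", "not_ok", "ok", "not_ok", "ok"]

agreements = sum(a == b for a, b in zip(labeler_a, labeler_b))
agreement_rate = agreements / len(labeler_a)
print(f"Inter-labeler agreement: {agreement_rate:.0%}")
# If two humans only agree about 80% of the time, expecting the model to
# score much above 80% against either labeler is unrealistic.
```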

To sum up, I would say it's very important to dive deep into the data understanding and show what is feasible and what is not according to the data we have, to try to break down the problems and choose the most relevant and critical ones, and, if you have human labels, to use them as a baseline for performance. Lastly, and very quickly, I would like to give another small example. Let's say we want to build an app that is able to detect human activities. We have a video and, for instance, our user is jumping and we want to detect these jumps. Let's say our model detected two jumps. This is an example I want to give regarding how to design machine learning apps. In this case, let's say this jump is a false positive: the model detected a wrong jump, it's not a jump, but the model says it is, while here it detects a real jump. So this is an error, and how can we design the app better so that this error is more transparent and doesn't give a bad user experience? We have one option, which is the streaming mode: let's say we are recording and, on the fly, we want to detect a jump; if it's a jump, we crop the video.

The problem with this is that if we have a false positive like this one here, then we crop the video, the user cannot understand why the video was stopped and cropped, and this kind of mistake from the model is not reversible. So this is a dangerous kind of mistake, and it gives a bad user experience. But if somehow we manage to deliver a product where we do this in a post-processing mode, then the user records an entire video, and after the video is stopped, when the user wants, we detect the jumps within the video and ask for the user's confirmation. If the answer is no, this is not a jump, the user just clicks on delete or does not confirm, and then we can revert the decision of the model. We can even use this feedback from the user later to improve the model. So to sum up, in terms of design, I would say it's really important to plan for the real scenario, to choose, in this case, between streaming mode and post-processing mode, and to make the errors transparent: if the model fails, show it to the user, and the user is required to verify it.

This way, for instance, I think you can try to design the app so that the errors are still there, but they don't have such an impact on the user experience. Now I would like to have some time for questions. If you have questions, or if you deal with these kinds of challenges in your daily jobs, I would like to hear about it; if we don't have time here today, you can reach me on LinkedIn. I would also like, of course, to thank the Human Tech Global Conference for the opportunity to be here, and thank you for your attention during this session.

If there are no questions, maybe I would ask, if people are still here: do you have some kind of background in machine learning, do you also deal with these kinds of problems, or did you just come out of curiosity because you are interested in this kind of session and topic? OK.

There's a question from Branca. Hello. So, how much training data? This is of course a good question, because it will depend on the project. But for these kinds of models, computer vision models, at minimum I would say 1,000 images is the most common number. For activity recognition, for instance, one of the projects I have been part of used around 2,000 to 3,000 images. But also, if you have a video and you use all the frames from that video, then for each video you already have a lot of examples. I would say it's definitely in the region of thousands, 1,000 to 3,000. Of course, if it's a very complex project, for instance the tires, if you have 100 different defects, then you maybe need 200 examples of each defect, so you need to multiply. It also depends on the task: here we saw a binary classification problem, but if you have a multi-class problem,

so instead of just detecting OK or not OK you want to detect what kind of defect is there, then you need more data. But if you want a reference, I would say 1,000 images for this kind of problem is a good reference. OK, thanks a lot, and I wish everyone a good conference. See you.