Serving TensorFlow models with Kubernetes

Automatic Summary

Utilizing TensorFlow Models for Image Anonymization: A Workflow Overview

Hello everybody, and welcome to an engaging discussion on serving TensorFlow models, particularly in the case of image anonymization. I am Sarit Pinhas, a senior software engineer at an Israeli start-up, Duda, and global ambassador for Women in Tech, and I will be your guide today as we delve into this exciting topic.

The Problem We Aim to Solve

The primary use case we will explore today is the anonymization of images: protecting the identity of drivers in an input stream of images captured from thousands of dashcams per second. The main objective? To blur the faces and license plates within these images. To achieve this, we need to detect the faces and license plates in each image, a process that involves classifying the object and determining its position within the image.

After considering a few different technologies, we decided to utilize machine learning, specifically TensorFlow, to carry out these detections.

What is TensorFlow?

TensorFlow is an open-source library that is extensively used for developing and training machine learning models. It was created by Google in 2015 to facilitate machine learning and offers several advantages, including:

  • Ease of building models: Models can be built with little code using the APIs the library provides, such as convolution layers (see the sketch after this list).
  • Deployment flexibility: TensorFlow models can be deployed across a range of platforms - on-premises, in the cloud, in the browser, or even on-device for mobile applications.
  • Wide community support: TensorFlow boasts an extensive community of developers who can assist you in troubleshooting any issues.
  • Scalability: TensorFlow is designed to scale computations across multiple machines, increasing computational speed and efficiency.
  • Debugging: TensorFlow models are relatively easy to debug, which is not the case with some other alternatives.
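
To give a feel for how little code a basic model takes, here is a minimal, illustrative sketch built with the Keras API that ships with TensorFlow. It is not the detection model used in the project; the input size and the two classes are assumptions for demonstration only.

```python
import tensorflow as tf

# A tiny convolutional classifier, purely for illustration.
# The 224x224 RGB input and the two classes (e.g. face vs. license plate)
# are assumptions, not the project's actual model definition.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),                # RGB input image
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # convolution layer
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```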

Using TensorFlow for Image Anonymization

We use TensorFlow for image anonymization in a two-phase approach. The first phase, also known as the training phase, involves building the graph. Here, the machine learning model is ‘trained’ to identify license plates and faces in images through repeated computation iterations until the loss (the difference between the model's prediction and the labelled answer provided as input) reaches a minimal level.
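
The following toy sketch illustrates that training loop in code: classify, measure the loss, update the variables, and repeat until the loss stops shrinking. The tiny model and the random data are stand-ins; the project's real detection model and dataset are not shown in the talk.

```python
import numpy as np
import tensorflow as tf

# Toy training loop: compute the loss (the gap between the model's answer
# and the labelled answer) and nudge the variables to reduce it.
model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="softmax")])
images = np.random.rand(32, 10).astype("float32")   # stand-in for the "data"
labels = np.random.randint(0, 2, size=(32,))         # stand-in for human-tagged answers

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

for step in range(300):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(images, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if step % 100 == 0:
        print(f"step {step}: loss = {loss.numpy():.4f}")
```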

The second phase, known as the inference phase, involves running computations on the graph: the trained and saved model is served, and detections are produced for incoming images.
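
Continuing the toy example above, the hand-off between the two phases looks roughly like this: the trained model is exported in the SavedModel format and later loaded again for inference. The paths and the commented-out prediction call are illustrative assumptions.

```python
import tensorflow as tf

# Export the trained `model` from the sketch above as a SavedModel
# (TF 2.x / Keras 2; the trailing "1" is a version directory, as
# TensorFlow Serving expects).
model.save("/tmp/anonymizer_model/1")

# Inference phase: load the saved model and query its default signature.
loaded = tf.saved_model.load("/tmp/anonymizer_model/1")
infer = loaded.signatures["serving_default"]
# `image_batch` would be a preprocessed batch of dashcam frames:
# detections = infer(tf.constant(image_batch))
```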

The Architecture of Serving Saved TensorFlow Models

The architecture for serving the saved TensorFlow models involves a series of steps. Incoming images from the dashcams are first processed by an image processor component, which both enriches the data with additional information and communicates with the TensorFlow processor component via a REST API. The TensorFlow processor receives the request and converts it into a request that the TensorFlow serving component can process.
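
A rough sketch of that conversion step is shown below. TensorFlow Serving exposes a REST predict endpoint of the form /v1/models/&lt;name&gt;:predict on port 8501 by default; the service host name `tf-serving`, the model name `anonymizer`, and the payload layout here are assumptions for illustration, not the team's actual contract.

```python
import requests

# Hypothetical TensorFlow-processor helper: reshape an incoming image into
# the JSON body that TensorFlow Serving's REST predict endpoint expects.
TF_SERVING_URL = "http://tf-serving:8501/v1/models/anonymizer:predict"

def detect(image_as_nested_lists):
    """Forward one decoded image (nested lists of pixel values) to TF Serving."""
    payload = {"instances": [image_as_nested_lists]}
    response = requests.post(TF_SERVING_URL, json=payload, timeout=5)
    response.raise_for_status()
    return response.json()["predictions"]
```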

The trained TensorFlow models, which have been saved to AWS S3, are retrieved by the TensorFlow serving component, allowing it to serve detections. For fast response times, the model is retrieved only during the initialization of the component rather than with each request.
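
A minimal sketch of that init-time fetch, assuming boto3 and hypothetical bucket, prefix, and local path names, might look like this. Depending on how TensorFlow Serving is built and configured, it may also be able to read a model_base_path directly from S3, which would make an explicit download unnecessary.

```python
import os
import boto3

# Download the SavedModel once at container start-up so that no individual
# request has to wait for S3. All names below are illustrative.
BUCKET = "example-trained-models"
PREFIX = "anonymizer/1/"            # one SavedModel version
LOCAL_DIR = "/models/anonymizer/1"

def fetch_model_on_startup():
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue                      # skip "directory" placeholder keys
            target = os.path.join(LOCAL_DIR, os.path.relpath(obj["Key"], PREFIX))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(BUCKET, obj["Key"], target)

fetch_model_on_startup()   # run once during container initialization
```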

Our machine learning component comprises two containers: the TensorFlow processor and TensorFlow serving. The latter uses TensorFlow APIs to serve the saved model and return its prediction, which the TensorFlow processor then converts to an easily interpretable result, returning it via the REST API response to the image processor.
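
The "easily interpretable result" step might look roughly like the sketch below. It assumes the prediction follows a common object-detection layout (boxes, class ids, scores); the field names, label map, and 0.5 threshold are illustrative assumptions rather than the team's actual format.

```python
# Hypothetical post-processing in the TensorFlow processor.
CLASS_NAMES = {1: "face", 2: "license_plate"}   # assumed label map

def to_simple_result(prediction, score_threshold=0.5):
    """Turn a raw serving prediction into a list of labelled boxes."""
    detections = []
    for box, cls, score in zip(prediction["detection_boxes"],
                               prediction["detection_classes"],
                               prediction["detection_scores"]):
        if score >= score_threshold:
            detections.append({
                "label": CLASS_NAMES.get(int(cls), "unknown"),
                "score": float(score),
                "box": [float(v) for v in box],   # [ymin, xmin, ymax, xmax]
            })
    return detections
```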

For performance monitoring, we calculate the full response time of each request, index the information to InfluxDB, and view it using a Grafana dashboard. Alerts are also set up to notify us when performance deteriorates.
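
In code, the timing-and-indexing idea could be as simple as the following sketch, which assumes the InfluxDB 1.x Python client and illustrative host, database, and measurement names; Grafana then queries the same measurement for its dashboard and alerts.

```python
import time
from influxdb import InfluxDBClient   # InfluxDB 1.x client (assumed)

influx = InfluxDBClient(host="influxdb", port=8086, database="monitoring")

def timed(handler, *args, **kwargs):
    """Run a request handler, then index its full response time to InfluxDB."""
    start = time.time()
    result = handler(*args, **kwargs)
    elapsed_ms = (time.time() - start) * 1000
    influx.write_points([{
        "measurement": "tf_processor_response_time",
        "fields": {"duration_ms": elapsed_ms},
    }])
    return result
```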

Results and Improvements

With this architecture in place, all faces and license plates in the incoming images are reliably blurred. What's more, the architecture improves collaboration between the engineering team and the research team. When a new version of the model is created, either team can easily replace the old model, since the S3 bucket in which the model is saved is accessible to both.

Utilizing TensorFlow for image anonymization makes impressive results easily attainable, as evidenced by the final blurred images from our dashcams. If you are interested in delving deeper into TensorFlow or machine learning as a whole, I recommend checking out [TensorFlow's website](https://www.tensorflow.org/), its [GitHub repository](https://github.com/tensorflow/tensorflow), and a [Google IO 2019 talk](https://www.youtube.com/watch?v=JYhYZzQIQ3E) on machine learning basics.

It's been a pleasure to share this information with you all. Feel free to reach out to me on LinkedIn if you have any questions or simply want to continue the conversation. Thank you.


Video Transcription

Welcome, everybody, to my talk about serving TensorFlow models. Let's start by diving straight into the presentation. My name is Sarit Pinhas and I am a senior software engineer at an Israeli start-up called Duda, and I started representing Women in Tech as a global ambassador last year. A little bit about today's talk: we will start by defining the problem, then I will give an overview of TensorFlow, and lastly I will continue to the main topic of this talk, the architecture for serving TensorFlow models. The use case today will be the anonymization of images. One of the recent problems I had to solve was protecting the identity of drivers in an input stream of images coming from thousands of dashcams per second. You can see the dashcam on the right side of the slide. The goal of the project was therefore to blur the faces and the license plates of those drivers. In order to do so, I needed to detect the faces and the license plates in the input images, where detecting an object means classifying the object in the image and finding its location in the image.

After considering a few technologies, we decided to use machine learning based on TensorFlow to do the detection. So what actually is TensorFlow? TensorFlow is an open-source library that was created by Google in 2015 to help develop and train machine learning models. There are a few alternatives to TensorFlow, like Caffe and PyTorch, but we decided to use TensorFlow because of some advantages I want to share with you today. The first advantage is that models can easily be built using the APIs provided in the library, such as convolution, which is widely used when building machine learning models. The second advantage is that TensorFlow can be deployed everywhere: on-prem, in the cloud, in the browser, or even on-device in case you are creating a mobile application. Another advantage is that TensorFlow is heavily used by Google, obviously, as they created it, and also by Twitter, Intel, and many more. It has comprehensive documentation, which makes it easy to understand how to use the available features in TensorFlow, and it also has a wide developer community, meaning you can consult with other developers on any issue you might have. Moreover, TensorFlow provides scalability of computation across machines, meaning the computation will be divided across all available machines in order to reduce the model's running time.

The last advantage I thought was worth mentioning is that TensorFlow can be debugged easily, unlike some alternatives that do not offer this option. So now that we are convinced we want to use TensorFlow, let's go back to our goal. We want to end up with a detection of the license plates and the faces of the drivers in the image, as shown in the current slide. Here we can see, for example, the classification of the gray car on the left side of the image. The result of the classification is a value in the range of 0 to 1 that indicates how certain the model is that the object is indeed a license plate. Here the model returns 0.95, meaning it is quite certain that the object in the image is indeed a license plate. We also see that the model detects the location of the object in the image. So how will we use TensorFlow to achieve our goal? There are two phases. The first phase is building the graph; it is also called the training phase.

In this phase, the machine learning model gets as input the images, which are called the data, together with the detections of the license plates and the faces, which are manually tagged by humans and serve as the labels. The result of the training will be the rules of detection. How does the model arrive at those rules? As shown in the diagram on the right, the model first classifies the objects randomly, then it calculates the loss and updates its variables accordingly. What is the loss? The loss is defined as the distance between the model's result and the answer provided to it as input. This process continues until the loss becomes minimal; after that, the training phase is over and we save the model, and we can continue to the second phase. The second phase is called the inference phase, in which we run the computation on the graph, meaning we serve the saved model. In this phase, the saved model gets an image, the data, as input, and the result of the model is the detection. So now we can continue to the main subject of this talk: how to build the architecture for serving the saved TensorFlow model. OK, so this is the architecture. The first step is receiving the input stream of images that comes from the dashcams.

Each image is processed by our main component, called the image processor, which runs in a Kubernetes pod. The image processor is responsible for enriching the incoming data with extra information, such as extended GPS information and, in our context, the detections of the faces and the license plates in the image. In order to do so, the image processor communicates with the TensorFlow processor component via a REST API. The TensorFlow processor gets the request and converts it to a request suitable for the TensorFlow serving component. The TensorFlow model that we trained in the training phase from the previous slide, and eventually saved after its loss became minimal, is stored on AWS S3. In the init of the TensorFlow serving component, it gets the saved model from S3 and uses it to serve the detections according to this model. The advantage of getting the model only in the init phase of the component, instead of on each request, is that the response time is much faster; we don't need to wait for the model to be downloaded from S3 each time. On the other hand, when a new version of the model is created and we want to replace the model, we will need both to change the model on S3 and to restart the Kubernetes pod.

But because a new version of the model is made very rarely, and because it is very easy to restart the Kubernetes pod, we decided to get the model only in the init phase of the container. So, as mentioned, our machine learning component is composed of two containers: the TensorFlow processor, which is open to the entire Kubernetes cluster via a REST API, and the TensorFlow serving container, which is only accessible to the TensorFlow processor. The TensorFlow serving component uses the TensorFlow APIs to actually serve the saved model and to return the prediction to the TensorFlow processor. Then the TensorFlow processor converts the prediction into a more easy-to-read result and returns it via the REST API response to the image processor. After the image processor enriches the image with the prediction and all the metadata, it indexes the image to Elasticsearch.

So now we have the detection of each image indexed in Elasticsearch, and it is available to all other flows in our product. Another very important part of this architecture is monitoring. We decided to monitor the performance of the requests that are sent to the TensorFlow processor. We time each request at its beginning and at its end in order to calculate the full response time, and then we index the information to InfluxDB. We decided to view the monitoring information using a Grafana dashboard, so Grafana queries the monitoring information from InfluxDB and presents it on a dedicated dashboard. Moreover, I also created alerts to be notified when the performance slows down. Besides performance, we also monitor the health of the system using Kubernetes itself.

I've exposed a health endpoint that is invoked periodically by Kubernetes. When the endpoint is healthy, it means the entire system is up and running, and when the endpoint is not healthy, Kubernetes restarts the pod. All the health responses, both healthy and unhealthy, are indexed to InfluxDB and then also queried by Grafana in another dedicated health dashboard. The last advantage I want to mention regarding this architecture is the improved collaboration between the engineering team and the research team. Conventionally, the research team is the one that creates the saved model using TensorFlow, and the engineering team is the one responsible for serving the model in production. In the demonstrated architecture, the only place where the teams meet is when a new version of the model is created and it is time to replace the old one. Because the bucket in which the model is saved on S3 is open to both teams, the change can be done easily by either of them. This really improves the work between the engineering and the research teams, as there is a well-defined and very easy way to use a different version of the model.
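
As a rough illustration of the health endpoint idea, here is a minimal sketch using Flask; the talk does not say which framework or route is actually used, so the names below are assumptions. Kubernetes would be configured to probe this path periodically and restart the pod when it stops returning a healthy response.

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # A real check might verify that TensorFlow Serving is reachable and that
    # the model is loaded; the same response could also be indexed to InfluxDB
    # for the Grafana health dashboard mentioned above.
    return jsonify({"status": "healthy"}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```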

Here we can enjoy and see the end result of the project: all the faces and the license plates are blurred using machine learning detection with TensorFlow, which is pretty amazing; it is remarkable that such a thing is possible quite easily. For anyone interested, I've also attached a few links for further reading. The first one is the TensorFlow website, which includes the comprehensive documentation I mentioned in the first slides. The second is the link to the TensorFlow GitHub repository. As mentioned, TensorFlow is an open-source library, meaning the code base is open for everyone to see and even to contribute to, so there you can stay up to date with the latest features and even dive into the implementation details of all the flows that interest you. The last link I thought was worth mentioning is a talk by Google at the Google I/O 2019 conference about the basics of machine learning; because it is a Google talk, it obviously demonstrates all the examples using TensorFlow. So that is all the time I have for today. Thank you for having me; it was a great pleasure sharing my knowledge with all of you, an incredible audience. Please feel free to contact me anytime on my LinkedIn account, which is attached here as well.

Thank you, everyone. Bye.