MLOps and ModelOps - Why does it matter? by Priyanka Sharma


Video Transcription

Hello everybody. My name is Priyanka Sharma, and I am an enterprise architect at Firmenich. I started my career as a developer around 17 years ago, and I have seen the evolution of the IT industry very closely. What we have all observed is that no matter which industry or domain you belong to, being digitally successful can make or break your enterprise.

And the concept of ops is so important. It is the key factor in your digital evolution and your digital success. So today I'm talking about what ops is and why it really matters. Before we dive into the presentation, I would like to give you a brief introduction to Firmenich. Firmenich is one of the major players in the perfumery business. We cater to fine fragrance and consumer fragrances. We manufacture the fragrances that go into end products for body and home care, like your creams and shampoos and so on. It's a very, very niche skill. The fun fact I want to share with you here is that there are fewer perfumers in the world than astronauts. So we are working with a very specific and unique expertise, and that expertise also goes into all our application development. We are very proud to announce that we now have a digital perfumer: a robot that you can tell what kind of fragrance you desire, and it can develop that fragrance for you. That is the kind of digitalization we are talking about. OK, so let us go into the agenda for today.

I'm going to talk about the brief history of DevOps. We'll deep dive a little into the waterfall and agile practices, come to the concept of DevOps, and derive the ops concept from it. Then we will look at what future applications demand, what ModelOps and DataOps are, and the key organizational practices for successful ops. We see that there is a plethora of ops, right? There is DevOps, there is DesignOps, there is ProductOps and so on, and these ops are no longer just technical; they are also going into the business domain. So you also see MarketingOps, BusinessOps, PeopleOps, SalesOps. And with all these ops, what you also see is a plethora of tools. With each ops comes a set of supporting tools, and we are flooded with so many of them. So we'll understand why, with the concept of DevOps, we came across so many tools, and why there is suddenly such a surge of tools with DevOps.

Before that, let's see how DevOps came into being. It all started with application development. In the beginning we were doing ad hoc application development. From the 1970s to 2000, waterfall came into practice, so we were doing sequential development, I would say. Then the agile practices came into the picture. The agile concept brought a lot of things to the table that gave it the upper hand over the waterfall approach.

And then Flickr announced that they could do 1000 deployments per day, which was really unheard of in that era. To support these 1000 deployments per day, a lot of open source tools came into practice; these tools were born, or basically envisaged, to help with those 1000 deployments per day. And then came The Phoenix Project. I'm sure you must have read the book called The Phoenix Project. If not, I really recommend it if you're starting on DevOps practices. With The Phoenix Project, all of this came together into workflow automation. Then a lot of enterprises started to adopt DevOps practices. And what next? We will talk about what comes next, but before that, let's dive a little into the waterfall model. The waterfall model is a sequential software development practice. You have a different box for each phase: a requirement phase, an analysis phase, a design phase, an implementation phase, and everything works in a silo. You don't talk to each other.

The developer works in his own box, on his own machine. Operations prepare their environment for production, prepare the use cases and so on. And then you suffer from something called the local responsibility syndrome, where the developer says, "OK, it works on my machine. I don't know why it is not working in production," and operations says, "I have done exactly what the developer said. I don't know how the code works." So we were always in this flux, and there was always this wall of confusion, because they were separate boxes not talking to each other.

While all this was happening in waterfall, along came the agile concept. What happened with agile was that these boxes started to interact with each other and this wall of confusion was broken. Now the developer had an understanding of the business requirements. The developer was connected to the operations people and the infra people, and the functional and nonfunctional requirements were worked on in tandem. Whatever code was ready was being deployed, in small iterations with working code and frequent releases.

It all became a reality, but this was still a concept. And there is a very funny analogy that I came across and really want to share with you, because it's quite apt. So let me know if you can hear it.

Well, you fill in the blanks. OK. So now we're gonna start discussing men's brains and women's brains, and how they're very different from each other. Now, I want to start with men's brains. All right. Men's brains are very unique. Men's brains are made up of little boxes, and we have a box for everything. We've got a box for the car, we've got a box for the money, we've got a box for the job. We've got a box for you. We've got a box for the kids. We've got a box for your mother, somewhere in the basement. We've got boxes everywhere. And the rule is: the boxes don't touch.

when I get to the woman.

Mm. Sorry. My Catholic upbringing got in there for a minute. I'm not a Catholic, but I went to Catholic school when I was little. I had a nun who taught on hell like she was born and raised there. I mean, I'll never forget it. But it did me good. Actually, it was a good thing. Now, women's brains are very, very different from men's brains. Women's brains are made up of a big ball of wire, and everything is connected to everything. The money's connected to the car, the car's connected to your job, and your kids are connected to your mother, and everything that

So I think we got the concept. I would like to say that the waterfall model is like this man's brain, where we have boxes that are not connected to each other. Then agile came into the picture, which is like the woman's brain: everything is connected to everything, which makes things quite quick and quite easy. Everybody is aware of what is happening in totality and not in a separate box. At this point agile was good in concept, but the tools and technologies were definitely not mature enough to support agile practices. What DevOps brought to the table is the tools and technology to support the agile way of working. There were tools designed to cater to a specific task, and each tool was supposed to communicate with the next tool so that the workflow could be achieved. So if we look at the development life cycle, we could commit the code in Git; Git can trigger the Maven build; the build can trigger the test cases; the test cases, once finished, can build the release; and the release can automatically be deployed. This deployment can be on infrastructure which could itself be automated, and once everything is in production, it could be monitored and so on. So we had a seamless flow of things without intervention, and this is what DevOps actually brought into the picture.
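The chain described above, where each step triggers the next without manual intervention, can be sketched in a few lines of Python. This is only an illustration: the stage functions, the revision id and the artifact name are hypothetical stand-ins for real tools such as Git or Maven, not calls into them.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Stage = Callable[[Context], bool]  # a stage returns True when it succeeds


def commit(ctx: Context) -> bool:
    ctx["revision"] = "abc123"  # hypothetical commit id
    return True


def build(ctx: Context) -> bool:
    ctx["artifact"] = f"app-{ctx['revision']}.jar"  # hypothetical artifact name
    return True


def run_tests(ctx: Context) -> bool:
    # Placeholder: a real stage would run the test suite against the artifact.
    return bool(ctx.get("tests_pass", True))


def release(ctx: Context) -> bool:
    ctx["release"] = f"release-of-{ctx['artifact']}"
    return True


def deploy(ctx: Context) -> bool:
    ctx["deployed"] = True
    return True


def monitor(ctx: Context) -> bool:
    ctx["monitored"] = True
    return True


PIPELINE: List[Tuple[str, Stage]] = [
    ("commit", commit),
    ("build", build),
    ("test", run_tests),
    ("release", release),
    ("deploy", deploy),
    ("monitor", monitor),
]


def run_pipeline(ctx: Context) -> List[str]:
    """Run the stages in order, stopping at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, stage in PIPELINE:
        if not stage(ctx):
            break
        completed.append(name)
    return completed


happy_path = run_pipeline({})
failing_ctx: Context = {"tests_pass": False}
failed_run = run_pipeline(failing_ctx)
```

With failing tests the run stops after the build, so nothing is released or deployed; that hand-off from one tool to the next is the workflow automation the talk refers to.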

So there are two main concepts: tools for the individual steps, and workflow automation. From this DevOps, we can derive the concept of pure ops. If we talk about operations in any context, what does it mean? Ops basically means that you identify the life cycle of whatever domain you're working on. Here we are talking about code and development, but it could be models, it could be data, it could be a business process. You break everything into small pieces and derive the life cycle for it. Then you assign a tool to each individual step, and these tools can talk to each other and do your workflow automation. That is basically the core concept of ops. And why is ops so important? Ops is so important because it takes a lot of responsibilities out of your head. All these responsibilities can run in autopilot mode, so that you can focus on your business area and not think about operations, because operations can run automatically. These responsibilities include analytics and analysis.

All these tools are available which can give you analytics and analysis; they can give you alerts and notifications of what is going wrong and so on. We talked about automation, we talked about enablement. You can put your governance on top of it. All the orchestration tools are available, so your workflow can be a representation of your process. You can do process control, you can configure your own processes, you can set up monitoring on your infrastructure and tools. So everything can run in autopilot mode. Basically, if you do ops, it means you spend a lot of time planning how your operations will work, and from there on it works in autopilot mode. That is basically the concept of ops, and it can be applied anywhere. So let's recap a little. What happened? With waterfall, the early step of industrializing code development, operations worked in a very siloed manner. With agile software development, everybody started to work together, but the tools and technology were not mature. Then the agile concept was put into practice, tools and technology came to support it, workflow automation came because tools started to talk to each other, and then the enterprise adoption of DevOps happened.

And we have seen recently, in the past decade, that applications are becoming smart. So what is the future? The future is intelligent. It's not only smart applications, it is intelligent applications. In the next 10 years, every piece of software that we interact with is going to be intelligent; it is going to have intelligence built in. It is already available in YouTube. It is available in the emails we write, because it autocorrects your sentences for you. It is available in our inboxes, and in the various appliances that we buy for our households. So every software component will be intelligent, and we as a community are working towards that goal, for sure. Here I'm taking the example of the perception system of a self-driving car. The idea is to highlight that a lot goes into building an intelligent system, and it's not the old concept of only coding. In the case of the car, the car will have sensors and cameras in order to figure out what obstacles are on the road, what the lane markings look like, and so on. The first step would be to take data from all the cars that are on the street in different conditions.

Then you build a model and train it on that data, and probably most of the time goes into this model preparation. Then you run that model on a specific test set. Your test set could be a golden test set where you have exact conditions, like the low-light condition, the rain condition, or a road in the snow. So you have different test sets for different kinds of conditions to test your model. Once your model is tested and validated on this local test set, you put the model and the code together and run the whole system in a simulator, because you will not test your car in production the first time, right? So you create a simulated environment, a driving simulator, where you test your code. Once everything is OK, you package your code; here the whole CI/CD and everything comes into the picture. You deploy the model and the surrounding code together.
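The per-condition golden test sets mentioned above can be sketched as a simple validation gate that must pass before the model moves on to the simulator. Everything here is an assumption made for illustration: the toy model, the tiny data sets and the 0.75 accuracy threshold are invented, and a real perception model would be evaluated on far richer metrics.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]  # (sensor score, expected label: 1 = obstacle)


def toy_model(score: float) -> int:
    """Hypothetical stand-in for a trained model."""
    return 1 if score > 0.5 else 0


def accuracy(model: Callable[[float], int], examples: List[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    correct = sum(1 for score, expected in examples if model(score) == expected)
    return correct / len(examples)


def validate(
    model: Callable[[float], int],
    golden_sets: Dict[str, List[Example]],
    threshold: float,
) -> Dict[str, bool]:
    """Check the model against every condition-specific golden test set."""
    return {
        condition: accuracy(model, examples) >= threshold
        for condition, examples in golden_sets.items()
    }


# Invented golden sets, one per driving condition.
GOLDEN_SETS: Dict[str, List[Example]] = {
    "low_light": [(0.9, 1), (0.2, 0), (0.6, 1), (0.4, 0)],
    "rain": [(0.7, 1), (0.3, 0), (0.55, 0), (0.8, 1)],
    "snow": [(0.45, 1), (0.48, 1), (0.9, 1), (0.1, 0)],
}

results = validate(toy_model, GOLDEN_SETS, threshold=0.75)
ready_for_simulator = all(results.values())
```

In this made-up run the model passes in low light and rain but misses obstacles in snow, so it is held back from the simulator stage.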

Then you put the code into production, you make predictions, and the car actually drives. Now you are collecting more and more data. Your code actually has to be built to capture what the model is predicting, because you need data for the future in order to improve your model. Then you use all that data together and do the whole exercise all over again, because your correct predictions and your incorrect predictions all go back into training the model further. Otherwise your model will become obsolete. So again, the thing I want to highlight here is that building an intelligent system is way more than building just a model, or building just the code, or working just on data; it is really an intricate combination of the three. For these three pieces, you need three ops. Every intelligent application stands on these pillars, and there is a lot more going into it. In order to run this in autopilot mode, you need ops for all three pillars, and you want these three pillars working as efficiently as possible. So MLOps and DataOps pick up the concept of ops from DevOps. What does this mean?

If we look at the ML life cycle, there is again data preparation and training; there is a whole loop of collecting the data, doing feature engineering, training the model, and evaluating the model. Once this model is prepared, you operationalize it: you deploy the model, you serve the model, you think about auto scaling, resource utilization, high availability, SLA maintenance and all these things. You monitor your predictions, you analyze the insights, and you go back into the loop.
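The feedback loop in this life cycle, where production predictions are captured and later fed back into training, could look roughly like the sketch below. The class and field names are hypothetical, and a real system would write to durable storage rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PredictionRecord:
    features: Dict[str, float]
    predicted: str
    actual: Optional[str] = None  # filled in later, once ground truth is known


class PredictionLog:
    """In-memory capture of what the model predicted in production."""

    def __init__(self) -> None:
        self._records: List[PredictionRecord] = []

    def record(self, features: Dict[str, float], predicted: str) -> int:
        """Store a prediction and return its id for later labelling."""
        self._records.append(PredictionRecord(features, predicted))
        return len(self._records) - 1

    def add_ground_truth(self, record_id: int, actual: str) -> None:
        self._records[record_id].actual = actual

    def labeled(self) -> List[PredictionRecord]:
        return [r for r in self._records if r.actual is not None]

    def training_examples(self) -> List[Tuple[Dict[str, float], str]]:
        """Correct and incorrect predictions alike go back into training."""
        return [(r.features, r.actual) for r in self.labeled()]

    def error_rate(self) -> float:
        labeled = self.labeled()
        if not labeled:
            return 0.0
        wrong = sum(1 for r in labeled if r.predicted != r.actual)
        return wrong / len(labeled)


log = PredictionLog()
first = log.record({"brightness": 0.2}, "pedestrian")
second = log.record({"brightness": 0.8}, "clear_road")
log.record({"brightness": 0.5}, "clear_road")  # no ground truth yet
log.add_ground_truth(first, "pedestrian")
log.add_ground_truth(second, "obstacle")  # the model got this one wrong
```

Only the two labelled records become training examples; the third stays in the log until its outcome is known.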

So again, this entire model life cycle has to be broken down and made efficient in MLOps. What do we do? Again, you break the ML life cycle into small tasks, you set up a tool for each individual task, and you set up the workflow automation. But looking at the complexity of the life cycle, you're going to find that there is no single silver bullet that will solve your problem. You're going to find tools that help you with specific pieces or specific tasks. They might do one or more, but they have to interact with the next tool in order to actually achieve workflow automation. That is why you see so many tools available: a tool does not only have to fit MLOps, it also needs to fit into your enterprise. The tool must fit into your processes as well. So there is no one set of tools where you can say, if you want to do MLOps, go ahead with this tool and you'll be efficient. It depends on your business process, what your workflows are, what your KPIs are, and so on. So now we have seen the similarities between MLOps and DataOps, right? Let's touch upon DataOps as well.

DataOps is actually a bit tricky, because DataOps is quite new and quite large, and the definitions are still being worked out; they are not complete. We are still evolving on DataOps. DataOps is basically the combination of information or data architecture to ensure data quality, data security, data integration, data governance and all those things. Data integration comes with tools, the ETL tools and so on. You will have data governance, where you will see who is responsible for the data: if I have to add a field or delete a field, which domain is responsible for which kind of data, what data goes into which application, and so on. Once you have figured out all of this, the information architecture, the data integration and the data governance, you have to put automation in place so that whatever is decided works within the workflow. So basically what a DataOps engineer does is support the tools that follow this workflow: for example, setting up the Airflow pipelines, or providing data in the data lake and making it available to the various BI tools and so on. All of this basically goes into DataOps. We have talked about similarities so far, so let's look a little at the differences. As I mentioned, MLOps and DataOps basically reuse the DevOps tools and principles where possible.

But they are also distinct. Why? Because the domains and activities are different. Model training and testing is very different from code building or testing, right? Does a test case even exist for a model? What data goes into the model is very important: if you give it crappy data, you get crappy results. So here it is very much dependent on data, whereas code is a separate entity in itself. There are also accuracy requirements for the model, there are outlier settings, and all these concepts are quite different from a typical development mindset. Similarly, model monitoring is very different from APM monitoring. In APM monitoring, you monitor CPU utilization and latencies, while model monitoring looks at what your model quality is and what your data quality is, and you trigger alerts based on that. Then we should also understand that the people we are serving are very different, right?
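To make the contrast with APM concrete, here is a minimal sketch of a model monitor that looks at data quality, data drift and model quality rather than CPU or latency. The three checks and their thresholds (10% missing values, a shift of two baseline standard deviations, 90% accuracy) are arbitrary assumptions chosen for the example, not recommended values.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def missing_rate(values: Sequence[Optional[float]]) -> float:
    """Share of live inputs that arrived without a value."""
    return sum(v is None for v in values) / len(values)


def mean_shift(baseline: Sequence[float], live: Sequence[float]) -> float:
    """Distance between live and baseline means, in baseline standard deviations."""
    spread = pstdev(baseline)
    gap = abs(mean(live) - mean(baseline))
    if spread == 0:
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def check_model(
    baseline: Sequence[float],
    live: Sequence[Optional[float]],
    labels: Sequence[int],
    predictions: Sequence[int],
    max_missing: float = 0.1,
    max_shift: float = 2.0,
    min_accuracy: float = 0.9,
) -> List[str]:
    """Return the alerts raised by the data quality, drift and accuracy checks."""
    alerts: List[str] = []
    if missing_rate(live) > max_missing:
        alerts.append("data quality: too many missing values")
    present = [v for v in live if v is not None]
    if present and mean_shift(baseline, present) > max_shift:
        alerts.append("data drift: live inputs differ from training data")
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    if accuracy < min_accuracy:
        alerts.append("model quality: accuracy below threshold")
    return alerts


training_feature = [10.0, 12.0, 11.0, 9.0, 10.0, 8.0]
degraded = check_model(
    training_feature,
    live=[15.0, 16.0, None, 14.0, 15.0],
    labels=[1, 0, 1, 1],
    predictions=[1, 0, 0, 0],
)
healthy = check_model(
    training_feature,
    live=[10.0, 11.0, 9.0, 10.0],
    labels=[1, 0, 1, 1],
    predictions=[1, 0, 1, 1],
)
```

The degraded run raises all three alerts while the healthy run raises none, even though an APM dashboard would look identical in both cases.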

In ModelOps we are serving data scientists rather than software developers, and they are less focused on technical tools and so on. That is why the tools need to be different. A developer can work with a tool that is really complicated, but data scientists need more power to do self-service rather than coding things themselves. MLOps also needs tight integration with data, but MLOps is very different from DataOps. Generally, people say that DataOps and MLOps go together. Yes, they go together, but MLOps is different and DataOps is different, and the skills and requirements for these ops are very different. DataOps skills would mostly be Airflow pipelines, Spark clusters, a lot of SQL queries, a lot of ETL tools and so on. You might also be administering caches and databases. Whereas for ModelOps, we would probably require data engineering and data scientist job profiles. After we have understood this whole concept of ops, we still get tweets like this, right? You build a model in three weeks, but deploying the model still takes 11 months, because your ModelOps needs some organization-level changes to onboard ops.

Here I list the three most important changes that I have observed an organization must make in order to embrace ops. First and foremost is structuring your team. There are two popular approaches to structuring an application development team to build an intelligent application. The first is the product squad structure, as I would call it. It means you have a product team: a data scientist, a data engineer and a software engineer together in one team, working together on that product. But it is still siloed development. These people are very skilled in the requirements, talk to each other very closely, and will deliver the project efficiently. So here the iteration cycle is quite fast. The problem is that the core expertise stays within that project, and unless you have a governance, documentation and knowledge-sharing model within your organization, these skills will be lost. The other kind of structure that works is the CoE structure. You have a core data science team and a core data engineering team, and resources from these teams are assigned to the products.

The problem is that the iteration cycle takes a little longer, because there is no dedicated resource for a particular project. But in this case your skills are centralized: a problem that a person faces in product one will not occur again in product two, so you can probably save time there. Plus, everything is centralized and governed, and you can put best practices in place in terms of architecture and everything if you have a CoE structure. Along with development, you also need to think about the infrastructure. As I said, developing an intelligent app is different from code development, so it also impacts your infrastructure team. It really makes sense these days to have a dedicated team for each service. You should consider having an app services infra team, an ML services infra team and a data services infra team, because the skill sets and the tools for these three are absolutely different. Within your organization, you will also have the data services team using the capabilities of the app services team, and the app services team using the capabilities of data services. And the machine learning code has to be deployed in the application.

So you build these as modular components, within your organization and outside it, and then you can structure, govern and organize vertically to derive the three infra teams, and have DevOps, DataOps and ModelOps put in place together. The second thing, as I have been emphasizing about tools, is that the right tool is needed for the right task. You need collaboration between teams: how will the data science teams collaborate with each other? You need the ability to discover what models exist and what data exists. Is the data critical? Is the model available for every user, or is it built for a specific user? The same goes for discoverability: any data scientist should know how a complex model works, what data it needs, and so on. Everything has to be transparent. Otherwise you will have this intelligence, but nobody knows how it works.

The next time you have to do something with it, or if the person leaves, you have no knowledge about it and you have to build everything from scratch. So collaboration, discoverability and documentation are extremely important. The same goes for security: a model has to be secured. The users of a model are very different from the people who delivered it. In terms of security, you can't even imagine how people will use the model: they will push in phishing data, they will try to break your model, and so on. This has to be thought through, and you have to put proper tools in place so that your models are secure. And what really works is self-service. As I said, your data science team should be capable of serving themselves. They should not struggle to get data, saying "I need data from application A, application B." No, you should have a data warehouse, you should have a data lake concept, you should have a data catalog where they can look up: what are my data objects, what is the master data, and how and where should I get this data from?
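As a rough illustration of the self-service lookup described above, a data catalog can be thought of as a searchable registry of data objects with their location and owner. The entry names, sources and owners below are invented for the example; real catalog products offer far more than this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    source: str  # where the data lives, e.g. a warehouse table
    owner: str  # who is responsible for the data object
    is_master_data: bool
    fields: Tuple[str, ...]


class DataCatalog:
    """Minimal registry that lets a data scientist find data without asking around."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def where(self, name: str) -> str:
        """Answer 'where should I get this data from?'."""
        return self._entries[name].source

    def find_by_field(self, field_name: str) -> List[str]:
        """Answer 'which data objects contain this field?'."""
        return sorted(
            e.name for e in self._entries.values() if field_name in e.fields
        )

    def master_data(self) -> List[str]:
        return sorted(e.name for e in self._entries.values() if e.is_master_data)


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="customer",
    source="warehouse.crm.customer",  # hypothetical location
    owner="sales-domain",
    is_master_data=True,
    fields=("customer_id", "name", "segment"),
))
catalog.register(CatalogEntry(
    name="order",
    source="lake/raw/orders",  # hypothetical location
    owner="commerce-domain",
    is_master_data=False,
    fields=("order_id", "customer_id", "amount"),
))
```

A data scientist can then ask `catalog.find_by_field("customer_id")` or `catalog.where("customer")` instead of chasing individual application teams.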

All these capabilities should be built with the right tools. And with the right tools, the most important decision you have to make is whether to build or buy: whether you want to focus your team's time and energy on building a tool for their task, or on actually doing the task.

So the recommendation is to buy the tool if it is available rather than trying to build it yourself. The third and last thing is ownership and KPIs. This is really the key piece. If you have the ownership, you decide the KPIs, of course. No matter whether you're working for a company or an academic organization, you always want to build the best model. But there is always a limit; your model can be improved and improved and improved. You need to define your KPIs and say: if I meet these KPIs, this model is done for me. Or: these KPIs are important for me and the rest are not. Because what I have also seen is that a lot of organizations waste a lot of time just tuning the model.
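The idea of declaring a model done once it meets the agreed KPIs can be expressed as a small gate. This is a sketch under stated assumptions: every KPI is treated as "higher is better", and the metric names and target values are invented for the example.

```python
from typing import Dict


def kpis_met(metrics: Dict[str, float], targets: Dict[str, float]) -> Dict[str, bool]:
    """Compare measured metrics with the agreed targets.

    Only KPIs listed in `targets` count; any other metric is ignored on purpose.
    A missing metric is treated as not meeting its target.
    """
    return {
        name: metrics.get(name, float("-inf")) >= target
        for name, target in targets.items()
    }


def model_is_done(metrics: Dict[str, float], targets: Dict[str, float]) -> bool:
    """The model is done as soon as every agreed KPI is met."""
    return all(kpis_met(metrics, targets).values())


# Hypothetical targets agreed by the model owner.
TARGETS = {"recall": 0.95, "precision": 0.80}

version_1 = {"recall": 0.91, "precision": 0.85, "auc": 0.97}
version_2 = {"recall": 0.96, "precision": 0.82, "auc": 0.93}

done_v1 = model_is_done(version_1, TARGETS)
done_v2 = model_is_done(version_2, TARGETS)
```

Version 2 is accepted even though its AUC is lower than version 1, because AUC was never one of the agreed KPIs; that is the point at which tuning stops.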

And by that time something else comes onto the market. So you can't keep tuning the model indefinitely. You have to define your KPIs and you have to meet those KPIs. In some cases, like healthcare, a model failure can be life-threatening, so you have to be really careful in defining your KPIs there. In other cases, like recommendation engines, it might not be that critical, so you can let go of some KPIs. With that, I have covered a lot of ground today, so let's quickly recap the takeaways. First and foremost, if you're working on a data science or data engineering team, you must understand that developing a futuristic application stands on three pillars: model, data and code. You should think of it holistically as an organization. When you're building your organization, you must think of these three pillars and not just follow the concept of code development. These ops are made to optimize for iteration speed and high quality.

So you can bring your product to market quickly. Each ops is a very distinct discipline, and they require different people and tools. So don't push people to do all the tasks together, because the skill sets are very different. As I mentioned, you should have separate infrastructure teams, and you should have a data scientist, a data engineer and a software developer as the key roles, and not try to intermingle these. Team structure and tools, as we discussed, can expedite your collaboration or can actually bring you down if you don't have the right set of people and tools in the right places. It could be very difficult, so choose wisely. And last but not least, the KPIs are very important. Before you think of any intelligent application, you should define where to stop. Just as we say for code that you should define the scope, you should also define the scope for your model. So that's all I have. What I would say is: just keep calm and let the operations handle it. Now I would like to open it up for any questions. I see a question here that asks what data governance is. OK. Data governance is basically this: suppose you're working in a financial domain and you have key data objects like customers, credit card information and all these things.

Your data objects must be defined: what each object consists of and who is responsible for it. For example, customers can have a customer address, right? But do you want to put this customer address in your core customer data object? All these things have to be defined. There has to be a data owner and a data steward, so that your data object does not fall out of place, it stays what it was defined for, and you have a person responsible for that data. It is a big exercise going on these days in all enterprises, where you have a data catalog in which you catalog all your main business objects and define who the business owner is and who the technical owner is for each object. And if I have to add a field, for example, to the table for that object, it has to be approved through a workflow. You can't just go along and add any field to the customer and so on. So data governance is a whole practice put around this, and there are a lot of tools that support data governance as well. Can you have... ah, of course, why not? I will post my LinkedIn profile here. Give me a minute. Or you can search for Priyanka Sharma Firmenich and you'll find it. OK, fantastic, guys. I think I have taken more time than I was allotted.

So thank you very much to everyone who was listening, and I would love to connect with you. Send me an invite on LinkedIn, and please don't hesitate if you have any more questions; I'll be very happy to answer them on LinkedIn. OK. Thank you very much, and thank you, WomenTech, for giving me this opportunity to speak. Thank you. Bye bye.