The art of formulating a business desire to an AI Research Framework

Divya Choudhary
Senior Research Scientist (AI/ML)
Automatic Summary

Transforming Business Challenges into AI Research Projects: A Guide

As businesses increasingly strive to innovate, there's a growing need to convert complex business problems into actionable AI research goals. Divya Chaudhary, a seasoned AI/ML industry professional from Samsung Research America's Visual Display Innovation Lab, shares her expertise in this presentation. In this article, we distill her insights and strategies for bridging the gap between a business desire and an AI research framework.

Understanding the Differences Between Business and AI Research Goals

Before diving into the translation of business challenges into research opportunities, it's crucial to note the distinction between business and AI research goals. Businesses focus on product and service deployment with an emphasis on speed and market impact. Conversely, AI research tends to be more exploratory, experimenting to tackle unknowns and emerging technologies with less stringent time constraints.

Essential Metrics to Consider When Bridging Business and Research

Each domain comes with its own set of metrics. Businesses may focus on conversion rates and sales, while AI research measures success through algorithmic advancements and accuracy improvements. Divya Chaudhary emphasizes the importance of aligning these metrics to ensure that research outcomes bolster business objectives.

Converting Business Goals into AI Research Objectives: The Process

  1. Problem Definition: Accurately defining the business problem is the cornerstone of any successful conversion. This involves identifying real AI research potential within a business challenge.
  2. Research Scope Assessment: Evaluating whether an AI research initiative aligns with the current state of technology and the business timeline is critical.
  3. Proof of Concept (POC): Once a viable research direction is established, a POC using existing research and models helps determine feasibility.
  4. Data Set Preparation: An appropriate and expansive dataset is necessary for any AI model development, which often means investing time in data annotation.
  5. Deployment Strategy: Considerations around the model deployment, including infrastructure, inference time, and memory requirements, play a significant role in translating research into a business application.
  6. Evaluation Plan: Both quantitative and qualitative assessments are essential in judging the effectiveness and efficiency of the AI solution.

Real-World Example: Image Search in E-Commerce

Chaudhary offers an illustration of transforming a business's desire for an image-based search capability into an AI research project. She breaks down the bidirectional relationship between defining a tailored e-commerce solution and the research needed to create a sophisticated image recognition system that outperforms current market offerings.

Practical Steps Toward a Seamless Business to AI Research Transition

Chaudhary outlines a structured approach to turn business problems into research initiatives:

  • Clearly define the business challenge and establish its AI research relevance.
  • Commence proof-of-concept activities using state-of-the-art research.
  • Emphasize dataset curation, ensuring it resonates with your specific AI objectives.
  • Contemplate deployment frameworks early to guide research direction.
  • Develop comprehensive evaluation plans that encompass all aspects of performance.

Closing Remarks and Questions

In conclusion, the process of converting business objectives into AI research demands a well-defined strategy, an understanding of the limitations and potential of current AI capabilities, and a commitment to thorough evaluation. As businesses endeavor to navigate this landscape, remembering Chaudhary's advice may indeed prove invaluable.

Are you ready to harness AI for your business advantage? Dive into the world of AI research with Divya Chaudhary's insights as your guide. And if there are lingering questions or if you wish to delve deeper into this transformative journey, the conversation is just beginning. Feel free to reach out and explore further.


Video Transcription

So today I'll be presenting the art of formulating a business desire into an AI research framework. Let me give an introduction. I am Divya Chaudhary, and I'm working as a senior research scientist with the Visual Display Innovation Lab of Samsung Research America.

I have been working in the research industry of AI/ML for around five years, and before that I was working in applied research for three years. Having worked in the research industry for so long has equipped me with an understanding of how we translate a business desire, or a business goal, into an actual AI research goal. The two are quite distinct, and today's talk will focus on how we move from one to the other, that is, from business goal to research framework. So if you're keen on doing AI research, you are in industry, and your company allows you to convert a business problem into a research problem, it would be very beneficial for you to understand how to go about it. That is the introduction to what we're going to do in the next 30 minutes. Or maybe I'll do 25 minutes and leave almost 15 minutes for you to ask any questions that you might have.

We can go deeper into those questions and discussions as well. The major pointers I'll be talking about today are these. First, what are the different goals for businesses and research? They're not the same, right? There is a stark difference between business and research, and how we navigate that is what the whole presentation is about. Then I'll talk about some of the metrics that we use and need to be aware of when we talk about the business problem as well as the research problem. We need to know what we are optimizing for and what the end goal is, so that we can cater our work accordingly. The third thing, which is included within that, is the timeline for development and resources; they are distinct, and we should be mindful of that. And then we get into the meat of the discussion, which is how we plan the conversion of a business goal into a research goal. So that is the entire talk. As we know, the business world is completely different from the research world, so we want to focus on how we move from the business world to the research world.

The business world is more focused on getting products or services into deployment as fast as we can, and research is a little less fast-paced compared to business. And rightly so, because there are a lot more unknowns in the research domain than in the business domain. I'm not saying that business does not solve unsolved problems. There are unsolved problems in businesses that are offered as services, but the research part of that unsolved problem is being solved somewhere else, right? So you have something to start your business goals with. In the research world, however, you might not have anything done before you thought of that idea or before it came to you, and that's a real challenge. There are many engineering challenges being solved in the business world. For engineering problems, a lot of research has already been done in those domains, and we have progressed a lot over time, so we have answers to those problems.

A few of them we don't, but the majority are solved. How do we scale from, say, a cloud infrastructure to an on-device infrastructure? How do we go from a big monolith to microservices? All of that is an engineering problem, and we have very efficient solutions for it. However, when you talk about research, take an example from AI or ML research: before the beginning of deep learning, we didn't know how to build something that can automatically detect and classify objects at the same time, right?

We could do it with some hand-picked features for smaller data sets, but it would not scale, because at that point in time we did not have the resources to train large models capable of doing it. So that was a research challenge at that point in time. I'm just trying to give you an idea of what these challenges are and how these two worlds are very different from each other. In research, there is essentially a lot of room for failing at multiple experiments, because that's the essence of research. You will actually be doing a lot of experiments; some of them will work, some will not, and some will not work in their entirety. However, a fraction of them do work, and you get an idea of how one portion of your big task can be solved in that manner. So it is basically multiple rounds of hit and trial, not in the air, not done unknowingly, but a known hit and trial where you're trying to understand the steps involved in achieving what you want to achieve through your research, and how you can solve each of the bits and pieces together.

Right. So that's essentially what it is. But in the business world, you might not have that much room to do multiple experiments, fail, learn from your failure, incorporate that in the next experiment, and so on. The business world works on tight deadlines, right? So one of the challenges we talk about is how to work around these constraints of the business world to move to a research problem. When we are in industry and working on an AI/ML problem, there are different metrics that we are trying to track.

Even if the business is actually solving an AI problem, the metric the business is concerned about is very different from the metric that research on the same problem would be concerned about. For example, if a business is building a search engine, the metric they are concerned about is how many of these searches they are able to direct to some kind of sale. If a bunch of results come back for your query, how many of them actually convert into a purchase, or into a landing page that could potentially lead to buying? But as a research problem in itself, that is, what should be the apt result for a given query, there are different metrics that we track. What is the mean reciprocal rank? How are these results ranked, and how can we get one representative number for the ranking of these results? How well do they align with the question? What is our accuracy over, say, thousands of sample queries that we get? Those are very important from the research perspective, because the research perspective is looking at evolving the algorithm to make it accurate. A business, however, considers that the algorithm is already evolved, and they are more concerned with tracking how much benefit they can get from it.
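To make the contrast between the two kinds of metric concrete, here is a minimal sketch of one research-side metric (mean reciprocal rank) next to one business-side metric (conversion rate). The function names and the sample numbers are illustrative assumptions, not something shown in the talk.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Research-side metric: average of 1/rank of the first relevant result.

    Each entry is the 1-based rank of the first relevant result for one query,
    or None when no relevant result was returned (that query contributes 0).
    """
    if not first_relevant_ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


def conversion_rate(num_searches: int, num_purchases: int) -> float:
    """Business-side metric: fraction of searches that ended in a purchase."""
    if num_searches <= 0:
        raise ValueError("num_searches must be positive")
    return num_purchases / num_searches


# Made-up numbers, purely for illustration.
ranks = [1, 2, None, 4]
print(mean_reciprocal_rank(ranks))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
print(conversion_rate(num_searches=200, num_purchases=9))  # 9 / 200 = 0.045
```

The first function only looks at how well results are ranked; the second only looks at what users did afterwards, which is why the two can move independently.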

How much conversion can it get for the company? Those are different metrics. The key performance indicators for business will be drastically, or I would say not always drastically, but completely different, in the sense that they can be at different levels compared to those of research. Research will be more at the ground level, where we are trying to improve our algorithm, and business will be more about improving our strategies for presenting the results from the algorithm so that it leads to more conversion. Research is basically concerned with novelty. So we also track our research: if you have a pure research organization, you'll be tracking their novelties. How many novelties have they explored across, say, the three projects that they have done over the year? How many patents do they have? These are the metrics you will evaluate to say whether the research group is good or bad. But for business, you basically look at numbers, right?

How much profit have they made, how much footfall do they have, how much traffic is there on the website, things like that. So they work in different realms, and it is very important to know that, because when you're converting a business problem into a research problem and pitching it to the business people, you need to be aware of what you're pitching. Say you go to your business head and you can see the potential of the business problem you're solving being converted into a research problem. If you don't pose it in terms of the KPI that the business executive is looking at, more likely than not it will not be accepted. So you need to be aware of the metric you're trying to move, and then find a metric in the research domain, not a similar one, but something that can be directly correlated with that business metric. That is what you're aiming to improve. So here is the crux of the topic: how do we convert a business project into an AI research project?
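One possible way to check whether a research metric "can be directly correlated" with a business KPI is to compute a correlation coefficient over past releases. The sketch below uses Pearson correlation, which is an assumption of this example and not a method named in the talk; the release history is invented for illustration.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical history: one entry per past model release.
offline_mrr = [0.41, 0.44, 0.47, 0.52, 0.55]            # research metric
conversion_rates = [0.031, 0.033, 0.034, 0.038, 0.039]  # business KPI

r = pearson_correlation(offline_mrr, conversion_rates)
print(f"correlation between offline MRR and conversion rate: {r:.2f}")
```

A value close to 1 would suggest that improving the research metric has tended to move the KPI in the same direction. With only a handful of releases this is weak evidence, and correlation alone does not show that one causes the other.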

Let me give you an example of what I mean by converting a business project into an AI research project, and how it can potentially occur in our day-to-day scenario. Say you are working in a company called ABC and you have a requirement to work on, say, image understanding, so computer vision. You have a product that is not related to providing image-understanding services. Say you are an e-commerce company and now you want to incorporate search with images. You've seen competitors doing search with images, and your e-commerce company ABC would like to incorporate something like that: given an image, return the most likely or most similar products that you have on your platform. How do you build that?

Now, this is a research problem. Yes, it is solved in the research community, but you might have different requirements. You want it to be very fast: if the competitors are doing it in one second, you want it in nanoseconds or something like that. You want it to be very accurate: if the state of the art of the competition is at 64%, you want yours to be at 94% or something. And that is a research problem, because it involves updating the AI algorithm. It is not merely using a pre-trained model, putting it on top of your search engine, and taking the output it gives you, because that is still 64% accuracy. You intend to get 94% accuracy, up from the state of the art, and that's a research problem. Now you know, as the executive or as a person working in the team, that this is a research problem. How we convert this business problem into a research problem is what I'll be talking about in this entire presentation. So let's go. There are a few major steps involved in doing any research project.

Those are the steps I want to talk about in the short span of time that we have. I just want to give you a few takeaways in the directions that I can think of. Obviously, this is not complete. However, these are the major steps that you should potentially take to go from business to an AI research project. The first step is defining the problem. What do I mean by defining the problem, and how is that crucial to converting business to research? It is crucial in every project, right? However, I think that when you define the problem, it gives you more of an idea of the research scope that is available, if it is available. The example I gave you had research scope in it. However, if there is something already built like that, and your accuracy requirement is less than what it delivers, you would not bother going further; there is no research problem per se in that business problem. While defining the problem, you should be able to infer that. So that is basically identifying the right business problem that can be converted into a research problem. How we define the problem is, from the business perspective, trying to understand what we are trying to achieve.

And then converting it into an equivalent ML research problem. A little bit of understanding is required here to convert a business problem to any research problem. You need an idea of the ML world, deep learning, AI knowledge, the algorithms in machine learning: what is the state of the art, and how far have we reached with each of these different algorithms? That will help you define and narrow down the problem from a business problem to a research problem. Generally, this is done by people who have had experience in the AI/ML industry, and they can very clearly see: oh, this is what you want to do; this is basically a machine learning classification problem, or this is a classification problem clubbed with an object detection problem. Sometimes you want to identify all the different objects in the image and find out where in the image they occur, so you want to localize them. If somebody is experienced, they will tell you: oh, this is just a localization problem, these are the algorithms that are there, these are the data sets that you can probably explore, and things like that.

So this basically helps you identify the fundamental machine learning problem that you're trying to solve when you're trying to solve a business problem. That's what defining the problem means. After you've defined the problem, you have to understand the scope of the research here. Is what you're trying to achieve even possible given the state of research that we are in right now? We are not at a stage where we can artificially generate videos of, let's say, an hour in length from some prompt. So suppose your business need is: I'll give you a sentence, give me a half-hour video out of that; build a model that takes a text prompt and gives me a video that is 30 minutes long. That's something that's not done yet. It is doable, but is it within the realm of your business timeline? No, because there's a lot of research that hasn't moved in that direction yet. So there is no baseline for you to explore and do that in the limited timeline that you might have, even if you stretch it for your business needs. So you need to decide whether that is a research problem that is worth achieving or not.

Everything in defining the problem is basically a subject matter expert's domain. Somebody who has been in the machine learning and AI domain and the research world for quite some time can have a fairly good idea of what is possible and what is not. That's what we want to become fully aware of, by consulting a subject matter expert or using somebody in the team who has an idea of what is possible and what can be achieved. Can research in this direction give you an edge? That is also very important. Let's say the problem you're trying to solve is object detection in itself, and the state of the art in object detection is 98% accurate. Can further research help you improve, or give you an edge in production? Can moving from 98% to 99% possibly give you the gain that you're seeking in terms of your business? Maybe that is not the right business problem to convert into a research problem, and something else would suit better.

There is also the aspect of infrastructure. How much bandwidth and infrastructure do you have to support the research? If you're dealing with video algorithms, there is a requirement for GPUs and tons and tons of memory, terabytes basically, in order to get the video data onto your server and start training on it. Those models might be huge, and they might not fit your production needs.

All of this is basically an understanding of the domain, and that comes into defining the problem. Let me go quicker. The next step in the planning is the POC, the proof of concept. Once it is decided that the problem is worth researching, you can still be a little on the fence about it; however, you see a majority of interest and possibility in exploring it as research. The next step is the proof of concept, and that should start as early as possible in the project timeline. The moment you decide that this is a potential research problem, you should start with the proof of concept, which basically means exploring the research community and the research papers around the actual task that you're trying to solve.

And that's why defining the task is very important. When you know that your problem is actually concerned with object detection and classification, you look at papers concerned with object detection and classification, and you don't go after everything else that's there in the domain. There are many papers, and you might feel that all of them somehow correlate to your problem, and they might, but there's a high possibility that not all of them can translate to your problem.

So you should be very focused on the state-of-the-art research papers and the exact basis of the problem that you're trying to solve. There are very famous models that have come out in each of these problem domains, be it detection, classification, or generative AI; it can be Stable Diffusion or anything else. These are very famous papers, and most of them had, and still have, pre-trained models available for you to try. Now, what is a proof of concept in the research domain? It basically means identifying a few pieces of research in the domain, reading the paper just to get an idea of what they have done, getting the pre-trained model, and trying to understand how you can run it in order to reproduce the result. That's the proof of concept. You get the pre-trained model, you have a data set that you want to evaluate it on, and you see whether it works or not.

If it doesn't work exactly the way you expect, you know that there could be something missing, and that basically opens the path for you to explore further, because there will be another paper that has overcome some problem in the paper you're exploring.
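The proof-of-concept loop described above (run an existing pre-trained model on your own evaluation set and see how far it falls short) can be sketched as follows. This is a minimal illustration: `toy_pretrained_model` is a hypothetical stand-in for a real model's inference call, and the sample identifiers and the 0.94 target are assumptions chosen to echo the example in the talk.

```python
from typing import Callable, Iterable, Tuple

Sample = Tuple[str, str]  # (input identifier, expected label)


def evaluate_baseline(predict: Callable[[str], str], samples: Iterable[Sample]) -> float:
    """Return the accuracy of `predict` on a labeled evaluation set."""
    correct = 0
    total = 0
    for item, expected in samples:
        total += 1
        if predict(item) == expected:
            correct += 1
    if total == 0:
        raise ValueError("evaluation set is empty")
    return correct / total


def poc_verdict(accuracy: float, target: float) -> str:
    """Summarize whether an off-the-shelf baseline already meets the target."""
    if accuracy >= target:
        return "baseline meets the target; little research scope"
    return f"gap of {target - accuracy:.2f} to close; candidate research problem"


# Hypothetical stand-in for a real pre-trained model's inference call.
def toy_pretrained_model(image_id: str) -> str:
    return "chair" if "chair" in image_id else "table"


# Tiny invented evaluation set: the stand-in model has never seen "sofa".
eval_set = [
    ("chair_01", "chair"),
    ("sofa_01", "sofa"),
    ("table_01", "table"),
    ("chair_02", "chair"),
]

accuracy = evaluate_baseline(toy_pretrained_model, eval_set)
print(accuracy, "->", poc_verdict(accuracy, target=0.94))
```

Swapping in a different `predict` callable for each candidate paper gives the chain of proofs of concept with a single, comparable number per candidate.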

That's how you start the chain of proofs of concept. Then there is an important part about data sets. Once you have the proof of concept going, you've defined the problem, you're continuing the proof of concept, and you're learning a lot about how to solve different pieces of the problem, you want to put a lot of emphasis on the data set. Once you are planning to do research on the problem, you need to define what your data set will be like. You plan to do research and build some AI model, but what data are you going to train it on? Do you have that data already? If I want to train something on images, I have a lot of images, right? But if you're using a supervised algorithm, you need labels for those images. Let's take a very simple example: if you want to identify furniture in images, you need images that have furniture in them, and the data set should record where the furniture is in each image. So you need that level of data set.

You need a labeled data set in order to run your supervised algorithm. Or maybe you want to go unsupervised. How can you approach those problems? First of all, you need to understand what kind of data set you have. Depending on that, you can research an algorithm that suits that kind of data set. If you don't have a data set, a lot of work is required to create one, and that is what the transition actually incorporates: if you have a business problem you want to convert into a research problem, you need to invest a lot of time in creating the right data set. I cannot stress it enough. If you don't have the right data set, you're not solving the right problem; you are solving some peripheral problem, and then your metric is not aligned. Again, this calls for a subject matter expert's opinion: you need to understand what kind of data set you need to focus on for, say, object classification, object detection, localization, segmentation, or other machine learning tasks, and what the scale of these data sets should be. That's also important, because you cannot possibly have a very good model with just 1,000 images, right?

So what should be the scale for a reasonable data set, and how can we get that done? One idea is that if you don't have the data set, you work on a data annotation project to get it, if you plan to go the route of supervised algorithms. For the business world, going the supervised route actually has a faster return, because with supervised algorithms you're basically telling the model what to learn from the data set.

So it is more accurate, there is less complexity in the algorithm, and you can achieve higher confidence in a shorter duration. That's why you can go through a data annotation project, where you're basically getting the data annotated for yourself. You create the taxonomy, you decide what kinds of classes you want and what the guidelines for annotating are, and while you're annotating you also try to evaluate how good the annotation itself is and how good the data set itself is. It takes around three to four months to have a good data set ready. So that is the planning part for data. Then there is deployment. What kind of environment do you want to deploy these models in? Is it supposed to be cloud? Is it supposed to be on device? How much inference time are you expecting? What is the memory requirement? Because based on the memory requirement, your modeling approach will change, and the algorithm you're trying to optimize will change. And what is the data distribution that you're expecting? If I train my model on cat images, all kinds of cats, but I deploy my model on horse images, it might not work, right?
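The cat-versus-horse mismatch above can be checked with a rough comparison of label distributions between the training data and a sample of deployment traffic. The sketch below uses total variation distance, which is one simple choice assumed for this example and not a method named in the talk; the label lists are invented, and in practice the deployment labels would come from a labeled sample or from model predictions.

```python
from collections import Counter
from typing import Dict, Sequence


def label_distribution(labels: Sequence[str]) -> Dict[str, float]:
    """Fraction of samples carrying each label."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """0.0 means identical distributions, 1.0 means no overlap at all."""
    all_labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in all_labels)


# Invented data: training is mostly cats, deployment is mostly horses.
train_labels = ["cat"] * 90 + ["dog"] * 10
deploy_labels = ["horse"] * 80 + ["cat"] * 20

distance = total_variation_distance(
    label_distribution(train_labels),
    label_distribution(deploy_labels),
)
print(f"train vs. deployment label shift: {distance:.2f}")
```

A large distance is a warning sign that the model is being asked about data it rarely or never saw during training. What counts as "large" depends on the application and has to be chosen by the team.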

If the data distribution in deployment is different from the one used in developing the model, that is a problem. Another very important part is to consider maintenance. Once you have deployed the model, how are you going to maintain it? Do you want to do regular, say biweekly, training, because there will be newer data and you want your model to be up to date? What approach do you want to adopt to maintain it? I'll quickly try to wrap this up in one minute, and then I can take questions if you have any. The last step is the evaluation plan. Once you have figured out how you want to deploy the model (you still don't have the model deployed, because this is the research conversion plan), you also need to plan how you want to evaluate the model. There are both quantitative and qualitative evaluations that you should be concerned about.

Apart from metrics to track and evaluate your models, you should also focus on what the inference time requirement should be and how you can evaluate it. If my model is deployed on a device that is expected to give results every second, but it takes 10 seconds to give a result, is that a good model? No, and we should be working towards improving that part of it. So different aspects of the model need to be evaluated, not just the accuracy numbers, precision, and recall. There are different metrics for the different kinds of algorithms that you are involved in.
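The inference-time check described above can be sketched as a small timing harness. This is a minimal illustration under stated assumptions: `fake_model` is a hypothetical stand-in for a real inference call, the one-second budget echoes the example in the talk, and comparing the slowest run against the budget is a design choice of this sketch.

```python
import statistics
import time
from typing import Callable, List, Sequence


def measure_latencies(
    run_inference: Callable[[], object],
    num_runs: int = 10,
    warmup: int = 2,
) -> List[float]:
    """Time repeated inference calls, in seconds, after a few warm-up calls."""
    for _ in range(warmup):
        run_inference()
    latencies = []
    for _ in range(num_runs):
        start = time.perf_counter()
        run_inference()
        latencies.append(time.perf_counter() - start)
    return latencies


def meets_budget(latencies: Sequence[float], budget_seconds: float) -> bool:
    """Compare the slowest measured run against the latency budget."""
    return max(latencies) <= budget_seconds


# Hypothetical stand-in for a real model call.
def fake_model() -> None:
    time.sleep(0.002)


latencies = measure_latencies(fake_model)
print(f"median latency: {statistics.median(latencies) * 1000:.1f} ms")
print("within 1 s budget:", meets_budget(latencies, budget_seconds=1.0))
```

Measurements should be taken on the target hardware (the device or cloud instance the model will actually run on), since timings from a development machine may not transfer.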

And also for the different kinds of use cases that you want to put your algorithm into. You also want to evaluate any kind of class imbalance that might be there in your data, and how you are countering or combating it in your deployment. Do you have some kind of class imbalance method in place for your algorithm? If not, the model won't be able to generalize very well across the classes. So those are things that you have to consider. How do you select different qualitative examples to evaluate your model on? That requires knowing what you're expecting as an output. You can also think about whether human evaluation is needed.
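On the class-imbalance point above, a minimal sketch of two common checks follows: per-class recall, which exposes classes the model ignores even when overall accuracy looks fine, and inverse-frequency class weights, one widely used heuristic for countering imbalance during training. Both are assumptions of this example and not methods named in the talk; the labels and predictions are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {label: total / (num_classes * count) for label, count in counts.items()}


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of each true class that the model predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    seen = Counter(y_true)
    correct = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        if truth == prediction:
            correct[truth] += 1
    return {label: correct[label] / count for label, count in seen.items()}


# Invented imbalanced data: 8 chairs, 2 lamps, and a model that always says "chair".
y_true = ["chair"] * 8 + ["lamp"] * 2
y_pred = ["chair"] * 10

overall_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall_accuracy)                   # 0.8, which looks acceptable
print(per_class_recall(y_true, y_pred))   # {'chair': 1.0, 'lamp': 0.0}
print(inverse_frequency_weights(y_true))  # {'chair': 0.625, 'lamp': 2.5}
```

The 80% overall accuracy hides the fact that the minority class is never recognized, which is the generalization failure the talk warns about. The weights could be passed to a weighted loss so that mistakes on the rare class cost more.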

Do you want a set of, say, fifty, a thousand, or two thousand samples to be evaluated by humans, to see whether the model's output is on par with a human's, if that's the goal of your algorithm? Those are a few of the important aspects under the evaluation plan. I don't have much time left, but these are just a few of the important things I wanted to talk about when you're converting your business problem into a research problem. These are the absolutely must-do steps: defining the problem, doing the POCs, deciding what the data set is, deployment alignment, and the evaluation plan. There is a lot more apart from this, but these are the fundamentals that you need to explore at any cost. I will now stop and take questions in the next six or seven minutes that we have.