Spreadsheet Modeling by Rajneet Kaur

Video Transcription

Welcome everybody to sign series broadcasting an area that impacts all of us on a tool that all of us have at some point that year right now, like everything else in life, I would like us to start with the one. How is this section going to benefit? Right.So let's talk about time for whether you know it is any vertical like DS SI or um pre tail or pharmaceutical for that matter or on the other hand, functional areas like marketing, finance operations and that is really the unknown fact, right? Let's take a couple of examples.

So when it comes to sales, how does my next five year uh outlook look like he's impacted by a football? Let's talk about an eter, what would be the requirement in terms of workforce? How do I come for a treatment it's also impacted by? So and finally, let's talk about operation. How much do I need to increase the plant capacity? Again, there is hardly any area in business that is uninfected by both facts, all our careers and life also in the interview, right? So now let's talk about how it has been conducted recently, right? Um the tools and technology that there are quite a few things um uh fighting. Uh We have FB isa profit, et cetera and there are plethora techniques to do, right? Um Some of the ML techniques include um art gar a my family old, even deep blood, right? But what I wish to do that, we get our hands dirty and we work together on this um on Microsoft Excel, understand the basic concepts of forecasting because they're really simple and we will be able to take better decisions as well as negotiate better decisions for us at the end of the session, what you will be able to do is forecast something like this.

So in the graph that we see the first four years are historical data and the next two years are forecast. So you will also be able to do, let me con continue and um let's get started. Let's apply this session is for you, regardless of whether you are a student, you're a data and analytic professional. You may be a business professional trying to take better or even a technician. We are going to have lots of fun along the beach and while we are at this, let me also tell you a little bit about, I'm a senior manager in the space of a IML and at any point, I'm handling the same size of 30 the business of $2 million. Also, what I would like to add is that some of my areas of co include time series forecasting, um machine learning in general uh NLC inside generation data storytelling and problem solving. Recently, I also won the CA I Changemaker Award as well as the top end uh data leader award by women in ITHSC. One thing that I keep telling um the 100 plus people who have ever reported the news data science is as much an art as a science. And one of those arts is the art of the right? And I love telling stories. So this is how we are going to approach the best of the study. Uh We are going to solve a case study in form of a story.

So let me start with one thing. There are two central characters in my film. The first one is Eric. He's a sales and marketing head for North America for a leading US program and his go to person for all analytics with him. Who is you right now at this point before we continue the story in the case study, I would also like to call out that once in a while, I will just pause and check if there are any questions that you may have, right? Um I will check the chart window because at this point I'm not. Um And just in the event that I get dropped off, I've joined. But if I join, um if I get dropped off, I'll join the community. So let's continue with the story and one of the days and comes up with a very peculiar, let's see what he said would be. So the observation we can from this, but for Canada quarter, one and quarter two had performed much more than last year. But for the current year quarter three, it, it was higher, right? So for example, if last year quarter, one quarter, two quarter three paid for, let's say, 10 minutes, 10 minutes and 15 minutes this year quarter, two quarters are, let's say eight, but the last quarter is much higher than last year. Quarter three, let's say 20 instead of.

And why would that happen with this ability? After all, it is a consultant's job to find in any chaos a particular person and initially nothing may seem different. For example, pricing and marketing are very similar to the previous. And what was it? I would like you to hold that, that thought and let's discuss some concepts before we try to answer this question. I wanna start with, what movie is it going to a time period? Is um a data aggregated at a certain level. It could be a weekly granularity monthly, yearly, even hourly or daily, but it cannot be a bit different in that therapy. For example, I cannot store one hour of data and then one day of data and then one week of data and one month. All right. So for example, if I am storing um two years of sales data at a monthly level. I need to have 24 data points each at a monthly level for each of them. So that thank you. There are a few major components that a time series can be broken, right? Um Those components are the first component is strength. The second is seasonality. There is a third optional component called cyclicity, which is rarer and hence, you're going to keep it for today. And the last component is called irregular or random component. No, let's discuss with a bit. So let's talk about stress. Friend is the direction you give it to your lady.

It could be a line of that step and hence, it could be something that's le like you can see here. It could also be nonlinear like more of a circular um light, it could be increasing or even decreasing for that. The second component is seasonality, seasonality is essentially a pattern that so if you see I have four years of data here, OK? And each of these four years, there is a form and pattern you can see somewhere at the middle, it's kind of fee and then there is a sharp as well, right? So this entire pattern is what we call seasonality. And the seasonality period is the period after the for example, generally, I have seen the seasonality is one year. So the pattern are for ₹10 after the 52 weeks or 12 months. So on and so forth, right? Um There are also two types of seasonality, then it is weather driven, seasonality. It is called weather, seasonality. And the other type of seasonality, calendar, seasonality, which is given more by the calendar, right? Um We'll also discuss um a few examples shortly, but the irregular component or the random component is one of the most interesting components because it has no logic and hence, it can help answer quite a few business point.

Let's try to understand how each of these components can help us. He died, some didn't see it. So what we are going to do the first uh part of the session, we are going to conduct time series de that is we're going to decompose this time into its component, right? And that is a very good diagnostic tool because it explains uh various types of possible changes that happened in this. For example, why would friends? OK, friends can change for a variety of reasons. For example, um there may be some new competitors who might have entered the market in that stage, your friend is going to be new. On the other hand, there might be um your own product launches. In this case, it's going to, your friend is going to now let's what impact seasonality with, right? He's not really making a because of weather. For example, if you are selling a pharmaceutical drug that treats allergies and allergy treats during winter in such a scenario. What will happen is if this year, winter comes in earlier, even your drug cases start to feed earlier and that will result in a season allergies,

right? So just one concept that we need to co cover uh before we actually start uh decomposing the various components is um that there are two types of decomposition. The first is called multiplicative and the second is called additive, right. What we are saying is our time series data set in a multiplicative data set is a product of trend, seasonality and irregularity. Whereas in additive one, it is a summation of the three components, right? Um how do we decide whether to go for multiplicative or additive methods?

So there is a rule of thumb that I will share with you, right? So if, if, while your trend is increasing the amplitude in your data, the difference between the peak and the trough is also increasing and vice versa. So if your trend is decreasing and you see the amplitude of the uh data set also decreasing, we go for multiplicative one, right? Otherwise by default, it is additive. Now, in our scenario, I see that the data set here has a slightly decreasing trend and also more or less uh you know the peaks, the amplitudes is decreasing, right? So I am choosing multiplicative but end of the day, you can even choose additive because your goal is to minimize an error component, right, which we will also come to shortly. So um we are going to start off with the decomposition. And at this point, um three years of data is available for you to see that starts from Jan 19. And since uh this is an oe ma top OEM, they um work into infrastructure as well as sell laptops and desktops. This is three years of data for laptops, right? You can also see the data set here. And what we will do is first we will extract the trend, then we will extract the seasonality.

And like we said, since irregularity has no logic, whatever remains, we're going to call that irregularity, right? So let's start off. Um I have actually given the formula in the first row just to make this session slightly easier. But I would actually like to uh you all to think about it, right? What formula would you use? So what Excel function can be used to extract the trend of a data set? Let me just check the chat window if there is anybody who wants to give it a shot. OK? Never mind. Um So there is an Excel function by the same name called as trend. You can also use functions like forecast and forecast dot ETS, right? Um So there are three components or arguments within the trend function. The first is the known Y, the second is the known XIS and the third is the new X. OK. This is of the nature of linear regression, you know the way we did it in school that Y is equal to MX plus C. And here the Y is the output data set and the X, since this is time series, your X is going to be actually something that indicates time. So why we are going to take as the laptop sales and your X is going to be months here. OK.

Excel is this beautiful feature that internally even dates are stored as numbers. And therefore you can make that connection of Y equal to MX plus C, right? Um Also observe that I have frozen dollar four and dollar 39 here. Simply because when we drag this formula, we do not want these ranges, these array ranges to change. So it is going to be dollar four up to dollar 39 till the variant. So we saw what the first two components is, the last component is going to be the current month. So since we are in row four, we are going to use that. OK? I'll pause for a few seconds for you all to write this formula. OK? And once you have done that, we are going to simply extend this till the end. Perfect. OK. And when you scroll to the top, see the magic that you have done, your trend component will appear over here. OK? Now let's go to the next component. OK? Since we are saying that our time series data set is the product of trend, seasonality and irregularity. Then we divide time series data set by trend, we'll get the other two components. So that is all, we need to do a very simple division of the components. OK? And just drag the formula again. So now we have the other two uh components present. Now, let's talk about the seasonality. OK. What really is seasonality? It's a pattern that repeats, right?

So every J will have a certain behavior, every F will have a certain behavior so on and so forth. And if in the first year, Jan to Feb is an increasing number, a similar pattern will be seen in the other years as well. So we want to average all jams together, average all Fes together. But we want to do it conditionally, why do we want to do it conditionally because I want to take only Jans and only Feb and only March and break them into single, single, single buckets, right? So the condition in Excel is done through a if function and we are going to use average as well as if combined. And that's the average a function that we are talking about. So when it comes to average, if function, what we are doing here is there are three components or arguments again, the range, the criteria and the average range. OK. So within the range, we are trying to find out um the range here is an uh series that repeats between one and 12. Let me show you what we have done over here. We have extracted the month number from the fiscal month, right? So you will see that it runs from 1 to 12 indicating Jan Feb March. So on and then it repeats till the very end. So we are feeding to average a function that if in this range, you find any jam.

So it's going to find three ones or three jams. Since this is three years of data, it will pick up all of these three years. OK? Because the crisis area here is one and then it is going to take corresponding average range numbers from this particular array and give us an answer. So in this case, what it does is it takes this 87% this 90% and this 89% the average comes as 89% and that is what you get as the uh seasonal period. So if you just extend this formula to until the very end, you will see that there is a pattern that repeats. Wow. So we have the seasonal component as well over here. OK? Again, irregularity is what remains, it is going to be s into Y divided by seasonality. And therefore all you need to do is write this formula and extend it to the very


OK. So we have all the components present here. Uh I would also like you to observe that in multiplicative um seasonality. If you just take the average of all of the irregular components. It's always going to be one or 100%. OK. So I will take a short pause. We have done the magic, we have done the first step of decomposition and very shortly, we will also do forecasting with these concepts. But I'll just quickly check if there are any questions. Yes. Uh Y is S two I in percentage. Good question, Harshita. Um see the trend component, right? Look at the average of the trend component 43 5 12, right? And look at the average of your uh absolute uh numbers, your actual data set. It is also 43 5 12. Now um since we are dividing right, we are dividing a number that is similar in range. It is coming as a percentage. If you take additive uh uh seasonality, right? If you take additive decomposition, these numbers will not be percentages. But when it comes to multiplication, yes, these numbers are going to be, that's the nature of it, it's going to be a percentage, right? But it gives you enough information. For example, one of the points we discussed is if my irregular component, percentage wise is increasing, it's time to get suspicious, right?

So let me just, you know uh do a quick trend line here and you see that this number over time is not really increasing, it is more or less static. OK. So uh nothing to worry in terms of the irregular component here. Any other questions? OK, perfect. All right. So let me tell you what really happened in this story. OK. And for that, what I have done is uh we have plotted the trend component and the seasonality into irregularity

component. OK.

If you plotted these two components together, this is what you really see. OK. The trend is decreasing from 48 K to 38 K in three years time. And that's a whopping 20% drop. But look at what's happening with the seasonality. The seasonality peaks have started with an average of about 1 20%. But then they go on to increase still more of 1 30%. And these peaks, by the way, um they are the periods of Black Friday, Cyber Monday in um US, right? And equivalent to Diwali in India, right? Where there are periods of heavy discounts, there are a lot of promotions running around and therefore you have plenty of options to choose from. So this is a beautiful case of consumer behavior change. How so what's happening here is that since laptop is a very high price product, it is something that you know, uh and very often it is not an urgent buying uh like you would do for your groceries, right? People are speculating they are waiting for the offers of Black Friday, Cyber Monday where they can get something in one third of the price or maybe something. Sometimes they can get laptops at 1/5 of the price. And therefore the bind during the non events is lower, it's becoming lower and lower. While during the events like Black Black Friday, Cyber Monday people are buying more. So that is what happens in this story, right? And um that concludes the first part of our session. OK.

Which is how do you break uh a forecast or rather a time series into its components? Now, let's say that our ERIC comes to you and request you that can you forecast for a couple of years ahead so that I can negotiate my targets? Remember, he's the sales and marketing head. So there are certain targets and keeping this consumer behavior inside that we just learned he wants us to forecast ahead. So let's do that. Let's help Eric with the forecast. So there are some forecasting concepts that we will cover before we do the hands on for the same. The first thing that we need to know is what really is a forecast. Anybody would like to share what a forecast is? Oh, ok. There is a question. Got it. Ok. Should we just map Jan to December as in months? Removing the year? Uh Yes. Harshita. So for your um Jan to December, we have already calculated one column, 1 to 12, which is representing your month number, right? So you don't need the year. You are right there. Yeah. Um All right. A anybody wants to give it a shot if anybody wants to uh speak out. Also, I think there is an option. Yes, correct, Aishwarya predicting a value for a variable in the future. Correct, for example, sales, right, we are predicting the future, right? So we are foretelling the future. So it is often said that forecasting is the art and science of foretelling the future and then explaining why it didn't happen, right? Um but there is a difference and what is the difference?

That difference is that we are talking about a scientific way instead of foretelling that scientific way means that we can measure the accuracy. Ok. What really is the accurate accuracy? It is how close the forecast is from actual history, right? So when it comes to accuracy, we are talking about accuracy and error as two terms, both of them sum up to be one, right? So let's say that um I have forecasted 95 units and the actual it becomes 100 units, right? So the error here is is 5% and the accuracy is 95%. Now, how do we really find out what my error is? Ok. And for that, um there is a concept called train and test that we are going to use. So what we do and generally training tests are taken in 80 to 20 ratio, but we can take another ratio as well. Ok. So let's say I have 100 rows of history or 100 months of history for sales. What we will do is we'll pick up 80 rows out of that and build what we are going to call a model, a time series forecasting model on these 80 rows, ok? We are going to forecast for the rest 20 rows also where we have history. OK? But the model has not seen it, it is that 20 rows is unseen data for the model. And hence, we will get a forecast. We also have actuals for those 20 rows, what we are going calling as the test set and we'll be able to compare.

So like I said, if the average of those um of the 20% data set, let's say average uh you know, you are selling 100 units, but we had forecasted 95 units, then your error becomes 5%. We'll come to um implementing this uh shortly. But before that, there are literally a plethora of error metrics that exist. OK. I would encourage that you do research on them. Uh There is mad or mean absolute deviation R MS C or root, mean square error bias, uh MP or mean percentage error and even map or mean absolute percentage error, we are going to use the mean absolute percentage error today because it is an industrywide adopted metric, right?

And why is it industrywide adopted? Right. So of course, you should always question why is this something uh so popular or adopted by everybody? Right? Map does not suffer from two problems. Many other matrix may suffer from. Ok. And that's a problem of scale and sign. Um when it comes to scale, you know, your history, let's say it's between one and two or your history is between 102 100 units or it could even be between 1 million and 2 million. Either ways your output is always going to be, error is always going to be a percentage. So it does not suffer from different scales, right sign. OK? I see very simply in in forecasting, if I am uh let's say I sell mangoes, right? I cannot um overproduce in one month and then sell that in the second month, right? So um because it is a perishable item and therefore, if I have a positive 3% error in one month and a negative 3% error in the second month, which means in one month, I sold more in one month, I forecasted more. These two should not be allowed to cancel each other. And that's the problem of science. If you see this formula of map, it is actual minus forecasts, the difference, absolute difference between them. OK? So the sign goes away and then we divided by actual.

So it also becomes a percentage. So map is always going to be a positive percentage, right? So now let's get started, let's actually implement everything we have learned so far. And I'll before we move on back to the Excel a quick look if there are any questions. Ok. All right. So let's pull our Excel again and in the next few minutes we should be able to actually complete a forecast. Ok. The first thing I want all of you to do is unhide the rows between 39 and 64. So just right click and unhide the rows. Ok? And while you do, so let me explain to you what I have done here. OK? Um We actually did not have uh history for three years. We actually had the actual or history for four years. It's there over here. The fourth year, OK? And the reason um I had hit it is so that you know, we develop the model on three years of data and the fourth year we can actually use as the test set. OK? So I have three years of data on which I have built the model and hence it becomes the training set and I have one year of data on the test set. So ratio wise, my test and train is 75 to 25 which is very close to the ideal ratio. OK? So what we need to do here is we now need to recompose the components. First, we decompose a tr a time series into trend seasonality and irregularity.

Now we need to recompose them. And since this is multiplicative seasonal um seasonality and hence multiplicative model, the first thing that you will need to do is take your trend and seasonality

components and extend them till the

very end. OK. I'll explain to you what we're going to do with them, but

just extend

these to formulate first, just extend them for the next two years later. OK. Now, what we are going to do is since we said that our data set time series is the multiplication between um the different components. We are going to now multiply them back, right? So your forecast period over here should be multiplication of your strength and seasonality components. Um I have not multiplied irregularity here. Any guesses why we are doing that? Why are we not multiplying irregularity over here? All right, let me give you the answer. So trend and seasonality have logic, right? Uh There is a history and on the basis of that, we are able to individually forecast what the trend will look like and what the seasonality will look like. But there is nothing like that in irregularity. Since it cannot be predicted, we are going to leave it aside for our forecast using it is only going to increase the error of your forecast, right. So what I want you all to do is just multiply these two and extend it till the very end. All right. And that my dear friends before we measure the error, let us for a moment celebrate your first ever forecast. So based on these three years of history, you have been able to forecast two years. Congratulations and time to clap and pat on the back. Now, let's see how well we have done right because one of these years, the fourth year we also have actuals. So what how are we going to calculate the error? The first step, let's calculate the mean percentage error. Ok?

And what's the formula for mean percentage error is your actuals minus your forecast within brackets and divided by actuals that will give you the overall mean percentage error for that particular month. Ok? And then we will simply extend this formula for the one year in which we have both the actuals and forecast. Ok? Uh the difference between mm pe which is mean percentage error and ma pe which is the absolute version is that you take absolute value in it, right? And that function in Excel is called apps. So we take apps of M pe, ok? And just extend it once more, right? Um Remember we said MP does not suffer from sign issue, right? Look at the average of M pe and you'll see it comes as minus 1% over here, right? Whereas ma so, but basically what is happening, right? Uh um 7% and a minus 3% there will be some amount of canceling out. So it will be seven minus 34 and divided by two. So it becomes 2% error which is not always true, right? I cannot um one month, I cannot bring like we said, extra mangoes and consume it in the second month, right? Uh Let me uh take the average of map, right? So at the very top I have also we have also some space. You just take average, use average function of these two values. And you'll see that we have 3.4% P error.

Quite frankly, in the ML world, 3.4% error is excellent error because it means that you have a 96.6% accuracy. All right. So that is the forecast that he will help Eric with so that he can negotiate with his own management, some better targets, right? Um And before I stop, what I would also like to share with you is um uh a free coupon of my udemy course on linkedin. So I will paste that as well uh for you. And now I am happy to take any questions that you may have. Yes. Uh Harshita. If we choose additive model, the error could have slightly been different. And uh what ideally we would do, right? We would try both multiplicative and additive uh methods and see where our error is lower. Generally, that is what we would select. So in uh Eric scenario, additive would also have worked and I would encourage that you definitely try it out, right? And um yeah, there is an optional five minutes extra story or case study that I had store in case we have lesser questions, but I'm happy to first look at some more questions. All right, never mind. So let me take you through a seasonality related uh story. And that story is really close to my heart. In fact, uh my favorite component um in Time series is seasonality. Uh There is a lot of uh in business, things are never um you know, as scientific, right? Um That is what we are going to see.

Now, let me share my screen and uh so let me take you to August 2014. Ok. I was uh comparatively fresh into Time series forecasting at that point. And I was handling the Netherlands country engagement for a top pharmaceutical client, top five you know, pharma companies in the world. And um we were forecasting for the drugs that generate 70% of its revenue, they treat autoimmune diseases. Um you know, like Crohn's disease, ulcerative colitis and very very expensive drugs, right? Let's call them Chemica and Dim. Ok. Now with Chemica and Dim, what was happening is this, this is one year's uh of monthly data and you'll see that every third month was speaking. Ok.

And for some reason, the forecasting engine that we were using so called sophisticated engines were not picking up this pattern. Ok. So what I did is I spoke to the demand planner tom at that point and I said that um if the forecasting engine is not picking up, let us separately induce this pattern because we can clearly see it is repeating so many number of times, right? So the Tom Tom the Demand plant was very happy about the suggestion and we did that. OK. So like you can see right? Uh the next week. So this is July, right? The last peak happened in July. The next peak is expected to be happening in October 3 months later, right? What happened? Let me tell you the next peak is to happen in October 2014,

but it

happens in September, both these drugs peak. So they happened a month earlier so far it was every third month, but now it is the second month, it suddenly happened, right? What was amiss? Why did this happen was something that really joggled off all of us, right? Uh It was difficult to understand in the beginning and then, ok, let's guess, let's try to guess the answer lies somewhere on the calendar. Ok. And um I want all of you to take a few seconds and think what could possibly be happening here. And is this even seasonality from whatever we discussed so far? Would you have done? What I did induced, what I thought is seasonality induce that pattern explicitly. So let me tell you what had happened, right? As Tom and I discussed later what um you know, this company, it used to accept orders on every Monday. Ok. Let me just zoom this in a bit

for you on every Monday

and the shipments for them would happen on Tuesday. So they would actually send across the pharmaceutical drugs on Tuesdays. Ok. So naturally any month that had five Tuesdays will be peaking, it will have five weeks of sales instead of it would be storing four weeks of sales. Right.

And so far in the year that was happening on Tuesdays, ok. April July. And then we instead of October, the next uh you know, Tuesday five, Tuesday month was actually September. And uh in fact, October had five Wednesdays instead of Tuesdays, this is also called as the 445 effect because roughly every third month has an extra weekday, right? Like in this case, we are talking about Tuesday. So every third month, we'll have an extra Tuesday, roughly every third month, we will have an extra Wednesday, right? So the lesson learned here and something that I'm going to remember for life, right? Um Is that if the granularity of the forecasting period here, the granularity of forecasting period was monthly level. Ok. And the granularity as at which business transactions happen, for example, the shipments were happening at a weekly level here. So if the granularity of these two things monthly and weekly, in this case, do not match, then there might be peaks that we may observe and that is not really seasonality. So the key lesson learned here is first of all, start with the rational, why should there even be seasonality. OK? Is this, let's say, for example, if it was an allergic drug, we would clearly say that, ok, it's winter allergies peak during winter.

And therefore the sales will also be and it will show certain kind of seasonality, right? But that is if you cannot justify why there needs to be seasonality, first, find out the business logic and then apply the science, the forecasting science. So that was my time with all of you uh really enjoyed and um I am happy to take any other questions as well in the next five minutes or so. Thank you very much. Thank you um Harshita Aishwarya and everybody present in the session. Uh I I will not be sharing the presentation but um something very similar I have uh on my linkedin article. So what I could do is I'll place the link for my linkedin articles. In fact, there are five forecasting related sessions that I have and uh you can go through them. Yeah. Thank you. Thanks Rima. Hope you all enjoyed the session and learned something today and I really hope that you all are able to use this ahead.