I know where you wanna go next summer - Demand Forecasting at FlixBus

Automatic Summary

Understanding Demand Forecasting in Network Planning

In today's fast-paced global market, demand forecasting is a critical process for companies striving to maintain optimal supply chain management. My name is Miriam, and as a domain product officer at FlixBus, my mission is to leverage data and AI for efficient network planning. I'll be addressing misconceptions around demand forecasting and sharing what my teams and I have learned from applying machine learning to network planning.

The Role of Demand Forecasting in Network Planning

At FlixBus, we have five product teams dedicated to network planning. The network planners are our internal customers, and we build products tailored to their needs. The focus of my talk today is the role of demand forecasting in network planning.

Demand forecasting isn’t about tracking individuals; it’s about estimating potential passenger flows between cities. Our aim at FlixBus is an automated, scalable forecast in space and time, so that we can plan supply to meet the travel needs of those passenger flows in every season.

The Complexities of Network Growth

Deciding which connections to serve, and how many buses to allocate to a connection at a given time, becomes far more complex as the network grows. We went from a few connections in Germany to Europe's largest long-distance bus network, and we also operate in the US and Turkey. At that scale, optimizing and expanding the network manually became untenable.
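To make the growth in complexity concrete, here is a small sketch of how the number of possible connections scales with the number of cities. It assumes every ordered origin-destination pair counts as one possible connection; the city counts are illustrative and are not FlixBus figures. The function name is made up for this example.

```python
def directed_connections(n_cities: int) -> int:
    """Number of ordered origin-destination pairs among n_cities cities."""
    return n_cities * (n_cities - 1)


# Illustrative city counts only; the growth is quadratic in the number of cities.
for n in (10, 100, 1_000, 5_500):
    print(f"{n:>5} cities -> {directed_connections(n):>12,} possible connections")
```

Under this assumption, a few thousand cities already give tens of millions of candidate connections, which is the order of magnitude mentioned in the talk.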

Leveraging Machine Learning and AI to Solve Real-World Problems

Our approach to these challenges is to employ machine learning and AI. Our goal is to understand the key factors that influence demand in time and space. For instance, the populations of two cities and the distance between them strongly influence travel demand: the more populous the cities and the closer they are, the greater the demand. Tourist attractions are one of many further factors that come on top of that.

To effectively leverage our data:

  • We source required data externally in addition to using our internal data.
  • We examine factors that influence demand and build several models to account for space and time.
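The relationship between population, distance, and demand can be sketched with a classic gravity-style score. This is not the model described in the talk, which combines several models; it is only a minimal illustration of the stated intuition. The function name, the exponents, and the rounded population and distance figures are all assumptions chosen for the example.

```python
def gravity_score(pop_a: float, pop_b: float, distance_km: float,
                  k: float = 1.0, gamma: float = 2.0) -> float:
    """Gravity-style demand score: grows with both populations, shrinks with distance.

    k and gamma are hypothetical parameters; in practice they would be
    estimated from observed passenger data.
    """
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return k * pop_a * pop_b / distance_km ** gamma


# Rough, rounded illustrative inputs (not official statistics).
florence_rome = gravity_score(360_000, 2_800_000, 230)
florence_munich = gravity_score(360_000, 1_500_000, 490)

print("Florence-Rome scores higher:", florence_rome > florence_munich)
```

A score like this only ranks connections relative to each other. Turning it into an actual passenger forecast would require calibration against real data and the additional factors mentioned above.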

Key Learnings

Data Groundwork

When addressing complex issues, one crucial yet overlooked aspect is the data groundwork. We've learned that high-quality groundwork is pivotal to derive valuable predictions from models. Any garbage data input into a model would invariably result in a garbage forecast. Hence, close attention to data quality is critical.
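One simple piece of data groundwork is checking whether an externally sourced feature is covered equally well across regions. The talk notes that crowdsourced OpenStreetMap data is less detailed in the US than in Europe, so gaps are not random. The sketch below uses made-up records and a hypothetical field name, `poi_count`, purely to show the kind of check involved.

```python
from collections import defaultdict


def coverage_by_region(cities, feature):
    """Share of cities per region that have a non-missing value for `feature`."""
    total = defaultdict(int)
    present = defaultdict(int)
    for city in cities:
        total[city["region"]] += 1
        if city.get(feature) is not None:
            present[city["region"]] += 1
    return {region: present[region] / total[region] for region in total}


# Made-up records; "poi_count" is a hypothetical feature name.
cities = [
    {"name": "A", "region": "EU", "poi_count": 120},
    {"name": "B", "region": "EU", "poi_count": 45},
    {"name": "C", "region": "US", "poi_count": 80},
    {"name": "D", "region": "US", "poi_count": None},
]

coverage = coverage_by_region(cities, "poi_count")
print(coverage)
```

If coverage differs strongly between regions, treating missing values as zeros would bias forecasts for the poorly covered region, so the gap has to be handled explicitly in the modeling.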

Exploration and Iteration

Much like climbing a mountain or running a research lab, deploying machine learning and AI to solve real-world problems is exploratory and iterative. To find the best results, you must test many factors and candidate models against their forecast performance.
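The loop of proposing candidates and evaluating them by forecast performance can be sketched as follows. The three baseline forecasters, the toy weekly series, and the choice of mean absolute error on a holdout period are assumptions for illustration; the talk does not say which models or metrics are used.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def last_value(train, horizon):
    return [train[-1]] * horizon


def historical_mean(train, horizon):
    return [mean(train)] * horizon


def seasonal_naive(train, horizon, season=7):
    # Repeat the last observed season (here: the last week).
    return [train[-season + (i % season)] for i in range(horizon)]


def rank_candidates(series, horizon, candidates):
    """Hold out the last `horizon` points and rank candidates by holdout MAE."""
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: mae(test, model(train, horizon))
              for name, model in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy daily demand with a weekly pattern, repeated for four weeks.
series = [100, 90, 95, 110, 150, 200, 180] * 4
candidates = {
    "last_value": last_value,
    "historical_mean": historical_mean,
    "seasonal_naive": seasonal_naive,
}
ranking = rank_candidates(series, horizon=7, candidates=candidates)
for name, score in ranking:
    print(f"{name:<16} MAE = {score:.1f}")
```

The holdout is taken from the end of the series so that every candidate is judged on data that comes after its training period, which mirrors how a forecast is actually used.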

Taking Stakeholders Along

Machine learning models do not exist in isolation. There's a constant interaction between the machine, producing a forecast, and the human reviewing it for decision-making purposes. Therefore, creating trust and acceptance among our stakeholders forms an integral part of our development journey. We also lean on methods from explainable machine learning to shed light on our often 'black boxy' models for our users.
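One widely used technique from explainable machine learning is permutation importance: shuffle one input feature and measure how much the forecast error increases. The talk does not say which explainability methods are used, so the sketch below is only an example of the idea, with a toy stand-in for a black-box model and made-up data.

```python
import random
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def permutation_importance(predict, rows, targets, n_features,
                           n_repeats=5, seed=0):
    """Average increase in MAE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mae(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            error = mae(targets, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(mean(increases))
    return importances


def toy_model(row):
    # Stand-in for a black-box model: uses feature 0, ignores feature 1.
    return 2 * row[0]


rows = [(x, 10 - x) for x in range(1, 9)]   # tuples of (feature 0, feature 1)
targets = [2 * r[0] for r in rows]

importances = permutation_importance(toy_model, rows, targets, n_features=2)
print(importances)
```

A feature whose shuffling does not change the error, like feature 1 here, contributes nothing to this model's forecasts. Showing users which factors actually drive a forecast is one way to support the trust-building described above.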

Conclusion

In conclusion, notwithstanding the challenges, the journey is a thrill for us. Demand forecasting and machine learning provide us with an exciting opportunity to reshape the face of network planning, ensuring we thrive despite the complexities of our growing network. Thanks for your time!


Video Transcription

So good morning everyone. I'm very happy to be here and to talk about demand forecasting. My name is Miriam. I'm a domain product officer at FlixBus. And yes, as Anna said, I have a background in statistics and then worked many years as a data scientist before I joined FlixBus. At FlixBus, I have been working with my teams on automating network planning in a smart way with data, machine learning and AI. And what's network planning? Basically, network planning is responsible for planning our network worldwide and for building the schedules for our bus and train network.

And in the network planning domain, we have five product teams, and we build products for network planning. So the network planners are our internal customers. And what I'd like to talk about today is demand forecasting. Actually, the title of my presentation is a bit misleading, and I'd like to explain why, and what demand forecasting actually means in our context. So, of course, it's not me, but a whole team driving the forecasting effort. And we don't know, but we forecast, and every forecast is subject to uncertainty. And also we are not interested in you as an individual, but in potential passenger flows, so passenger flows between cities. So don't worry, we do not track you or anyone.

And "next summer" refers to our planning horizon. And yes, our planning horizon is quite long. So right now, we plan for next summer, but of course not only for summer but for all seasons. What I would like to cover in the following is, first, what is the problem that we aim to solve with demand forecasting? Second, I'll give you a glimpse of how we approach the problem. And then, most importantly, I would like to share a few learnings that we made on the way. My aim here is to create awareness, and I hope that this might be useful for you, because probably in your context or in your company you are faced with similar challenges when applying machine learning and AI to real-world problems. And although your use case might be different, I think there are a lot of common patterns. So why do we do demand forecasting? What is the problem that we aim to solve? We went from just a few connections in Germany to Europe's largest long-distance bus network. And we are also operating in the US and Turkey. Starting off, you can still keep track of a few connections and optimize and expand your network manually.

You look at all the information that you have at hand and then combine that with your planning expertise. But as the network grows, complexity increases, and we need to make a lot of decisions, like: which connections shall we serve? And how many buses shall we plan to supply on a certain connection at a certain point in time? What you see here in the next slide, as an example, is all the potential connections out of Florence. So this is now only for one origin city, and it gets quite overwhelming if you try to plot that for the whole network. But basically, what you can see is that we need to make a lot of decisions on the supply side for a network of around 30 million possible connections. And on the other side, there's of course only a limited number of buses, and not all connections are equally attractive to our customers. So for instance, I would expect more customers going from Florence to Rome than going from Florence to Munich. And also there is quite some fluctuation in demand over the course of a year or a week or even within a day.

So what we need for our purposes is a forecast in space and in time, and given the size of that problem, we need an automated, scalable solution. And then based on that, we can plan our supply and our network. So what is demand forecasting? Demand for us is the sum of our potential passengers that would like to travel between city A and city B by bus in a certain time period. And forecasting is predicting something in the future. Now, of course, we cannot foresee the future. But what we can do is make a best guess, taking into account all the information and all the data that we have at the current point in time, and then make a forecast about the future. In that way, demand forecasting is a tool for decision making, and it's one piece in a larger decision-making system. So on one hand, we have demand; on the other hand, we have buses, our supply; and we need to decide how to meet demand with our supply. That decision needs to be made well in advance, way before the buses are on the road. The demand forecasts for next year feed into the decision-making system, and in that way they contribute to efficient and effective network planning, and ultimately they make a business objective happen. So how do we approach the problem?

And here I'll only give you a glimpse, because 20 minutes are way too short for that. But we started off with a question: what are the factors that influence demand in space and in time? And now focusing on the space dimension: what are the factors that make a connection attractive to travelers? I will give you one example. Where would you expect more passengers: on a connection from Florence to Rome, or on a connection from Florence to Saintes? At least I would prefer to go from Florence to Rome, traveling three hours, instead of going 19 hours to Saintes in France. And also, of course, Rome has slightly more tourist attractions than Saintes. They also have one amphitheater and a camping ground. But apart from that, I think Rome is way more attractive, and in both cities, Florence and Rome, there are way more people who would like to travel compared to Saintes. So this gives you an idea of a few factors that influence demand. And I summarized the few that we touched on in that slide. So the population of the two cities and the distance between the two cities are factors that influence demand. The bigger the cities and the closer they are together, the more demand we expect. And then tourist attractions are one of many other factors that come on top of that and influence demand. So what we realized is that for many of those factors that influence demand,

it's not data that we have at hand, it's not data that we use ourselves, but it's data that we need to source externally. So we need to discover data sources and then collect the data. And luckily, there's a lot of open source data, for instance from government agencies, and geospatial data is provided by OpenStreetMap, which is crowdsourced geospatial data. And then once we have the data, we need to start with the modeling. Here what we did was take that big problem and break it down into several sub-problems. I already mentioned the space and the time dimension. So basically what we did is we built several models, not one, and combined them in one forecasting system. This is now a very, very simplified overview of the approach, but basically there's data, and that data can be factors that are external, but it can also be factors that are decision variables on our side, and they feed into this forecasting system, this system of models, and that produces a forecast.

Now, after this very simple glimpse of the approach, I don't want to go too much into the technical details. What I'd like to focus on is to share some real-life experience and some learnings from this journey of applying machine learning to a real-life problem. And I would like to share with you what we have learned on that journey so far. Now, first of all, what we found very important is the data groundwork. So nobody wants to do data work, everyone wants to do the fancy machine learning work. But actually the data work is super important. And I would like to give you one example of why it's important. But basically, in general, if you feed garbage data into your model, no matter how good that model is, what you will get is a garbage forecast. And the one example that I have for you is on OpenStreetMap. So we leverage data from OpenStreetMap, and that data is crowdsourced. And in that way, the data is a reflection of the interests of those people who create the data. And now if you ask who creates it, for this audience that is probably very interesting.

There might be a slight gender bias. In a survey from 2020, they found that almost 90% of contributors identified as male. And where we also have a bias is in the spatial distribution. So the majority of contributors come from Europe, and that's actually something that we can see in the data. So for instance, for Germany, a European country, even small cities are covered in great detail in OpenStreetMap. However, if you go to the US, it's far less detailed. And these data gaps, this missingness, is not random. That's something that you have to deal with in the modeling, because otherwise you'll get biased forecasts. Now, the second learning that I would like to share with you is on the experimental and iterative nature of the development process, which I think holds true for many machine learning and AI efforts. And I have two metaphors for you. It's like a research lab where you have a lot of iterations: first, you come up with ideas and hypotheses, then you test those ideas, then you evaluate the ideas. And if the evaluation shows that an idea improves your model, then you incorporate that into your forecasting system. You don't know in advance which factors have predictive power. And you also do not know in advance which model type will work well for your particular problem.

So you have to try out many different factors and many different candidate models and evaluate them with respect to their forecast performance. So it's not that you build one model, but you build hundreds of candidate models. And it's also a little bit like climbing a mountain, where you don't know beforehand how high that mountain is, how high you have to climb. You have an idea of the routes that go up, but you have to take the routes to then discover that some of them are dead ends, or that on some of them there are surprises around the corner. And actually, one surprise that hit us was the pandemic. The pandemic created a structural change with the lockdowns, which had a massive impact on the travel industry. And that impact manifests itself in the data as a so-called structural break. That's now something that we have to deal with in our modeling. So when we train our models on historic data, or when we evaluate our models on historic data, we have to take into account that there was this very special period. The third learning that I would like to share with you is on taking our stakeholders along. So in our case, and I think that is true for many other use cases, the model doesn't live in isolation.

So there's an interaction between a machine that produces a forecast and a human. The forecasts are just one piece in a larger decision-making system, and humans look at forecasts for the decision making, and they might inject domain knowledge and strategic targets, either into the forecast or into the subsequent supply optimization. So ultimately, it is still humans that are responsible for the decision making, and they need to build up trust in the model. They need to trust the model to be willing to base their decisions on the model's recommendation, otherwise they won't use it. And if they don't use it, the forecasts do not create any value. So what we found helpful was to involve our users, and in our case it's internal users, our network planners, early on to create acceptance and trust. So we basically made them part of the development process. And it's not only important to create that trust, but it also helps us to improve the model by incorporating their domain expertise. So what we found helpful, apart from the standard rigorous model evaluation process, is to also do user feedback rounds, interviews and testing phases with the users.

And that helps us to detect places where the model might be off, and then with their domain knowledge, the experts can contribute ideas on why a forecast may be off in certain areas. A second point that we found important in order to take our users and our stakeholders along is to leverage methods of so-called explainable machine learning. That helps us to shed some light on the rather black-boxy models and explain to the user why the forecast shows a certain value. So to summarize, let me briefly restate the three learnings that I shared with you. First, the data groundwork is very important, at least as important as the modeling itself. Then the experimental nature of the development process creates some additional challenges. And that's why it's very important to take your stakeholders along on that journey and create trust, but also learn from their expertise in the iterative development process. And I mean, yes, there are challenges, but there are also large opportunities, and the journey is super exciting. And yeah, I'm very happy to be part of that journey, or of that bus ride, and to work with my teams on solving those challenges.

So yeah, thank you very much for your time.