Data Modernization - Challenges vs Opportunities

Pooja Kelgaonkar
Senior Data Architect
Automatic Summary

Embracing the Data Modernization Journey: Challenges and Opportunities

Good morning, everyone! Welcome to a session where we break down the data modernization journey, discussing typical challenges and opportunities. My name is Pooja Kelgaonkar, a Senior Data Architect with Rackspace Technology. With over 16 years of experience in the data domain, I have a strong passion for sharing my insights with the community through blogging, public speaking, and authoring books.

The Need for Data Modernization: Delving into the Opportunities and Challenges

The need for data modernization has never been more acute. The exponential growth in data needs and requirements to support structured, semi-structured, and unstructured data are among the key drivers of this journey.

Challenges in legacy platforms, such as declining performance over time and limited data type support, underscore the urgency of a shift. Additionally, the high operational costs and budget considerations associated with maintaining these platforms add to the burden, necessitating modernization.

From Challenges to Opportunities

Each of these challenges presents an opportunity for modernization. For instance, scalability in a cloud platform can tackle performance issues, effectively turning this challenge into an opportunity. Also, rather than shouldering hefty maintenance costs, on a cloud platform, the vendor typically takes care of upgrades and patches. Hence, you only cater to app maintenance costs, driving down overall expenses.

Typical Modernization Journey and Its Phases

The typical modernization journey consists of five transformation approaches, the five Rs: Rehost, Revise, Rearchitect, Rebuild, and Replace. The journey itself goes through five distinct phases: data discovery, assessment, architecture and engineering, migration and testing, and finally, the go-live stage coupled with data operations.

Modernization Assessment and Evaluation: Taking the Right Metrics into Account

  1. Data model mapping: Understanding how the data model will transform in the transition from the legacy platform to the cloud platform is critical.
  2. Application integration: Maintaining interdependencies among applications while some applications move to the cloud and others remain on legacy systems can be particularly challenging.
  3. Third-party adoptions: Balancing performance and cost efficiency becomes a significant challenge here. The policies for scaling, implemented for performance, should not result in exceeding the budgetary limits.
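To make the data model mapping metric (item 1) concrete, an assessment often starts from an explicit legacy-to-cloud type map. Below is a minimal sketch in Python, assuming a Teradata-to-BigQuery migration; the entries shown are common correspondences for illustration, not a complete vendor conversion table:

```python
# Illustrative legacy-to-cloud type map for a Teradata -> BigQuery assessment.
# A real migration needs the full vendor conversion tables plus
# precision/scale and nullability checks per column.
TYPE_MAP = {
    "INTEGER": "INT64",
    "BYTEINT": "INT64",
    "SMALLINT": "INT64",
    "DECIMAL": "NUMERIC",
    "FLOAT": "FLOAT64",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}

def map_column(legacy_type: str) -> str:
    """Return the target type, flagging anything the map does not cover."""
    target = TYPE_MAP.get(legacy_type.upper())
    if target is None:
        # Unmapped types are exactly the gaps the assessment phase must surface.
        raise ValueError(f"no mapping for legacy type {legacy_type!r}")
    return target

print(map_column("varchar"))  # STRING
```

Types that raise here are the ones that force a design decision (redesign the model, or use a hybrid representation) before migration starts.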

Overcoming Modernization Challenges

Certain strategies can help overcome these challenges. Implementing checkpoints at three stages of the modernization process, before you begin (during assessment), at the start, and on an ongoing basis, can keep these challenges in check. Automating operations, cost monitoring, and compliance checks using managed cloud services streamlines the process further.
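The ongoing cost-monitoring checkpoint can begin as a simple automated budget guard run against daily spend figures. A minimal sketch, assuming spend data is already available from the cloud provider's billing export; the threshold and the alerting action are placeholders:

```python
def check_budget(daily_spend: list[float], monthly_budget: float,
                 alert_ratio: float = 0.8) -> bool:
    """Return True if month-to-date spend is within the alert threshold.

    daily_spend: month-to-date daily costs, e.g. pulled from a billing export.
    alert_ratio: fraction of the budget at which an alert should fire
                 (0.8 is an illustrative choice).
    """
    spent = sum(daily_spend)
    if spent >= alert_ratio * monthly_budget:
        # In a real pipeline this would page the team or open a ticket.
        print(f"ALERT: {spent:.2f} of {monthly_budget:.2f} budget used")
        return False
    return True

print(check_budget([120.0, 95.5, 130.2], monthly_budget=1000.0))  # True
```

Wired into a scheduled job, a check like this turns the "cost is still not stable" surprise into a routine daily signal.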

Conclusion

In a nutshell, the journey of data modernization holds vast opportunities, alongside challenges. Every aspect of this journey demands attention: the five phases, managing the challenges, and turning them into opportunities. I am eager to answer any queries that have surfaced during our session today — do feel free to interact in the chat or Q&A. Let's together unravel the complex sphere of data modernization. Thank you for joining in, and have a good day.

About the Author

Pooja Kelgaonkar is a Senior Data Architect at Rackspace Technology. With more than 16 years of experience in the data domain, she uses her expertise in GCP and Snowflake to modernize data infrastructure. Apart from being recognized as a Snowflake Data Superhero, Kelgaonkar is also an avid blogger and public speaker. She is currently authoring a book, offering yet another platform to share her knowledge of data modernization.


Video Transcription

Good morning, everyone, and welcome to the session. In this session we are going to talk about data modernization and the typical challenges and opportunities in the modernization journey. We'll begin with my introduction: I am Pooja Kelgaonkar.

I have over 16 years of experience in the data domain. I'm working as a Senior Data Architect with Rackspace Technology, primarily on GCP and Snowflake, and I was recognized as a Snowflake Data Superhero this year. As a data enthusiast, I love contributing to the community: I'm a blogger, a public speaker, and I'm also authoring a book. In today's session we are going to talk about modernization overall: what the need for modernization is, what the journey looks like, what the opportunities are, and what the typical challenges are. We'll also go through a sample use case, and towards the end of the session we'll talk briefly about challenges versus opportunities and some of the checkpoints we need to put in place to ensure these opportunities don't turn into challenges.

Each of these opportunities has the potential to turn into a challenge, making it hard to maintain the application on the cloud platform, so we are going to talk about that as well. These are some of the legacy challenges that we know. With any legacy platform we definitely have performance challenges: on legacy platforms we used to pre-calculate capacity and procure the hardware and software, and over a period of time we see declining performance. Legacy platforms also limit the types of data they support, while growing data needs require support for structured, semi-structured, and unstructured data. They also offer limited or no insights, with few opportunities to integrate AI/ML or predictive analytics into the platform. And the last one is operational cost and budget issues.

That covers the overall maintenance cost involved in keeping the platform running. Remember the days when we had ops teams, support teams, or DBA teams maintaining the platform, where we had to invest to ensure the application and infrastructure were up and running and all upgrades and patches were applied on time.

On top of that, a support team used to monitor and maintain the infrastructure so that all your applications stayed up and running. These are some of the legacy challenges we overcome with modernization; each of these challenges has been converted into a modernization need. Take performance: performance leads to scalability, where instead of procuring additional hardware, when you move to the cloud you get the option of horizontal versus vertical scalability, so you can scale your applications to meet your performance needs.

At the same time, contrary to the operational cost model, where we had a dedicated team to maintain our applications and infrastructure, in the cloud journey we do not have to do that: the vendor provides upgrades and patching, all the maintenance activities are taken care of by the hyperscaler, and you just use the managed service.

That cost becomes essentially zero in terms of platform maintenance. However, you still need to budget for application maintenance: pipeline monitoring and maintaining your applications, not the infrastructure, but your data applications. Then there is availability, cost, and budget. With the cloud you can have a pay-as-you-go model, or committed-use discounted rates depending on what the hyperscaler offers, and with the cost estimated you end up paying in more of an operational (Opex) way, in comparison with the Capex way, which was the earlier case. Then the typical modernization journey: we know it has the five Rs, meaning the five ways of transformation: Rehost, Revise, Rearchitect, Rebuild, and Replace. These are the modernization or transformation paths we can go for. This typical journey consists of five phases: data discovery, which leads to assessment, where we assess the existing platform, followed by architecture and engineering.

The assessment helps design the application in the cloud, followed by the development journey, where you actually move or migrate your applications to the cloud and start testing them. The last phase is go-live and DataOps, where you move your application live to the cloud. When it goes live, you run a production parallel run, comparing your legacy system with your cloud application, so that you can match them and see that the application has the required performance, meets all your validation criteria, and satisfies all the consumer needs, with activation done well.

Once you have all those checkpoints validated, you do the go-live of your application, followed by DataOps. DataOps is the support activity you implement to monitor and support your applications or data pipelines. This can be automated, though in some cases manual intervention may be required to ensure your data applications are up and running all the time. Then the evaluation. Of course, when we talk about modernization, moving all these applications to the cloud, evaluation is also important: how much are you spending on the cloud? Across the various use cases I have worked on, the common challenge I have observed is that even though the application has been up and running for six months, a year, or more, the cost is still not stable. We start the journey assuming the cost will stabilize over a period of time on the cloud; however, depending on the application design, the application needs, and the way it is operated, we don't see a flat cost over time. We do see trends or spikes.
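The unstable-cost pattern just described can be caught early with an automated check on the billing trend. A minimal sketch, assuming daily cost figures are already exported from the billing service; the spike factor is an illustrative threshold:

```python
def detect_cost_spikes(daily_costs: list[float], factor: float = 1.5) -> list[int]:
    """Return indices of days whose cost exceeds `factor` x the trailing average.

    daily_costs: month-to-date daily spend, e.g. from a cloud billing export.
    factor: spike threshold relative to the average so far (an assumption).
    """
    spikes = []
    for i in range(1, len(daily_costs)):
        trailing_avg = sum(daily_costs[:i]) / i
        if daily_costs[i] > factor * trailing_avg:
            spikes.append(i)
    return spikes

# The fourth day (index 3) jumps well above the trailing average.
print(detect_cost_spikes([100.0, 110.0, 105.0, 400.0]))  # [3]
```

A flagged day then feeds the Capex-versus-Opex evaluation: was the spike a scaling policy doing its job, or a policy misconfiguration eating the budget?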

This is where Capex versus Opex comes into the picture. We have to evaluate what the expenditures are and whether performance and cost efficiency go hand in hand, or whether meeting your performance needs is taking a hit on your cost efficiency. Incorrect scaling policies, or inappropriate policies in general, may lead to higher cost and a hit on the budget that has been defined. With this, let's talk about some of the assessment metrics: what are the metrics we need to consider for the modernization assessment? As we have discussed, there are five phases of modernization, and in my view the assessment is the most critical phase, where we need to assess an application that has been developed over a long period of time. Consider an enterprise warehouse developed over 15-16 years; now we are assessing that platform and coming up with a cloud architecture to move the enterprise warehouse to the cloud.

In that case assessment plays a critical role: the moment you miss any small criterion, scenario, or metric on the legacy platform, it turns into a challenge in the cloud implementation. Your design may not turn out to be right, or you miss an aspect while designing or architecting, and that carries into the further phases of the modernization. That's where we end up failing validation cycles, and the validation and testing phase becomes a prolonged phase. So this starts at the very beginning, and these are some of the metrics we can consider in the assessment phase of a modernization. Now, talking about the opportunities: when we move these platforms to the cloud, these are some of the opportunities we see in a cloud-oriented architecture. You get scalability and efficiency, a flexible costing model, and strong governance.

You can implement the governance and compliance model end to end, from your infrastructure to your data policies. The cloud also fosters an innovative mindset, unlike legacy platforms, where you had a limited set of infrastructure, software, and technologies, and faced challenges integrating heterogeneous sources or heterogeneous data and integrating AI and ML capabilities.

Those capabilities are already open on the cloud platform, so you get those opportunities, and instead of focusing on how to develop, you focus on what to develop. You can pursue those innovative ideas or features; your focus shifts entirely from development to feature-oriented implementation, and it becomes much easier to integrate and implement any feature you want to add to your application. Then there are the typical challenges. When we talk about migrating or modernizing a warehouse or a data platform, these are some of the typical challenges you may encounter. First, data model mapping: how will the data model be mapped? What kinds of data types can I map? Do I need to redesign the entire data model? How does it impact the data pipelines? Do I need to re-engineer the platform? How does ETL versus ELT play out: should I go with lift and shift, or, if I go with a redesign, which components go under redesign, the data model or the pipeline re-engineering? Then, application integration.

This is the most important one, where you are moving only part of your applications to the cloud and the rest are still on the private cloud or legacy systems. Integrating the interdependencies between these applications then becomes challenging.

In that case, what I have seen is that most implementations keep using the enterprise scheduler, so that the applications still use the same scheduler and their interdependencies stay easier to manage, in comparison with moving the entire application piece to the cloud. Then, third-party adoptions.

That is scaling versus performance versus cost. As I mentioned, ensuring performance and implementing the right policies should also be in line with cost efficiency; otherwise the cost becomes challenging and continues to grow along with the performance.

These have to go hand in hand. Now for the sample use case. This is a Teradata migration, a typical enterprise data warehouse use case where everything is integrated with Teradata: you can see heterogeneous sources being integrated, the entire processing done using ELT on the Teradata side, and then a target layer where the consumer applications can be integrated and consume the data. Considering this modernization use case, how can a typical enterprise warehouse be modernized and moved to GCP or Snowflake? This is the GCP use case, where BigQuery is used to replace the enterprise data warehouse. In this Teradata-to-BigQuery migration, the heterogeneous sources are still there and the consumer applications still hold true. Once you have your data extracted from the heterogeneous sources, you bring it to cloud buckets and then integrate it with BigQuery, and the entire ELT still holds true: whatever ELT processing you had on the legacy platform, you run similar pipelines on BigQuery, and because both support standard SQL, converting and running those pipelines becomes easier. The same goes for Snowflake.
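The extract-to-bucket-then-load step in the BigQuery path is usually scripted. Below is a small sketch that builds the `bq load` CLI command for staged files, assuming the extracts have already landed as Parquet in a GCS bucket; the bucket, dataset, and table names are hypothetical:

```python
def bq_load_command(bucket_uri: str, dataset: str, table: str,
                    file_format: str = "PARQUET") -> str:
    """Build the bq CLI command that loads staged GCS files into BigQuery.

    bucket_uri: GCS path where the legacy extracts were staged
                (a hypothetical location for this sketch).
    """
    return (f"bq load --source_format={file_format} "
            f"{dataset}.{table} {bucket_uri}")

cmd = bq_load_command("gs://migration-stage/orders/*.parquet", "edw", "orders")
print(cmd)
```

In practice a loop over the extracted tables emits one such command (or API load job) per table, which is what makes the lift of historical data repeatable and auditable.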

This is the Teradata-to-Snowflake use case, where Snowflake is designed as the enterprise data warehouse. The same holds true here. Considering Snowflake as a platform, it does not bring the same vendor lock-in: BigQuery, by contrast, is a kind of vendor lock-in, as it is available only on Google Cloud Platform and you have to use the GCP-relevant services. Snowflake is not locked to any hyperscaler; you can have it hosted on any of the clouds, and at the same time you can integrate it with any other cloud platform. For example, say one of your sources is on AWS S3 and another is on a GCP bucket; you can still integrate them on the Snowflake platform, which may be hosted on Azure, AWS, or GCP. Depending on the business needs, we can have it all integrated, and as Snowflake also supports the SQL standard, we can do the ELT implementation here as well.

Then, if you consider the sample use case of a Teradata warehouse migration, these are the opportunities we see. First, performance. Although Teradata is a shared-nothing architecture and an MPP implementation, I remember that when I worked on the appliance model of Teradata, we always had to ensure we had the right amount of data and were running the timely monthly rolls and archival, so that the data volume and performance stayed in check and we still met our daily batch SLAs and, at the same time, the consumer application SLAs.

We had to keep running ongoing jobs for monthly rolls, archival, and even collecting statistics so that performance stayed up to date. Then, scaling: even though Teradata is MPP, it has a limitation depending on the number of nodes added to your appliance.

So we definitely don't have as many scaling opportunities in comparison with Snowflake and BigQuery, where you get far more room to scale, and the same goes for data processing. Yes, the newer Teradata offerings support all the different data types, including structured and semi-structured; however, in the legacy or appliance model we had this limitation and had to use hybrid approaches to integrate structured and semi-structured data. These are some of the challenges in the Teradata migration use case.

I have faced issues in terms of data model conversion, the numeric or float mappings, and the timestamp conversions. BigQuery saves timestamps with the time zone, so that was one of the challenges I faced when we had to compare timestamp-based data and have the right partitioned data saved using timestamp columns. Then, data audit processing: Teradata supports OLAP- and OLTP-style transactions, including row-level operations. BigQuery also supports row-level operations, but it still has some limitations on the number of concurrent operations done at the row level. These are some of the challenges we encounter in a Teradata-to-BigQuery modernization. That covers the sample case; beyond it, let's talk about some of the overall modernization challenges: your data model conversion and your historical data migration. Data migration is one of the biggest challenges when you have to migrate terabytes or petabytes of data from legacy to cloud.
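The timestamp challenge above comes down to normalization: BigQuery TIMESTAMP values are stored in UTC, while legacy extracts often carry naive wall-clock times in a fixed local zone. A sketch using only the Python standard library; the source offset is an illustrative assumption:

```python
from datetime import datetime, timezone, timedelta

def to_utc(local_ts: str, utc_offset_hours: int) -> datetime:
    """Normalize a naive legacy timestamp string to UTC for comparison.

    local_ts: wall-clock time from the legacy extract, e.g. "2024-01-15 10:30:00".
    utc_offset_hours: the source system's fixed offset (an assumption here);
                      both sides must be in UTC before row-level validation
                      or timestamp-based partitioning.
    """
    naive = datetime.strptime(local_ts, "%Y-%m-%d %H:%M:%S")
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)

print(to_utc("2024-01-15 10:30:00", utc_offset_hours=5))  # 2024-01-15 05:30:00+00:00
```

Skipping this step is exactly how rows land in the wrong daily partition and the parallel-run validation starts failing on otherwise identical data.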

So how can we leverage the cloud offerings to migrate the data? Now, challenges versus opportunities: each of these opportunities may turn out to be a challenge. For example, scaling might turn into a challenge in terms of cost efficiency, so how can we implement tagging and budgeting to ensure the applications are cost efficient? To maintain your application and ensure it is not turning into a challenging situation, you have to implement checkpoints. The checkpoints can be implemented at three levels: before you begin, the moment you start with the assessment; when you begin, assessing and coming up with the requirements and the design; and ongoing, which is the most important one, where you have automated operations, automated cost monitoring, automated logging and monitoring, and ongoing compliance checks. These are some of the automated checks you can implement using cloud managed services to ensure your applications meet your performance and budgeting needs. These are the checkpoints and the modernization challenges and opportunities we encounter on a day-to-day basis.
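The tagging-and-budgeting point above can start with something as basic as checking that every resource carries the labels the cost reports group by. A minimal sketch; the required label keys are an illustrative choice, not a standard:

```python
# Labels the cost reports group by; the exact keys are an illustrative choice.
REQUIRED_LABELS = {"team", "environment", "cost-center"}

def missing_labels(resource_labels: dict[str, str]) -> set[str]:
    """Return the required labels absent from a resource's label set.

    A non-empty result means this resource's spend cannot be attributed
    in cost reports, which is how untracked spikes slip through.
    """
    return REQUIRED_LABELS - resource_labels.keys()

gaps = missing_labels({"team": "data-eng", "environment": "prod"})
print(sorted(gaps))  # ['cost-center']
```

Run as an ongoing compliance check over the resource inventory, this keeps the scaling-versus-cost trade-off attributable to a team and a budget line.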

I will take a pause here and see if there are any questions; feel free to drop your questions in the chat or Q&A, and let's use a minute's time for that. I hope this session was helpful. I had to rush a bit given the time and the agenda to cover, so I will check and will be happy to share the material with you all. Any other questions? All right, thank you so much for joining in, and have a good day.