Life Cycle of a Query in SQL Server

Automatic Summary

A Journey Through the SQL Server Query Lifecycle

Hello, tech enthusiasts! Thank you so much for attending our session on "The Lifecycle of a Query in SQL Server" at the Women in Tech conference. This is your host Zi Go Gui, a seasoned Database Administrator with a Microsoft Certified Solutions Expert badge under my belt.

Today, we are set to embark on an intriguing journey uncovering the mysteries behind the scenes of SQL Server. We’ll slice open the life cycle of a SQL query, exploring each stage inside the SQL engine. So let's get the ball rolling!

About Your Host: Zi Go Gui

Before we dive deep, here's a little bit about me: I've been managing SQL Server instances and databases for a rewarding seven years, am a Microsoft Certified Trainer, and hold a master's degree in Computer Technology from Eastern Illinois University. When I'm not buried in SQL code, you can find me blogging at dbanuggets.com, an active platform where we discuss everything SQL Server. I'm also a committed volunteer with the Microsoft Data Women in Technology Virtual Group and a co-organizer for several user groups.

Breaking down the SQL Server Query Lifecycle

Parsing, Binding, Simplifying

The lifecycle of a SQL query can be neatly divided into three distinct steps. In the first phase, the query passes through components like the parser, binder, simplification, and trivial plan. These components are lightweight and require minimal resources: they receive the query text and begin processing.

Loading Metadata, Performing Heuristics, Identifying Search Phases

Step two is more complex: components such as load metadata, join heuristics, and search phases require substantial metadata to process the query. Hence, index statistics, histograms, and cardinality estimates are gathered and processed.
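To make the idea of histograms and cardinality estimates concrete, here is a minimal sketch of estimating row counts from an equi-height histogram. This is purely illustrative; the function names and bucketing logic are invented for this example and are much simpler than SQL Server's actual statistics objects and estimator.

```python
# Illustrative sketch: cardinality estimation from a histogram.
# Not SQL Server's algorithm; names and logic are simplified.

def build_histogram(values, n_buckets=4):
    """Split sorted values into equal-sized buckets, recording each
    bucket's upper bound and row count."""
    values = sorted(values)
    size = max(1, len(values) // n_buckets)
    buckets = []
    for i in range(0, len(values), size):
        chunk = values[i:i + size]
        buckets.append({"hi": chunk[-1], "rows": len(chunk)})
    return buckets

def estimate_rows_leq(histogram, x):
    """Estimate how many rows satisfy `value <= x` by summing the
    row counts of buckets whose upper bound is <= x."""
    return sum(b["rows"] for b in histogram if b["hi"] <= x)

hist = build_histogram(range(100))      # column values 0..99
print(estimate_rows_leq(hist, 49))      # prints 50: half the rows
```

The optimizer uses estimates like these to decide, for example, whether a filter is selective enough to justify an index seek over a scan.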

Access Methods, Transaction Management, Buffer Handling

The final phase introduces the query to the storage subsystem via the access methods, the transaction manager, and the buffer manager. Depending on whether the transaction reads or writes, different actions occur at this stage, such as locking resources, checking for dirty pages, and hardening data to disk.
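The three phases above can be pictured as a pipeline. The sketch below is a toy model, with invented stage lists rather than real SQL Server internals, showing the order in which a query visits the components described in this section:

```python
# Toy pipeline model of the three lifecycle phases described above.
# Phase and component names follow this article, not actual internals.

PHASES = [
    ("parse/bind/simplify", ["parser", "binder", "simplification", "trivial plan"]),
    ("optimize", ["load metadata", "join heuristics", "search phases"]),
    ("storage", ["access methods", "transaction manager", "buffer manager"]),
]

def run_query(query_text):
    """Pass the query through each phase in order, returning a trace
    of every component visited."""
    trace = []
    for phase_name, components in PHASES:
        for component in components:
            trace.append(f"{phase_name}:{component}")
    return trace

trace = run_query("SELECT * FROM Orders")
print(len(trace))   # prints 10: ten components visited in order
```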

Read and Write Transactions in SQL Server

Both READ and WRITE transactions follow a specific pattern in SQL Server. Reading involves taking shared locks on resources, while writing involves exclusive locks plus modifications to the transaction log and the buffer pool.
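The shared-versus-exclusive pattern can be sketched with a toy lock manager. This is an illustrative model of lock compatibility only (many readers may share, a writer must be alone), not SQL Server's lock manager; the class and method names are invented:

```python
# Toy model of shared (S) vs. exclusive (X) lock compatibility,
# mirroring the read/write pattern described above.

class LockManager:
    def __init__(self):
        self.locks = {}   # resource -> list of lock modes held

    def acquire(self, resource, mode):
        """Grant the lock if compatible: many S locks may coexist,
        but an X lock requires that no other lock is held."""
        held = self.locks.setdefault(resource, [])
        if mode == "S" and "X" not in held:
            held.append("S")
            return True
        if mode == "X" and not held:
            held.append("X")
            return True
        return False

lm = LockManager()
print(lm.acquire("page:101", "S"))   # True  - readers share
print(lm.acquire("page:101", "S"))   # True  - another reader joins
print(lm.acquire("page:101", "X"))   # False - writer must wait
```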

Conclusion

Understanding the lifecycle of a query in SQL Server aids in swiftly identifying performance issues and managing the database engine efficiently. It builds your confidence and gives you command over your technical arguments with co-workers or managers. The more you probe, the better you can comprehend and execute.

Resources for further reading

  1. Microsoft SQL Server Documentation
  2. DBANuggets Blog

If you have any questions, feel free to reach out on Twitter at @dbanuggets or via my email, mentioned in the blog post. Until our next session, keep exploring, keep learning!


Video Transcription

Hello, everyone. Thank you so much for taking your valuable time to attend the Women in Tech conference today. It's an honor, Anna, thank you so much for giving me this wonderful opportunity. So what are we going to discuss in this session today? We are going to talk about the life cycle of a query in SQL Server. My name is Zi Go Gui, and a little bit about me: I'm a database administrator with seven years of experience managing SQL Server and databases. I'm a Microsoft Certified Trainer and a Microsoft Certified Solutions Expert. I did my master's in computer technology at Eastern Illinois University, here in Chicago, and I blog at dbanuggets.com. It's a very active website where we talk all about SQL Server, and I'm very active on Twitter as well. If you have any questions later down the road, please do not hesitate to contact me on Twitter: @dbanuggets is my user ID, and down below is my email address. I have not mentioned everything here, but I'm a co-organizer for a couple of user groups: the Microsoft Data and AI South Florida user group, and the Microsoft Data Diversity, Equity and Inclusion user group.

And I'm also a volunteer for the Microsoft Data Women in Technology Virtual Group, and one of the committee members for the PASS Summit 2021, the conference of the professional association for SQL Server. So what are we going to discuss in this session today? We are going to talk about the life cycle of a query, and we are also going to see what components we have inside the SQL engine. So why should we care about what happens behind the scenes? Many developers and DBAs work with SQL Server on a daily basis, but they might not exactly understand what happens behind the scenes when we submit a query to SQL Server. Knowing this will really help you understand how queries are processed at each stage inside the SQL engine. It will also help you track down performance issues more efficiently, and it builds your confidence when managing the database engine as well as the databases.

It will also help you convince your coworkers or managers why they have to take your ideas seriously, because by the end of this session you will know what goes on behind the scenes. You can just let them know: hey, that's not right, this is right, let's go ahead and make this decision, and they have to listen to you, because it's right. And last but not least, we work with SQL Server every day, and the more you know, the better you can understand the engine and the databases. So let's dive deep into the life cycle of a query. I know this slide has many components in it and you might get confused, but let me make it as simple as possible. If the life cycle of a query is divided into three different steps, then the first step contains the components parser, binder, simplification, and the trivial plan. These components from the first step are very simple.

They are very lightweight for SQL Server, which can process them very quickly with a small amount of resources. All they need is the query text, not much more. When you come to the second step, the components are load metadata, join heuristics, and search phases. These components require most of the metadata to process any query. And when you come to the third step, it contains the components access methods, transaction manager, and buffer manager.

Now, these three components have to interact with the disk subsystem, and most storage is comparatively slow, so this is the most resource-consuming step, I can say. Let's talk about a SELECT query first, and then the INSERT and UPDATE queries later. So let's say there is a user here who wants to run a SELECT statement against SQL Server. No matter whether he's using SQL Server Management Studio, Azure Data Studio, .NET, Visual Studio, or even a third-party application, say booking a flight ticket: if he wants to run that query on SQL Server, he needs a secure network connection. But how does that happen? There is something known as SNI, the SQL Server Network Interface, which establishes a secure network connection between the SQL operating system and the user; the user is our client here. So how does SNI establish the connection? By using network protocols like TCP/IP, named pipes, or shared memory. No matter what kind of protocol we use, the process remains the same.

So once the secure network connection has been established, what SNI does is take the query that we are submitting to SQL Server and wrap it into a tabular data stream packet, also known as a TDS packet. That TDS packet is sent to the SQL operating system. Once the query is in the SQL operating system, the first thing that happens is that the SQL operating system identifies the incoming connection as a session and assigns a SPID to it, the SQL Server process ID. It gives an ID to that particular query, and the query is then sent to the first step of the SQL engine, which is the parser. The parser, as the name itself says, will parse the query, meaning it will go ahead and check for syntactical errors.

For example, I wanted to run SELECT * FROM a table, but by mistake I typed SELECT * FORM; the parser component would identify that as a syntactical error and throw back an error saying the query is syntactically incorrect. Variables are also handled here: if you are declaring any variables, they are identified here, whether they are valid or not. And if you are using any views inside your SQL query, the query text behind the view is brought inline into the code you are sending; basically, the view's query gets expanded into the query that we are sending. Then the query is parsed and a hash is computed on the query text. A kind of hash number is put on that particular query, and SQL Server goes ahead and checks if there is already an existing plan for that query inside the plan cache.

If there is a match on the hash value, it will go ahead and reuse the same execution plan for the query. But what if this particular query is running for the first time? Then it will go ahead and build the initial logical tree, and the query will be sent to the next component, which is the binder. The binder is completely different from the parser: the binder actually goes ahead and checks whether the objects inside your query are valid or not, like whether the tables and columns exist. The binder is also known as the algebrizer or normalizer. User permissions are also checked here, like whether the user running the query has enough permissions. Data type resolution happens here as well: if you are using a UNION ALL clause between two queries, the ordinal columns of those two queries should be of compatible data types; if the data types don't match, you will see an error. Aggregate binding also happens here; aggregates like SUM and AVG are bound at this stage.
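The plan-cache lookup just described can be sketched as a hash-keyed dictionary. This is a toy model: the hash function, plan representation, and function names below are stand-ins, not SQL Server's actual cache structures.

```python
# Sketch of the plan-cache lookup described above: hash the query
# text and reuse a cached plan on a match, otherwise "compile" and
# cache a new one. A toy model, not SQL Server internals.

import hashlib

plan_cache = {}

def get_plan(query_text):
    """Return (plan, status) where status records a hit or miss."""
    key = hashlib.sha256(query_text.encode()).hexdigest()
    if key in plan_cache:
        return plan_cache[key], "cache hit"
    plan = f"plan-for:{query_text}"   # placeholder for real compilation
    plan_cache[key] = plan
    return plan, "cache miss"

print(get_plan("SELECT * FROM Orders")[1])   # prints: cache miss
print(get_plan("SELECT * FROM Orders")[1])   # prints: cache hit
```

Note that because the key is computed from the raw text, even a whitespace difference produces a different hash and misses the cache, which is why sending identical query text matters for plan reuse.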

And with this information, the parse tree inside the plan cache is updated again, and the query is sent to the simplification component. Simplification, as the name itself says, will simplify the query. How does it simplify the query? If you have any subqueries in your query, those subqueries are converted into simple joins. No matter how many subqueries you have, behind the scenes they just get converted into outer joins or inner joins, depending upon the type of subqueries you have. Contradiction detection also happens here: if you have any foreign key relationship contradictions, the simplification component will identify them. With this information, again, the parse tree inside the plan cache is updated, and the query is now sent to the trivial plan component. If your query is very simple, like SELECT * FROM a table, or SELECT * FROM a table with a simple WHERE clause, those kinds of queries are treated as very simple queries by SQL Server. It will think, hmm, this is such an easy query, why should I even go ahead and update the parse tree that the previous components have built inside the plan cache?

Let me go ahead and discard that parse tree inside the plan cache and create a trivial plan on the fly, just like that, because it doesn't cost SQL Server much. No matter how many times you send those types of very simple queries, like a thousand times, the trivial plan component will build the plan for them that many times. Those are the trivial plans, and by default these trivial plans are not stored inside the cache. Now, why do they not get stored? Because it's so easy for SQL Server to go ahead and recreate them, instead of using up the valuable cache inside the memory; it doesn't want to waste that memory, and memory is limited inside SQL Server. But what if the query that we are running is not so simple? It's complicated, with, say, ten tables and complicated logic in it. Then SQL Server has to go ahead and build the full plan for it. In that case, the query goes to the load metadata component, the second step. Most of the metadata needed to process a query is gathered here, for example index statistics and histograms. The next session that I'm going to present will be especially on statistics.

So go look at that as well. Index statistics are basically a road map: SQL Server estimates how many rows are coming out of a table, and based upon those estimates it goes ahead and builds the query plan for that particular query. So those histograms and index statistics are gathered for the query here. Auto create statistics also happens here: if auto create statistics is turned on for your database, which it is by default, SQL Server will go ahead and create the statistics on the necessary columns automatically. Cardinality estimates are gathered here; these estimates are nothing but the statistics information. And the SQL Server and Windows collations are checked here as well.

SET options are also checked here, like ANSI settings such as SET ANSI_PADDING ON or OFF; the kind of plan that gets generated might depend on these settings. With this information, the parse tree inside the plan cache is once again updated, and the next component is join heuristics. What join heuristics does is intelligently reorder the joins. For example, let's say we have a SELECT across three tables, and the join between tables A and B filters out around 60% of the records from the result set,

and the join between tables B and C actually filters out 90% of the records from the result set. In that case, the join heuristics component will go ahead and reorder the joins so that the join between B and C is pushed to the front of the statement and the join between A and B comes after it. Why do we do that? Because it wants to shrink the result set as quickly as possible. This information is again updated inside the plan cache, and the query is sent to the search phases. The search phases are the different optimization phases in SQL Server in which candidate query plans get generated. SQL Server has to go ahead and choose a good plan from these phases, update that information inside the plan cache, and send the query to the storage engine, which is the third step. So now our query is at the storage engine. Let's talk about the read transaction first; whether the query is a SELECT or an INSERT, no matter what query it is, the difference happens at the storage engine. Inside the transaction manager we have two components: the log manager and the lock manager.
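The join-reordering idea just described can be sketched in a few lines: sort the joins so that the most selective one (the one filtering out the most rows) runs first. The percentages and dictionary shape below are illustrative, not real optimizer statistics.

```python
# Toy version of the join-reordering heuristic described above:
# run the most selective join first so the result set shrinks early.
# Selectivity figures are illustrative, not optimizer output.

joins = [
    {"tables": ("A", "B"), "filters_out": 0.60},
    {"tables": ("B", "C"), "filters_out": 0.90},
]

# Highest filtering fraction first, as the heuristic would reorder it.
reordered = sorted(joins, key=lambda j: j["filters_out"], reverse=True)
print([j["tables"] for j in reordered])   # prints [('B', 'C'), ('A', 'B')]
```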

Since we are doing a read transaction, we do not have to write anything to the log, because we are not writing anything, right? We are just reading, doing a SELECT * FROM. In that case, only the lock manager gets involved, and it will request the buffer manager to put a shared lock on the resources it needs to process the query. Once the query goes to the buffer manager component, the buffer manager goes ahead and locks the resources: it puts a shared lock on the resources needed to process the query, and then checks whether all the pages it needs for this particular SELECT are already inside the buffer pool. If it finds those pages inside the buffer pool, well and good; it will do the SELECT from the pool in memory itself. But if it doesn't find those data pages, it has to go ahead and fetch them by interacting with the disk subsystem, reading the data files, and pulling in the data pages it requires to process the query. Then it does the SELECT and sends the results to the user.

That's the read transaction, very simple. But what happens with the write transaction? It goes through the access methods the same way, analyzing the query and how the results have to come from the disk subsystem, but all the difference happens here. Previously we just put a shared lock on the resources; now, as we are writing into the database, we first have to record the modifications we are trying to make in the transaction log, via the log manager: the previous and the next values are written there. Then an exclusive lock has to be placed on the resources. The transaction manager requests the buffer manager to put an exclusive lock on the resources it needs, and then checks whether all the pages are already inside the buffer pool. If it doesn't find some or any of them, just like with the read transaction, it has to go ahead and fetch the pages it requires to process the query, put them in memory, apply the modifications from the write statements to those data pages, and then send the success message to the user.
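The write path just described, modify pages in memory first and harden them to disk later, can be sketched with a toy buffer pool. The dictionaries and function names below are invented for illustration; SQL Server's buffer manager and checkpoint are far more sophisticated.

```python
# Sketch of the write path described above: writes land in the
# buffer pool and mark the page dirty; a checkpoint later hardens
# dirty pages to the data file. A toy model, not SQL Server internals.

buffer_pool = {}    # page_id -> {"data": ..., "dirty": bool}
disk = {}           # page_id -> data, standing in for the data file

def write_page(page_id, data):
    """A write goes to memory first; the page is now dirty."""
    buffer_pool[page_id] = {"data": data, "dirty": True}

def checkpoint():
    """Harden every dirty page to disk and mark it clean;
    return how many pages were flushed."""
    flushed = 0
    for page_id, page in buffer_pool.items():
        if page["dirty"]:
            disk[page_id] = page["data"]
            page["dirty"] = False
            flushed += 1
    return flushed

write_page(101, "row data v2")
print(101 in disk)      # prints False: modified in memory, not hardened
print(checkpoint())     # prints 1: one dirty page flushed
print(disk[101])        # prints row data v2
```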

But did you observe something here? The pages are already written in the buffer pool, but they are not yet hardened to the database file system. That's when the checkpoint process kicks in. All the checkpoint process does is go ahead and check whether there are any dirty pages inside the buffer pool. What are these dirty pages? Dirty pages are the data pages which have already been written in memory but not yet hardened to the disk. The checkpoint is an automated process run by SQL Server; every minute or so, it goes ahead and takes these dirty pages we have just written and hardens them onto the database file system. That's how the write transaction works. In the summary, we have discussed mainly the life cycle of a query. I did not update this particular summary slide, and as it is a 20-minute session we couldn't cover all of this, but what we have seen is the life cycle of a query and what happens for the read as well as the write transaction inside the SQL engine. I followed these particular resources to prepare for my presentation, and thank you so much for attending my session today.