Heather Maga What Is NoSQL?


Video Transcription

Uh Welcome to the session entitled. Um What is No SQL? My name is Heather Mega. Um And I'm going to be presenting a very high level overview of no SQL. Um I hope you enjoy it.Um I'm going to hide the chat in just so that I can keep on track. Um I have a tendency as my students will tell you to get off track if I don't focus. So, um uh put your questions in the chat if you have any and I've left a little bit of time at the end of um um the talk today and hopefully if I can answer your questions, I will. OK. Uh So, uh first off, who am I? Uh well, as I said, my name is Heather Mega. I'm a freelance uh software developer and part-time professor or instructor um at a local community college here in uh Ontario, Canada. Uh I started my journey into technology um about 20 years ago. Sometimes I like to call it uh falling in love with computers. I know that sounds a little flowery but uh sometimes that's how it feels because I kind of came late to it. But Um, I like to tell a little story about how I got into computers. Um, I was updating my mother's computer. Um, she was using Windows 3.1 and updating to Windows 95. Yes, I'm that old and I was shoving disk after disk after disk into the computer and where most people would say this is terrible. Get me out of here.

This is boring. I was actually having a lot of fun. So I thought, you know what, maybe I should do this and go to college for this. So as I was moving out to Ontario anyway, I applied to Loyalist College and while I waited to uh get in, I taught myself web development. Um And this of course, was back in the day when we used um text editors and tables to lay out our web pages for any of those web developers in the audience today. Uh So in 1999 I graduated from Loyalist College with a, with an information systems diploma, which is just another way to say, programmer analyst. Um And I actually work now uh at Loyalist again, I've worked there at various times, but I work there right now as a part time professor and I teach database systems one and two. I cote machine connectivity. Um And I also teach customer service and project uh management in our networking courses. Um It's a lot of fun.

Um m my uh students like to uh make fun of me a little bit because I kind of lose track of where I'm going, but uh hopefully I won't lose track too much here. Uh OK. So, oops, sorry, lost track of my stuff here. Ok. So I'm also an empty nester. My daughter uh is also a software developer and she's working in the industry right now. Uh So I couldn't be prouder that my, my, my little girl is following in my footsteps. So, uh so that's enough about me and all that stuff. Let's talk about uh no SQL. So I'm not sure what everybody's level is, but this is just an overview. Um very general information. So we won't be doing any deep dives into anything. Um So I'm gonna talk a little bit about the basics first. So essentially what is no SQL? Uh So no SQL uh very briefly is a database uh that does not use SQL uh to access the data stored within it. And unlike traditional databases which have a schema and use SQL to get the data out. No SQL is schema. Um uh So what is uh schema and what is schema? So a schema is what most relational databases have, which is a predefined structure which is usually fixed. Um It tells us what the tables are, what the columns are, uh how those tables interact with each other, how the columns store data, um They even have information about the keys. Um And the rules that are, that are associated with the data.

So uh whether they can be blank, uh what the structure is and if they're required or not, whereas schemes uh does not necessarily have a predefined structure. And if it does um it can be molded um and has a greater flexibility for change later on um because uh documents can have different attributes. So I talked about the schema systems and the relational databases needing uh rules for things that are required um with no SQL databases um with no SQL databases, um the developers check the requirements. So it's, it's up to the developer in the front end to make sure that the required information is uh saved into the database in the correct format. So this doesn't mean that uh we can't still have relations uh within our data uh relations can be made through embedded data or like with more traditional uh we can have uh referencing. Uh We can also use a combination of the two. So it is the best practice to still use data modeling to plan out your no SQL databases. Um But before we have a closer look at embedding referencing and modeling, I actually wanna talk about the four main flavors or types of no SQL databases. OK?

So like I said, there's four different types. Um There is a key value pair, uh document, uh column and graph uh key value pair is exactly what it sounds like So if you have knowledge of uh databases or programming, you know what key value pairs are, um They're made up of keys and values. Uh They don't have a schema and they're very similar to a dictionary or a hash table. So keys uh just in case uh are identifiers associated with values. Uh They must be unique for the entire name space and a name space is an entire or collection of identifiers. So that name space could be um the entire database. However, it could be a separate name spaces within a single database. Um So values we should know what they are but their values are data stored along with the keys. Um Values can be simple or complex. So for example, they can be a string or a binary um And values can vary in type. So for example, if we have uh record um an attribute that is a student uh picture, a student ID picture, we can actually have um one instance of that have a blob which is a binary object.

Um And then we can have another instance uh for another student that is a string location um on the internet or on the server. Uh So they can vary and type within the same uh attribute. So you can see where um already we have that flexibility that we're gonna talk about later. But we are starting to see that flexibility that uh no SQL databases allow us to have So uh key value uh databases typically do not enforce checks on data type. Um And the reason for that is um what I just said, so values can vary in type. So the database uh management system is not enforcing that data type itself. Um Again, that it's up to the developer of the software or the site um or to there. Um How's my speed? Am I talking too fast, everybody? Ok. Ok, great. Thanks guys. Oh, ok. Uh So carrying on uh we have a couple of examples. So we have uh I don't know how well we can see those, but let's just have a zoom in here at the phone directory. Uh So this is uh key value and we can see that the key in this case um is Bob Smith. Um And the value is the phone number. So that's a very simple example of uh key value. Now, let's have a look at another one and here we just have just a little bit more complex but not so much so that we don't understand what's going on.

Um And these are um uh teacher colon, name, teacher, one colon, school, et cetera. And we see that um we have name, school availability hours for me and name and school only for Tammy teacher. Um So you'll see most of my examples are schools um as I have used some of my uh media that I use for my students in my slides today. So you're gonna see a lot of uh school based examples. Uh OK. So let's talk about document databases, which are my favorite, if you can have a favorite kind of database, which I do. So document databases um use a key value approach but they store the uh values as documents. Um So uh documents don't refer to something that we would identify um initially as a document. So it's not a word processing file uh but rather um data structures are stored as strings or representations of strings. So typically that uh is in the form of uh Jason or XML. So Jason uh javascript object notation and XML uh extensible markup language. Um So I focus mostly on Jason and the examples that we're going to see are uh Jason. But um some people feel more comfortable with XML but they can be either one. So instead of storing each attribute of an entity within a separate key document, databases, store multiple attributes in a single document.

So the example that I have here on the screen is two students, Joe and Sally and we can see that they have documents. Um We have Joe um and he has key value pairs uh within his document. Um Sally also has key value pairs within her document, but we noticed something different here is that we have extra attributes for Sally. Now, that doesn't mean that Joe doesn't have those attributes. It just means that Sally does have them if this were a tra traditional relational database structure, um we would have to see those columns over here and they would typically have default or null values within them or blank. Mm So again, we're starting to see that flexibility where um we can add these attributes as, as we go and we don't have to store them if they don't have them. So I'm gonna talk uh very, very briefly about column and graph databases. Um I don't have uh much experience with them, so I, I can't really explain them as well as I would want to. Uh So I'll just talk about them briefly. Um A column database is a no SQL database that stores data in columns instead of rows. Um They take up typically less space than a relational database. Um And are faster because of the way that it indexes data. So column and graph databases are a lot more complex than our simple key value and document databases.

Um And really, you could do a whole uh session or a whole presentation or a whole lecture or, you know, even a whole course on these types of loans. So that's why I'm just gonna go lightly into these um a graph database. It uses structures called nodes and relationships. And if you're familiar with object oriented programming, you know that a node uh or you know that an object is a class. And so a node is an object that has an identifier and a set of attributes that very similar to a class. Um And a relationship is a link between the nodes that contains also an attribute about that relation. So um early I talked about um the structure of no SQL databases or rather no SQL database having a predefined structure. But the reality is that they do have a start structure. It's just that it's a lot more flexible. We know that when we build a relational database, um we're taking a lot more time in the beginning or we should be taking a lot more time in the beginning to um plan out our database and really understand the relationships um And doing, you know, normalization and making sure that our database is as efficient as we can make it.

Um No SQL databases usually do start with a strep. Um And it is important but we can change those models over time. So where we start is not necessarily where we end with our documentation or our, our modeling of our no SQL databases. So when we're modeling our no SQL databases, we first decide on what objects we wanna keep track of just like we would in a relational database. And we also decide how they're gonna interact with each other. Uh One of the important things that we think about when we're talking about um no SQL databases. Um You mean more specifically document databases is how we're going to be um interacting with that data. So in what context are things being accessed? Are they going on a report?

Are they going on some kind of dashboard? What items are going on a dashboard? And that's so that we know when we're building out our, on our objects, we know what things we can embed and what things we wanna reference instead. So um embedding versus referencing. So um I have examples that explain this a lot better on the next slides. But um embedding is basically the objects or documents are included within the main parent object. So if you have a student document, you're embedding all of their class information within that document.

And we'll, and we'll see an example of that. Um They typically require less reads to the database. Um So we have a better read performance and that's because if we are loading a dashboard, um if we're loading a dashboard with uh student information and uh that student is seeing their information and then they also want to see their class information, we're reading one document instead of one document and then all the class um for that student.

Um So that's less reads. Um Referencing is more similar to um a relational database schema in that objects are linked from the main um object uh by keys. Uh It has better right performance for obvious reasons. So uh this is an example of a document uh the student, the the example that I just gave um where we have class information embedded with a student and I'm just gonna zoom in here a little bit. Uh ok, so we have a student here, um Philip Fry um with his uh student address and his student ID. So that's his student information. Now, obviously, this is really trimmed down because um just to save space and so it all fits nicely on the slide. Obviously, we would have a lot more student information and we would have a lot more class information, but we can see here um that we're actually including the course id, course name, course room um within um this student's document. So again, if we were to read this into a dashboard, we would read this document and we could display all of this information because we already have it all uh read zoom out here. So if we were to reference this, instead drag this over here a little bit, um You can see that we have uh now these are included in the same uh file just for viewing. Um They could be in separate files obviously.

So the student document, um we see it has a student information, but you'll notice now instead of the complete course information, it's more of a traditional relational database type structure where we are referencing um an ID of a course. And then we have the course documents below that have the expanded information. So now if we're using the same example of a dashboard, if that student goes and uh loads their dashboard, we would have to read the student document and all of the course documents that are related to that student. So we have some important questions to ask ourselves when we're deciding if we're going to uh embed or reference. And one of the main examples that, that I gave was the dashboard and how we're accessing that information. So that's one of the questions that we ask ourselves. Uh The next is the relationship between um the entities and their attributes. So we can ask ourselves questions. Uh So when to reference, so one to many relationships and if you have experience with databases, um you know that a one to, you know what a one to many relationship is, but an example would be a blog uh post um that has many comments um or likes.

Um Another time we would reference instead of MB would be entities are frequently queried or updated or not frequently queried or updated together. Um So again, we can go back to our blog post um And our comments uh many to many relationships. So for example, students in courses or courses and rooms, um the important one is entities change with different frequencies and we can again use our blog and likes and comments um as an example of entities that change with different frequency. So comments are related to blog posts, but they would uh those documents would change more frequently than the blog post itself. Um I mean, typically when you write a blog post, it it stays there, you might edit it later, but not with the same frequency that comments are being added. Um One entity is used by many other entities and the example that I usually give here is classrooms um because um that entity um has a lot of attributes associated with it that other um entities are accessing at the same time. So when to embed, then um we would embed uh with entities that are typically queried or used together um or with uh weak entities.

So if um for example, we have an invoice or an order that has line items, those line items are weak entities that only exist if that um invoice exists and they are only ever queried together. We would never really query a line item by itself, we would use those line items with an invoice. So in our previous example, with the student, if we switch that to orders, we would have the order information and then we would embed the line items uh for each item ordered um right, within that document and it would make loading the order or the invoice or the, the, the purchase order or whatever um a lot quicker, another reason to, and that would be a 1 to 1 relationship or um a one to few relationship.

So this could be something like um uh a phone number, uh like a student and a phone number or a user profile, social media accounts. So, um of course, many of us know this uh anybody that's online or manages multiple accounts or, or whatever we, we do have more, more and more social media accounts right now. But uh typically a student or a user account on a website, they would link 11 Facebook account, one Twitter account, one Instagram account. Um and they would generally change with the same frequency as well. So uh instead of uh social media accounts or phone numbers say, um we could say emergency contact information. So if you have a staff document where you have all the staff information, typically when somebody adds an emergency contact information that's not changed for a long time.

Uh So, and it might be updated as the main um staff information is updated. So if they, the, the the staff member got divorced, they also changed their emergency contact information, for example, just keeping an eye on the time here. So what would motivate us um to look into no SQL or what would be a catalyst for uh a change or for that, that that process of wanting to change or wanting to implement some kind of no SQL. Um it would be the, the typical things that happen with databases um over time, um the volume of data increases over time and we know that it's 2020 um data is still king, right? So we know that um data is everything nowadays. Um So we have more and more data that's being uh saved and accessed. Uh So we have volumes and volumes and volumes of data um which increases exponentially in some cases. Um All that data means slower response times, uh especially even if there's not a large increase in the database size or the amount of data, maybe we have more users, maybe we have more people accessing the systems. Uh So we need a better response times and all the volumes of data, all the volumes of users could also mean that the availability um could be in jeopardy.

Uh Essentially, if we're talking um about um old traditional proprietary software um based on one database server, um If we are having issues with access, um if that server goes down, it goes down for everybody. Um and scaling those relational databases means time, it means money and we may have invested the time and the money, but the technology may not be up to what we need. So we may need something to go so fast that we need a certain type of processor, a certain amount of ra m or a certain bandwidth for our network, but we don't have that available to us with the current technology. Um So um we have some main considerations when we're talking about those large scale databases where it's large scale based on data, large scale based on access. Um we have to consider things like scalability, cost, flexibility and availability. So um scalability is important.

Um And the faster we can scale something the better. So um developers and administrators could choose to scale out or scale up. Um And if you've ever been in that situation before, uh you know that you would rather scale out. So add more servers um and have distributed database systems instead of a single server system. Um No SQL um can easily adjust to that kind of scaling uh because it has that kind of built in technology. Um scaling up of course, is upgrading the existing database server, you know, processors, memory bandwidth, et cetera, which I talked about before. And we know that if we scale up, if we're talking about one server, we're talking about taking that server down and not having access to that if we're scaling up. So that increase in scalability is something that um is key for thinking about using no SQL databases. Uh cost. Uh is another one. If we have any analysts that are thinking about money or owners, business owners, uh even developers that are in charge of deciding on technology to use. Um a lot of times um we have a concern with licensing. We all know that um the cost of licensing um some of the relational databases can be uh quite, quite large. So um it might be based on things like the size of the server.

So how much data um the number of users, the number of authorized users or administrators? Um But with many major no SQL databases, they are open source. So that initial access, that initial foothold that we can make that initial uh four A four A that we can make into that no SQL space is um a little bit easier cause we don't have to invest a lot of uh money for licensing. That doesn't mean that there's no support uh because most no SQL database server management systems have um um uh support packages. Sorry, I lost my train of thought uh support packages or contracts. So that um if you do get lost in that forest, you can still um have that little, that little lifeline to help you out um through um paid support if you need it. So flexibility and I talked about this a lot and flexibility is, is very important. So uh no SQL databases do not require a fixed database structure or schema. So I talked about that already. So we know that in a relational database, you typically need to know all the data, you will need to keep track of when you start developing that database. So there's a lot more prep planning. We should be pre prep planning anyway.

But there's a lot of that prep planning and if we have to add attributes or columns later, if anybody has ever done that before, uh we know that what a giant um problem that is to simply add a column, um or the, the dirty data that can happen from adding a column and trying to access old and new records um and not getting the, the, the proper results and reports and it's something that I saw a lot, uh, you know, within the school and in other companies that I worked for, there's always that older data, um that kind of gets lost because of something that we've had to uh update.

Whereas if we're using no SQL attributes are easily added. Oh boy, I am running behind. OK. So availability, uh no SQL databases can be more available as the situation arises where a server goes down, another server in the cluster can take over without any downtime for website or software systems. And I'm, I, I had a whole section about distributed uh databases, but it's a good thing I left it out because we're getting pretty close to 35 minutes here. So, um yeah, so we all know that if we have a distributed database and even uh a simple cluster of computers, say five or 10 computers, we know if one of those goes down, we still have access to the rest of the cluster. Um uh So that makes it more available than our single server system at the very least. So we have a couple of typical implementations and then I'm gonna have to wrap it up here. Um But you might be looking at no SQL for a new software project or a new development where you're projecting a rapid growth, um whether it's users or data. Um So it might be something that you're thinking of and it's a good place to start.

Um Or um the other thing that is popular is when you're modernizing your relational databases and you're creating API S, you're creating all these new tools for your users. What you can actually do is create um uh no SQL database that's either static or dynamic that your API S access instead of accessing that relational database. Uh So that makes the availability availability, sorry and the speed um better than having to that relation access that relational database because we can access that static file. Um The consistency might not be there because the data is not updated um in an instant it's updated on a schedule. Um But it's still a good way to modernize your old proprietary software, your old relational databases, SQL and creating some API S OK? Oh That's all that I have. Uh I actually, I did have actually have a lot more. So I'm glad I kind of knew where to cut it there. But um if there's questions, definitely leave them in there, we only have a couple of minutes. But um if you have some questions, definitely leave them in there. If I can't answer them or I don't get to them. Definitely c connect with me on uh linkedin. Um Or you can visit my um my personal website or obviously the women chat Slack Chat.

Um So let me just see if I can open this up here. Uh OK. Uh There's OK, Victoria, what workspace do we need to create? No SQL DB in. Uh So what, what software is there? Um You can create a no SQL database on any kind of traditional environment. So, um in my class, we use couch DB. There's also Mongo um DB. Um I believe that Microsoft and Amazon both have uh no SQL offerings. Um So you can kind of have a workspace that's local on your server in the cloud. Um It's rapidly expanding as far as I know and there's something else here. I don't know how to work this panel here, but um OK, so I think we have to wrap it up unless there's other questions. I see there's still a lot of people viewing. So um was there another question? No? OK. Uh Well, thank you everybody for watching. Um Hopefully I didn't go too fast for everybody. I know it was kind of basic. Oh, I wish I could answer that one, but there's not, there's not enough time to answer the data integrity with no SQL but very briefly, the data integrity um for data going in is the responsibility of the actual software.

So the programmer or the developer that's putting that data in the database, the responsibility lies there. Um And then if you're talking about distributed, that's a whole other conversation. Sorry, I couldn't give that more attention but definitely connect with me later. Ok. That's it.

Thank you very much, everyone.