Julia Naomi Boeira - Why we need another time serial database?

Automatic Summary

Understanding Time Series Databases

Julia Naomi, a tools engineer, game developer, writer, and creator of WooriDB, introduces the concept of Time Series databases and asks: why do we need another Time Series database?

What is a Database?

In simple terms, a database is a set of organized data that, through some type of relation, gives that data meaning. Databases are broadly separated into SQL and NoSQL, but they can also be categorized by how the data is related, such as by properties, by time, or as a graph.

Time Series databases mostly fall under NoSQL. Their data is mainly a register of events over time, which makes it possible to audit data and to travel back in time to earlier states. Some Time Series databases also offer relational capabilities.

Main Time Series Databases

Julia shares her experience with four main Time Series databases that served as inspiration for her own database, WooriDB. These are:

  • Prometheus: Used mostly for event monitoring and alerting as it records metrics in real-time in a Time Series order.
  • InfluxDB: A Time Series database that offers fast and high availability storage, mainly used for monitoring metrics, sensor data, and real-time analytics.
  • Crux DB: A bitemporal database that allows users to track changes over time and compare data. This Key-Value and Graph query database is also one of the more flexible databases for general-purpose usage.
  • Datomic: A transactional Time Series database with comprehensive queries over the data allowing users to travel back in time and view changes made to the data. This database also offers relational capabilities.

Introduction to WooriDB

Julia's choice to develop WooriDB stemmed from the restrictions of existing databases. Datomic and Crux are closely tied to the Clojure environment, which some developers may find limiting, while InfluxDB and Prometheus are primarily metrics-oriented databases.

WooriDB, on the other hand, is built in Rust, a programming language known for its small memory footprint and safety guarantees. It is a self-contained database with HTTP support, among other features. WooriDB makes it easy to travel back in time to access historical data, and it also supports basic relational algebra.

WooriDB Transactions and Queries

WooriDB supports transactions such as 'create', 'insert', 'update', 'delete' and 'evict'. These let users store and change data, reset data values, remove the latest update to an entity, and evict entire entities from the database. WooriDB also supports encrypting selected data, which can then be checked but not read.

When it comes to querying, WooriDB is flexible: queries range from selecting everything or selecting individual keys, to selecting at a specific time or within a time range, to difference and intersection queries, among others.

In conclusion, WooriDB shows the value of Time Series databases and how they can be tailored to the specific needs of users. Through the design of WooriDB and the inspiration drawn from other databases, Julia demonstrates that it is possible to develop more adaptable and efficient database tools, even when the task looks hard.

Final Thoughts

Her parting message is a call to action, "Do it even if people tell you it's too hard — you can do it. At least you should try." This serves as great motivation to every reader to always venture into new territories, explore and keep pushing the boundaries of their capabilities.


Video Transcription

So I guess I will start. Well, my name is Julia Naomi and I will talk about why we need another time serial database. Let me just check if everything is OK. Yeah, it seems fine. OK. So why do we need another time serial database? Well, who am I? My pronouns are she and her. I'm a tools engineer at [unclear], a hobbyist game developer, and also a writer, so I have three books published. I'm currently working with open source Rust and tools at [unclear]. I'm the creator and maintainer of WooriDB, Transistor, and a few other projects [names unclear]. My GitHub and Twitter are naomijub. So what is a database? Databases are a set of organized data that, through some kind of relation, generate a sense of meaning for this data. They are generally separated between SQL and NoSQL, but that's just the tip of the iceberg. They can be organized by property relations, by time relations, by whether data needs to be reusable, as graphs, and many more. So how data is organized defines which kind of database you are using: SQL, NoSQL, and more that we will see later. SQL data is determined by relational algebra between the data in the database. NoSQL usually refers to non-relational algebra, so graph databases, document databases, key-value databases.

And there is a special kind of database called a time serial database. It is usually a NoSQL database whose data is mainly a register of events over time. So this is a small example of how many kinds of databases we have, and some of them can do many different things. We have graph databases, document databases, and key-value databases, which are mainly NoSQL. We have NewSQL databases. We have the regular SQL databases, which are just relational algebra. We have big tables, and some databases fit more than one of these definitions. For databases, one thing that is very important is the CAP theorem: you have to choose two of these properties, consistency, availability, and partition tolerance. Consistency means that all clients see the same data at the same time. Availability means the system continues to operate even in the presence of a failure in a node. Partition tolerance means the system can operate in spite of network problems. But this is not really true, because most databases can't really have two of them fully at the same time, but they get really close to it. My database is mainly consistent.

But not all databases are right for you; you have to choose them wisely. There are so many paradigms, and the paradigm needs to be adapted to your data. So this is really important. What's the purpose of a time serial database, then? Time serial databases have two main purposes. The first is to audit data: as all events have been registered over time, it's easy to identify inconsistencies in the data. The other purpose is to travel in time: you can easily see how your data was in the time range that you are interested in. Some of these time serial databases have relational capabilities, and these relational capabilities can be used over time. This is really interesting, because you can compare data over time. So here I bring the four main time serial databases that I have worked with. Prometheus is a database used mostly for event monitoring and alerting, as it records metrics in real time in a time serial order. InfluxDB is a fast, high-availability time series storage; it is mainly used for monitoring metrics, sensor data, and real-time analytics. And the last two are the more interesting databases for general-purpose usage.

Crux DB is bitemporal, which means you have the transaction time and the valid time. While the transaction time cannot be changed, the valid time can be changed, so you can fix time errors in your database. It's a key-value and graph query database that uses Datalog to support this graph query, so you can search over data with Datalog, and it supports different backends. Datomic is a transactional time serial database with rich queries over Datalog. Its most famous usage is at Nubank in Brazil, and it has relational capabilities. So these four databases are the main inspiration for my time serial database, which is called WooriDB.
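The difference between transaction time and valid time can be illustrated with a small in-memory sketch. This is not Crux's API; `BitemporalStore`, `put`, and `get` are hypothetical names, and the tie-breaking rule is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Version:
    tx_time: datetime     # when the write was recorded; never edited afterwards
    valid_time: datetime  # when the fact is considered true; chosen by the writer
    value: Any


class BitemporalStore:
    """Toy bitemporal key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._log = {}  # key -> list of Version, in write order

    def put(self, key, value, tx_time, valid_time=None):
        # valid_time defaults to tx_time, but it may point into the past
        # to correct a fact that was recorded late or with the wrong time.
        self._log.setdefault(key, []).append(Version(tx_time, valid_time or tx_time, value))

    def get(self, key, valid_at, tx_at=None):
        # Keep versions valid at `valid_at` and, if `tx_at` is given,
        # only those that had already been recorded by `tx_at`.
        candidates = [
            v for v in self._log.get(key, [])
            if v.valid_time <= valid_at and (tx_at is None or v.tx_time <= tx_at)
        ]
        if not candidates:
            return None
        # Assumption: latest valid_time wins, ties go to the most recent write.
        return max(candidates, key=lambda v: (v.valid_time, v.tx_time)).value


t1, t2, t3 = datetime(2021, 1, 1), datetime(2021, 1, 10), datetime(2021, 1, 20)
store = BitemporalStore()
store.put("sensor", 20, tx_time=t1)
store.put("sensor", 25, tx_time=t3, valid_time=t2)  # correction written at t3, back-dated to t2

corrected = store.get("sensor", valid_at=t2)              # uses everything recorded so far
as_known_then = store.get("sensor", valid_at=t2, tx_at=t2)  # ignores writes recorded after t2
```

Here `corrected` reflects the back-dated fix, while `as_known_then` still shows what the database believed at `t2`, which is what makes auditing possible.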

And why did I develop a new database, this one, WooriDB? Datomic and Crux are very strictly related to the Clojure environment, although for Crux there are two clients, Transistor and Translixir, that support Rust and Elixir. Datomic depends mainly on DynamoDB.

So it's not even a standalone database. Crux DB depends on the chosen backend, so you can have Kafka, you can have Dynamo, you can have many databases. InfluxDB and Prometheus are mainly metrics oriented. And I wanted to prove that Rust was a good choice for developing a database. Rust has a small memory footprint, which means that this database gets good performance with a small memory footprint, and it's really safe. So I have safety when accessing data, which helps me to efficiently write to and read from files.

Also, WooriDB is a self-contained database with HTTP support, so all data can be queried through HTTP. This is the repository. It's currently on the beta eight version. It's mainly Rust and it has a few contributors already. This is the book that I've written to help people use WooriDB. It's an experimental general-purpose time serial database. Its name comes from Korean, from the word for "our" in Korean. The project inspirations were those four databases that I talked about before, SPARQL and Datalog for the query language, which I called WQL, and a few books listed here. So how does WooriDB work? It has a list of features. It's mainly inspired by my usage of Datomic at Nubank. The query language is inspired by SPARQL and Datalog; it's really similar to SPARQL but has some Datalog syntax in it. It's time serial; its main purpose is to be time serial. It's schemaless, so you can send any kind of data you like and it will adapt to the data. It has deep key-value storage, so you can have maps inside of maps. It has HTTP support, authentication, authorization, and encryption of the data. It supports very large numbers, beyond the 128-bit unsigned integer (u128) in Rust. It has a conditional update that I will explain later. It has file compression with zstd.

It's fast. And it has support for querying an entity history, so you can go back in time and query a range of the entity history, and it does support basic relational algebra. So it's important to have an idea of how big your machine needs to be to support it. If you have around 100,000 entries a day, your file size can reach between six and nine gigabytes. Files will be compressed every seven days, which is configurable to less, but usually seven days is OK, and they will be reduced to 12 or 35% of their size depending on how the data has been organized [unclear]. One important consideration is heap memory consumption: for 100,000 entities it can go up to 10 gigabytes. As it uses a BTreeMap, growth is log n, so if you have 1 million entities it will not be 100 gigabytes, it will be about 20% more. This is important for the next slide, on some architectural considerations. Why use WooriDB? WooriDB is fast and it supports very large queries, and it has more or less the same performance for large queries and small queries.

Really small queries take around three microseconds, and querying [count unclear] takes around seven microseconds, if no other data is being queried. Queries read data, so there is a query endpoint. Transactions insert and update data, so there is a transaction endpoint. Entity history, to query the history, has its own endpoint, and it's enabled by a feature, so you need to say you want to support it. Encrypted data cannot be queried, so it cannot be read, but it can be checked; I will explain it later. You should keep one database per service, as it's cheaper this way, and sharding the database is a good idea when you have a lot of customers. WooriDB is not natively distributed, so this can become a problem if you need partitioning.

WooriDB is internally async and based on actors. It has one worker to guarantee consistency, and the trade-off of having only one worker is that it supports multiple cores to parallelize data processing, so it's faster this way than having two cores and two workers. So, a few transaction examples. If you want to create an entity, it's just create entity and the name. If you want to have unique entities, you use the keyword unique and a set with the unique entities. If you want to encrypt one entity, you use the keyword encrypt and the set of the keywords that you want to encrypt. Insert is really simple. There are two kinds of updates here, update set and update content. Update set resets the content saved in the entity.

So if you have in the database A equals four and B equals zero, it will be reset to A equals minus four and B equals 32. Update content will do some arithmetic over the data: if A is 30 and you say the new update for A is minus 34, you will have minus four as a result. And here you have the match update, which is a conditional update: if all conditions here are matched, it will update as a set, so it will reset the data. And there are delete and evict. Delete deletes the last update for the entity, while evict deletes the whole entity from the database; you can delete a table or the whole entity. So, an example of a response: it's usually RON, but it does have JSON support, and the state is returned in a simple manner that can be queried by the client. Query check is how we can check encrypted data. We send check, the keyword that we want, and the phrase that we want to check against the entity. If the response is true, it will send us true, and if it's false, it will send us false. If you try to check non-encrypted data, it will throw an error saying that this data cannot be checked because it's not encrypted. So, query select: we have select all, which is just like SQL.
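The update, delete, and evict semantics described above can be modelled with a small in-memory sketch. This is not WooriDB's implementation or its query language; `EntityStore` and its method names are hypothetical. Two behaviours are assumptions made for the example: update set keeps keys that are not mentioned, and delete on an entity with a single state leaves an empty state.

```python
import operator
import uuid


class EntityStore:
    """Toy in-memory model of the transactions described in the talk."""

    def __init__(self):
        self._history = {}  # entity id -> list of states, oldest first

    def insert(self, content):
        entity_id = str(uuid.uuid4())
        self._history[entity_id] = [dict(content)]
        return entity_id

    def state(self, entity_id):
        return self._history[entity_id][-1]

    def update_set(self, entity_id, content):
        # Update set: the given keys are reset to the new values.
        self._history[entity_id].append({**self.state(entity_id), **content})

    def update_content(self, entity_id, content):
        # Update content: numbers are added to the stored value,
        # anything else is simply replaced (a simplification).
        new_state = dict(self.state(entity_id))
        for key, value in content.items():
            old = new_state.get(key)
            both_numbers = isinstance(old, (int, float)) and isinstance(value, (int, float))
            new_state[key] = old + value if both_numbers else value
        self._history[entity_id].append(new_state)

    def match_update(self, entity_id, conditions, content):
        # conditions: iterable of (key, comparison function, expected value).
        current = self.state(entity_id)
        matched = all(
            key in current and compare(current[key], expected)
            for key, compare, expected in conditions
        )
        if matched:
            self.update_set(entity_id, content)  # behaves like update set
        return matched

    def delete(self, entity_id):
        # Delete: drop only the last update, so the previous state is current again.
        history = self._history[entity_id]
        if len(history) > 1:
            history.pop()
        else:
            history[0] = {}

    def evict(self, entity_id):
        # Evict: remove the entity and all of its history.
        del self._history[entity_id]


store = EntityStore()
eid = store.insert({"a": 4, "b": 0})
store.update_set(eid, {"a": -4, "b": 32})   # a = -4, b = 32
store.update_set(eid, {"a": 30})            # a = 30, b = 32
store.update_content(eid, {"a": -34})       # 30 + (-34): a = -4
matched = store.match_update(eid, [("a", operator.lt, 0)], {"b": 1})
store.delete(eid)                           # undoes the match update
```

The numbers mirror the ones used in the talk: resetting `a = 4, b = 0` to `a = -4, b = 32`, and adding `-34` to `30` to get `-4`.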

We can query a few keys from the entity. We can query a few keys for specific IDs. Select with the keyword ID is similar to select where ID equals this UUID. We can query at a specific time; querying at a specific time takes a little longer than querying all data, because it needs to run through files. And you can query in a time range, and you can also query with where. So it does support relational algebra.
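Querying at a specific time or over a time range can be sketched over an append-only log of states. This is a simplified in-memory illustration, not how WooriDB reads its files; `EntityLog`, `as_of`, and `between` are hypothetical names, and the start and end bounds are treated as optional and inclusive by assumption.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class EntityLog:
    """Append-only log of (transaction time, state) pairs for one entity."""

    def __init__(self):
        self._times = []   # transaction times, kept in ascending order
        self._states = []  # state recorded at the matching index

    def record(self, at, state):
        if self._times and at < self._times[-1]:
            raise ValueError("transaction times must not go backwards")
        self._times.append(at)
        self._states.append(dict(state))

    def as_of(self, at):
        # Last state recorded at or before `at`, or None if nothing existed yet.
        index = bisect_right(self._times, at)
        return self._states[index - 1] if index else None

    def between(self, start=None, end=None):
        # Every recorded state in the inclusive range; both bounds are optional.
        lo = 0 if start is None else bisect_left(self._times, start)
        hi = len(self._times) if end is None else bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._states[lo:hi]))


log = EntityLog()
log.record(datetime(2021, 3, 1), {"a": 1})
log.record(datetime(2021, 3, 5), {"a": 2})
log.record(datetime(2021, 3, 9), {"a": 3})

snapshot = log.as_of(datetime(2021, 3, 6))
window = log.between(datetime(2021, 3, 2), datetime(2021, 3, 9))
```

Because the times are sorted, both lookups use binary search; a file-backed database has extra work to do, which fits the remark that time-based queries take a little longer.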

You have group by, where the query response will be grouped by the key that you want to group by. You can order data, so it will be ordered by a key. You can dedup data, so it will remove data that has the same key. You can limit, so if you want to have only 10 entities, you will get 10 entities, and you can offset by an integer. You do have count, so you can get how many elements you have. You do have unions: they unite two different entities into one response by a query, which is different from join. With intersect, you intersect two entities, so you are not intersecting the whole domain of the table, you're just intersecting two entities. The same goes for difference: you get the difference between two entities, what is different in them. And you can join: you join two queries by a key, so you say you want to join the queries by key A and it will join both queries by key A. You do have entity history support. Here is a curl example of how to send a request, and in the response you will have the entity, the time it was changed, and the state it had at that transaction time.
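The relational operators listed in this part of the talk can be sketched over plain Python dictionaries. These helpers are hypothetical and only approximate the behaviour described; in particular, intersect and difference compare key-value pairs, and union keeps the first entity's value on conflicts, which are assumptions made for the example.

```python
from collections import defaultdict


def group_by(rows, key):
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(key)].append(row)
    return dict(groups)


def order_by(rows, key, descending=False):
    return sorted(rows, key=lambda row: row[key], reverse=descending)


def dedup(rows, key):
    # Keep the first row seen for each distinct value of `key`.
    seen, result = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            result.append(row)
    return result


def limit_offset(rows, limit, offset=0):
    return rows[offset:offset + limit]


def union(a, b):
    return {**b, **a}  # two entities merged; values from `a` win on conflict


def intersect(a, b):
    return {k: v for k, v in a.items() if k in b and b[k] == v}


def difference(a, b):
    return {k: v for k, v in a.items() if k not in b or b[k] != v}


def join(left, right, key):
    # Inner join of two query results on a shared key.
    index = defaultdict(list)
    for row in right:
        if key in row:
            index[row[key]].append(row)
    return [{**l, **r} for l in left if key in l for r in index.get(l[key], [])]


people = [
    {"name": "Ana", "team": "db", "age": 31},
    {"name": "Bo", "team": "db", "age": 25},
    {"name": "Cy", "team": "web", "age": 25},
]
teams = [{"team": "db", "lead": "Dee"}]

by_team = group_by(people, "team")
youngest_first = [p["name"] for p in order_by(people, "age")]
one_per_age = [p["name"] for p in dedup(people, "age")]
second_only = limit_offset(people, 1, offset=1)
count = len(people)
joined = join(people, teams, "team")
```

Note how `intersect` and `difference` work on two single entities rather than on whole tables, as the talk points out, while `join` combines two query results.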

So for all of them, you will have this. A request for entity history takes the entity key and the ID you want; start time and end time are optional. And thank you. I want to say: do it even if people tell you it's too hard. You can do it; at least you should try. So that's all.