Unpacking time-series data for developers
Time-series data is everywhere. It's the fastest-growing new data type. But where is it all coming from? And isn't all data essentially time-series data? When does data ever exist outside of time? This article investigates the source of all this new time-bound data, and explains why more and more of it will keep coming.
What is time-series data?
Isn't all data time-series data? Does data ever exist outside of time?
The quick and clean answer is no, not all data is time-series data. There are many data types and most do not relate to time. To demonstrate, we'll pick a common non-time-series data type: relational data.
Relational data is organized into tables and linked to other data through common attributes. An example would be a database of dogs, their breed, and whether or not they are well-behaved. This data is relational, categorical and cross-sectional. It's a snapshot of a group of entities with no relationship to "when". In this case, these entities are dogs. Adorable! But not time-series data.
In this example, time is not relevant to the name, breed and behavioural tendencies of our dogs. It does not matter when the dog was added to the table, or when any of its values changed. The entities are held in a timeless vacuum.
By contrast, time-series data is indexed in accordance with time, which is linear and synchronized. It consists of a sequence of data points, each associated with a timestamp. An example would be if our database of dogs included the dog's name, their breed, and their time in the local dog race:
This table contains a timestamp and so now contains time-series data. But this data is not the time-series data that requires specific features or a specialized time-series database. To require a specialized database, data must also match a specific set of demand and usage patterns. For now, it is simply time-series data in a transactional, relational database.
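To make the contrast concrete, here is a minimal Python sketch of the two shapes of data. The `Dog` and `RaceResult` types, their field names, and every value are hypothetical and chosen only for illustration; they are not taken from a real dataset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Dog:
    """Cross-sectional record: it has no notion of "when"."""
    name: str
    breed: str
    well_behaved: bool


@dataclass(frozen=True)
class RaceResult:
    """Time-series record: every data point carries a timestamp."""
    timestamp: datetime
    name: str
    breed: str
    race_seconds: float


# Hypothetical values, for illustration only.
dog = Dog("Rex", "Greyhound", True)
results = [
    RaceResult(datetime(2024, 5, 2, 10, 0, tzinfo=timezone.utc), "Rex", "Greyhound", 31.2),
    RaceResult(datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc), "Rex", "Greyhound", 32.5),
    RaceResult(datetime(2024, 5, 1, 10, 5, tzinfo=timezone.utc), "Bella", "Whippet", 34.1),
]

# "Indexed by time" means the natural ordering is the timestamp.
by_time = sorted(results, key=lambda r: r.timestamp)
```

The same dog can appear many times in `results`, once per race, while `Dog` holds a single row per entity.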
In each action, a wake of time
When does it cross that threshold? For this example, we will put our table of dogs into a practical light. Consider that a group of people input dog information into a database:
Simple! Now take one more practical step. How is the database accessed? A small team of people each log in to a front-end data-entry application. For security purposes, an authentication server sits in front of the web application. A person authenticates, and then their session is kept alive for 24 hours:
The authentication server needs to know exactly when a person logged in to determine when to invalidate their session. This requires a timestamp column. The security provider that handles login may receive tens of thousands of requests every second. Tracking each attempt in chronological order and revoking sessions with precision is an intense demand.
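As a rough sketch of why the timestamp matters, the expiry check could look like the following in Python. The function names, the in-memory `sessions` dictionary, and the user IDs are assumptions made for illustration. A real authentication server would persist this data and handle far more load.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(hours=24)  # sessions are kept alive for 24 hours


def is_session_valid(logged_in_at: datetime, now: datetime) -> bool:
    """Return True while the session is younger than SESSION_TTL."""
    return now - logged_in_at < SESSION_TTL


def expired_sessions(sessions: dict[str, datetime], now: datetime) -> list[str]:
    """Return the IDs of sessions to revoke, oldest login first."""
    expired = [(ts, sid) for sid, ts in sessions.items()
               if not is_session_valid(ts, now)]
    return [sid for _, sid in sorted(expired)]


# Hypothetical logins, for illustration only.
login = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
sessions = {"alice": login, "bob": login + timedelta(hours=2)}

now = login + timedelta(hours=24)
to_revoke = expired_sessions(sessions, now)  # only alice has reached 24 hours
```

Without the login timestamp there is nothing to compare `now` against, which is why the column is required.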
This is a key point. As above, the presence of time-series data didn't mean much to our small-scale transactional, relational database. But now we've got a flow of time-series data. And with it our requirements change.
And so deeper we go...
The database is hosted somewhere, perhaps in the cloud. Cloud billing is based on compute time. The matching front-end application collects performance metrics. All of this is new time-series data, and it contains essential insights that inform business logic.
We can go deeper still and consider the entire chain. DNS queries hit DNS resolvers, and each DNS record carries a Time-To-Live value that controls how long resolvers cache it. A Content Delivery Network (CDN) in front of the front-end application gives precise detail on when something was accessed and how long static assets will need to live within the cache. Just one transactional update to the relational database generates a wake of essential time-bound data, for security and analysis.
Thus we retreat from our dive and head into the next section with the following crystallized takeaway: Not all data is time-series data, but time-series data is generated via virtually any operation, including the creation, curation, and analysis of “non-time-series data” in “non-time-series databases”.
Time-series databases for time-series data
So how are all these time-series data points stored?
Time-series databases are specialized databases optimized for storing and querying time-indexed data efficiently. Unlike traditional relational databases, which are designed for general-purpose data storage, time-series databases are built to handle the unique challenges posed by time-series data, such as high write and query loads, time-based data retention, and specialized (typically time-based) analytical functions.
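Two of these ideas, time-based analytics and time-based retention, can be sketched in plain Python. This only illustrates the concepts and is not how any particular time-series database implements them. The function names and sample points are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Round a timestamp down to the start of its fixed-width bucket."""
    return ts - ((ts - EPOCH) % width)


def average_per_bucket(points, width):
    """Average (timestamp, value) pairs within each time bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[bucket_start(ts, width)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def apply_retention(points, now, keep):
    """Keep only the points that fall inside the retention window."""
    cutoff = now - keep
    return [(ts, value) for ts, value in points if ts >= cutoff]


# Hypothetical measurements, for illustration only.
base = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
points = [
    (base + timedelta(seconds=10), 1.0),
    (base + timedelta(seconds=50), 3.0),
    (base + timedelta(seconds=65), 10.0),
]

averages = average_per_bucket(points, timedelta(minutes=1))
recent = apply_retention(points, now=base + timedelta(seconds=90), keep=timedelta(seconds=30))
```

A specialized database offers operations like these natively and at scale, so they do not have to be rebuilt in application code.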
Examples of time-series databases include QuestDB, InfluxDB and TimescaleDB. These databases are widely used in industries like finance, IoT, monitoring, and analytics, where handling and analyzing large volumes of time-series data is commonplace.
QuestDB is a specialized time-series database. Consider a traditional relational database like PostgreSQL. In it we may store stock exchange data, such as the current price of a stock for tens of thousands of companies. With the sheer number of stocks and transactions between them, a time-series database is needed beside it to record and analyze its history.
For example, whenever a stock price changes, an application will update a queue which then updates a PostgreSQL table. The UPDATE/INSERT event is then sent to something like Apache Kafka, which reads the events, and then inserts them into a time-series database.
The relational store, which provides the transactional guarantee, maintains the stock prices and the present "state". The time-series database then keeps the history of changes, visualizing trends via a dashboard with computed averages and a chart showing the volume of changes. The relational database may process and hold 100,000 rows at any time, while the time-series database may process and hold 100,000 * seconds rows for present and future analysis.
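The split between "state" and "history" can be sketched in a few lines of Python. Here a dictionary stands in for the relational table and a list stands in for the time-series table. The symbols, prices, and function names are hypothetical, and the queue and Kafka hops are omitted. The row estimate assumes every one of the 100,000 stocks changes once per second, which is a simplification.

```python
from datetime import datetime, timedelta, timezone

current_prices: dict[str, float] = {}                   # stands in for the relational table
price_history: list[tuple[datetime, str, float]] = []   # stands in for the time-series table


def on_price_change(ts: datetime, symbol: str, price: float) -> None:
    """Overwrite the present state and append the change to the history."""
    current_prices[symbol] = price              # one row per stock, updated in place
    price_history.append((ts, symbol, price))   # one row per change, never overwritten


def average_price(symbol: str) -> float:
    """Average of every recorded price for a symbol."""
    prices = [p for _, s, p in price_history if s == symbol]
    return sum(prices) / len(prices)


# Hypothetical price changes, for illustration only.
t0 = datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)
on_price_change(t0, "AAA", 100.0)
on_price_change(t0 + timedelta(seconds=1), "BBB", 50.0)
on_price_change(t0 + timedelta(seconds=2), "AAA", 102.0)

# Assumption: 100,000 stocks, each changing once per second for a full day.
STOCKS = 100_000
SECONDS_PER_DAY = 24 * 60 * 60
history_rows_per_day = STOCKS * SECONDS_PER_DAY  # 8,640,000,000 rows
```

After three changes, the state holds two rows while the history holds three. That gap widens with every second of trading.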
Conclusion
Not all data is time-series data. But creating and accessing any online data generates time-series data in waves. To respond to this demand, time-series databases like QuestDB and others have arrived to handle the wake of data left in the exchange of high-integrity transactions and high-volume operations.
Time-series data requires a specialized performance and feature profile, and time-series databases are built to meet these demands. Choosing the right one depends on your use case.
If you're interested in time-series data analysis and its methodology, check out our time-series data analysis glossary.