What's a time series database?

A time series database (TSDB) is a database designed to efficiently store and process time series data.

Time series data is a set of data points, each associated with a timestamp and typically collected and recorded in chronological order. Examples include financial market data, sensor readings, and application or infrastructure metrics.

Because time series data arrives continuously, traditional relational databases are not optimized to store and query it. Time series databases, on the other hand, are purpose-built to handle the unique characteristics of time series data, allowing for fast ingestion and analysis.

What makes time series data different?

Time series data is different from other types of data due to its temporal nature. It is marked by the following characteristics:

  • The order of the data is important. It captures temporal information such as seasonality and cyclicity
  • The volume of data is very large
  • The flow of data is often uninterrupted within a time window
  • The amount of data may vary widely depending on the time interval
  • The relevance of each data point diminishes over time
  • The data is often down-sampled or aggregated over different time intervals

Analyzing time series data involves identifying trends over time to create forecasting models or detect anomalies. An example query might look at a weekly average sensor reading over the last six months or track the maximum price of a stock per hour over a week. Traditional databases often struggle with these kinds of queries.
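To make the second example query concrete, here is a minimal sketch in plain Python that computes the maximum price per hour from a list of ticks. The function name `hourly_max` and the sample data are made up for illustration; a time series database would run the equivalent aggregation natively and at far larger scale.

```python
from datetime import datetime, timezone


def hourly_max(ticks):
    """Return {hour_start: max_price} for an iterable of (timestamp, price)."""
    result = {}
    for ts, price in ticks:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour not in result or price > result[hour]:
            result[hour] = price
    return result


# Hypothetical tick data: three trades across two different hours.
ticks = [
    (datetime(2024, 1, 2, 9, 5, tzinfo=timezone.utc), 101.2),
    (datetime(2024, 1, 2, 9, 40, tzinfo=timezone.utc), 103.7),
    (datetime(2024, 1, 2, 10, 15, tzinfo=timezone.utc), 102.9),
]
maxima = hourly_max(ticks)
```

Here `maxima` holds one entry per hour: 103.7 for the 09:00 bucket and 102.9 for the 10:00 bucket.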

Example of a chart that plots time series data: Daily energy usage and forecast in Germany, in May 2018.

How do time series databases deal with time series data?

Time series databases make different design choices to optimize for time series data.

First, to capture a large amount of data and respond to it in near real-time, they prioritize ingestion speed over the transactional guarantees that SQL databases provide. Data is usually written in an append-only manner instead of updating or deleting individual records. This allows time series databases to ingest data quickly and expose the most recent data for analysis such as anomaly detection.

Time series databases also use streaming protocols such as the InfluxDB line protocol (ILP) to ingest data more efficiently. ILP is a high-performance, text-based line protocol that represents data points compactly and supports schema-less ingestion. With protocols like ILP, these databases can ingest changing data on the fly, such as telemetry from VMs or sensors, without the schema being specified upfront. They can also accommodate later changes to the schema, unlike SQL databases, which typically require migrations.

Once ingested, data is automatically indexed and partitioned by time so that time-based queries can retrieve it quickly.
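As an illustration of how compact ILP is, the sketch below formats data points as ILP lines: a table name, optional comma-separated tags, a space, the fields, a space, and a nanosecond timestamp. The helper `to_ilp` and the table, tag, and field names are hypothetical, and escaping of spaces, commas, and quotes is deliberately left out, so treat this as a simplified sketch rather than a complete encoder.

```python
def to_ilp(table, tags, fields, timestamp_ns):
    """Format one data point as a single ILP line (no escaping handled)."""
    tag_part = "".join(f",{key}={value}" for key, value in tags.items())
    field_parts = []
    for key, value in fields.items():
        if isinstance(value, bool):
            # Check bool before int, because bool is a subclass of int.
            field_parts.append(f"{key}={'true' if value else 'false'}")
        elif isinstance(value, int):
            # Integer fields carry an "i" suffix in ILP.
            field_parts.append(f"{key}={value}i")
        elif isinstance(value, float):
            field_parts.append(f"{key}={value}")
        else:
            # String fields are double-quoted.
            field_parts.append(f'{key}="{value}"')
    return f"{table}{tag_part} {','.join(field_parts)} {timestamp_ns}"


line1 = to_ilp("sensors", {"device": "d1"},
               {"temperature": 21.5, "humidity": 40},
               1700000000000000000)
# A later point can simply carry an extra field; no schema change is declared.
line2 = to_ilp("sensors", {"device": "d1"},
               {"temperature": 21.7, "humidity": 41, "status": "ok"},
               1700000001000000000)
```

With this input, `line1` is `sensors,device=d1 temperature=21.5,humidity=40i 1700000000000000000`, and `line2` adds a `status` field without any prior declaration, which is the schema-less behavior described above.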
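The benefit of time partitioning can be sketched with a toy in-memory table that groups rows by calendar day. The class `DayPartitionedTable` is hypothetical and is not how any particular database is implemented; it only shows that a time-range query needs to look at the partitions overlapping the range and can skip the rest.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class DayPartitionedTable:
    """Toy append-only table that keeps one partition per calendar day."""

    def __init__(self):
        self.partitions = defaultdict(list)  # date -> list of (ts, value)

    def append(self, ts, value):
        # New rows are only ever appended to the partition for their day.
        self.partitions[ts.date()].append((ts, value))

    def query(self, start, end):
        """Return rows with start <= ts < end, scanning only relevant days."""
        rows = []
        day = start.date()
        while day <= end.date():
            for ts, value in self.partitions.get(day, []):
                if start <= ts < end:
                    rows.append((ts, value))
            day += timedelta(days=1)
        return rows


table = DayPartitionedTable()
table.append(datetime(2024, 1, 1, 23, 0), 1.0)
table.append(datetime(2024, 1, 2, 8, 0), 2.0)
table.append(datetime(2024, 1, 2, 9, 0), 3.0)
table.append(datetime(2024, 1, 3, 7, 0), 4.0)

# Only the 2024-01-02 partition is scanned for this range.
rows = table.query(datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 12, 0))
```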

Time series databases also use data compression and retention techniques to efficiently store and archive older data.
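The text does not say which techniques are used, so the following is only one plausible illustration: delta encoding, a common way to shrink regularly spaced timestamps, and a simple age-based retention filter. The function names and the integer timestamps are assumptions made for the example.

```python
def delta_encode(timestamps):
    """Store the first timestamp, then only the differences between neighbors."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original timestamps by accumulating the deltas."""
    decoded = []
    total = 0
    for index, value in enumerate(encoded):
        total = value if index == 0 else total + value
        decoded.append(total)
    return decoded


def apply_retention(rows, now, max_age):
    """Keep only (timestamp, value) rows that are at most max_age old."""
    return [(ts, value) for ts, value in rows if now - ts <= max_age]


timestamps = [1000, 1010, 1020, 1030]
encoded = delta_encode(timestamps)  # small repeated deltas compress well
kept = apply_retention([(1000, 1.0), (1020, 2.0), (1030, 3.0)],
                       now=1030, max_age=15)
```

Regularly sampled data produces long runs of identical small deltas, which general-purpose compressors handle far better than the raw timestamps.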

Finally, to support the analysis of time series data, they also come with built-in functions for downsampling, interpolating, and aggregating time series data. These functions enable efficient temporal analysis to identify trends and insights from the data.
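As a rough idea of what such built-in functions do, the sketch below implements mean downsampling into fixed-width buckets and linear interpolation onto a regular grid. Both helpers are hypothetical and use integer timestamps in seconds; real databases expose these operations through their own query syntax.

```python
def downsample_mean(samples, bucket):
    """Average (t, value) samples into buckets of `bucket` seconds."""
    sums = {}
    for t, value in samples:
        start = (t // bucket) * bucket  # start of the bucket containing t
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    return {start: total / count for start, (total, count) in sorted(sums.items())}


def interpolate_linear(samples, step):
    """Fill a regular grid between the first and last sample.

    Assumes at least two samples, sorted by strictly increasing t.
    """
    result = []
    i = 0
    t = samples[0][0]
    last_t = samples[-1][0]
    while t <= last_t:
        # Move to the pair of samples that surrounds t.
        while samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        fraction = (t - t0) / (t1 - t0)
        result.append((t, v0 + (v1 - v0) * fraction))
        t += step
    return result


readings = [(0, 10.0), (20, 30.0), (30, 20.0)]  # the reading at t=10 is missing
filled = interpolate_linear(readings, step=10)
averaged = downsample_mean(filled, bucket=20)
```

Here `filled` contains an interpolated value of 20.0 at t=10, and `averaged` reduces the four points to two 20-second buckets.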

Why are time series databases popular?

The amount of data being collected is continually growing, and much of that data is marked with a timestamp.

In financial markets, pricing information on stocks, commodities, and cryptocurrencies is being used more than ever by trading desks, fintechs, and traditional firms such as banks.

With the rise of the Internet of Things (IoT), we are now collecting more sensor data for asset tracking, remote monitoring, and personalized health platforms.

Finally, cloud computing has enabled more companies to easily run VMs in the cloud, leading to an explosion of application and operational metrics.

While traditional relational or NoSQL databases can also store and process this information, they do not perform as well at scale as time series databases. Add the built-in functions for analyzing this data, and it is no secret why time series databases are growing in popularity across more sectors:

  • IoT: asset tracking, electric battery monitoring, smart grids, supply chain optimization
  • Finance: financial tick data, financial metrics, transaction logs, blockchain data
  • Application metrics: user behavior, e-commerce order tracking, API monitoring
  • Industrial: process control, production line monitoring, predictive maintenance
  • Infrastructure metrics: server metrics, network data, log management

Are there any open source time series databases?

Yes! There are several open source time series databases.

What are some examples of open source time series databases?

Popular examples include QuestDB, OpenTSDB, Timescale and InfluxDB. Driven by the financial success of popular open source databases like Elasticsearch and MongoDB, we can expect to see more and more engineering teams build their databases and businesses through the open source model.

Why make an open source time series database?

An engineering team may decide to make their database open source so that they can grow a community alongside their tool. Contributions from a talented community around the world help keep software quality high and the software more secure. Public code tends to be written more accessibly, and security issues are in plain sight for all to report and hopefully help address.

Are open source time series databases free?

Largely yes, but only to a point. While many companies do provide their databases as open source, time series or otherwise, the code may be restricted by a limited license, such as the Elastic License or the Timescale License, instead of a more permissive license like MIT or Apache 2.0. By limiting how the code may be used, such a license prevents other companies from offering the software as a service to others. As a result, monetization of the database often stays in the hands of its builders.

Do note that there is a lot of nuance in open source database licenses. It's also common for an open source database to come in a separately licensed version whose features differ from those of the open source one. This allows the company to provide a fully open source base version of its product along with a paid version. Before choosing an open source time series database, be sure to check its license and product offerings!