Time-series data is emerging as the dominant type of data produced in IoT, financial services, manufacturing, DevOps, monitoring, and machine learning. Time-series databases are specialized for storing and analyzing this kind of data efficiently and quickly. This article compares QuestDB with InfluxDB to help understand the key differences so you can choose a time-series database that fits your use case.
The typical uses of time-series databases are for:
- Application performance, uptime, and response times
- Analyzing financial transactions and trades
- Monitoring network logs
- Asset tracking
- E-commerce transaction data, sales insights, BI reports
- Sensor data from IoT devices
While the traditional use of databases was to store the last-known state of a system, this only shows a snapshot of a point in time. In contrast, time-series data adds a historical dimension to help understand how information changes.
Understanding how data changes over time helps you predict your data, gain deeper insights into trends, and be better prepared for future variations. For this reason, developers need a time-series database that's robust, easy to use, and powerful. Importantly, it must be flexible enough to be used across different scenarios, industries, and use cases.
Post edited in May 2023 to reflect the latest updates
Released in 2013, InfluxDB is an open-source time-series database subject to the MIT license. Both open-source versions of InfluxDB (v1 and v2) are implemented in Go.
QuestDB is an open-source time-series database licensed under Apache License 2.0. It is designed for high throughput ingestion and fast SQL queries. It supports schema-agnostic ingestion using the InfluxDB line protocol, PostgreSQL wire protocol, and a REST API for bulk imports and exports.
A brief comparison based on DB-Engines:
|Primary database model||Time Series DBMS||Time Series DBMS|
|Implementation language||Go||Java (low latency, zero-GC), C++|
|SQL||SQL-like query languages (InfluxQL, Flux)||SQL|
|APIs and other access methods||HTTP API (including InfluxDB Line Protocol), JSON over UDP||InfluxDB Line Protocol, Postgres Wire Protocol, HTTP, JDBC|
One of the first places to compare QuestDB and InfluxDB is how data is handled and stored in each database. InfluxDB has a dedicated line protocol message format for ingesting measurements. Each measurement has a timestamp, a set of tags (a tagset), and a set of fields (a fieldset):
In InfluxDB, tagset values are indexed strings, while fieldset values are not indexed. The data types that fields may use are limited to floats, integers, strings, and booleans. The following snippet is an example message in the InfluxDB line protocol:
QuestDB supports the InfluxDB line protocol (ILP) for compatibility purposes, so inserts using the ILP supports the same data types. QuestDB supports additional numeric types, such as BYTE for 8-bit integers, FLOAT for 32-bit floats, LONG256 for larger integers, UUID for identifiers, and GEOHASH for geospatial data. Additional types can be used while ingesting InfluxDB line protocol by creating a table with a desired schema before writing data.
QuestDB also exposes PostgreSQL wire protocol and a REST API for inserts, allowing for more control over the data types that the system can handle, including additional types such as date, char, and binary. In QuestDB, it's also possible to add indexes to existing columns in tables, which can be done directly through SQL:
QuestDB has full support for relational queries, whereas InfluxDB is a NoSQL, non-relational database with a custom data model. QuestDB supports both schema-agnostic ingestions over InfluxDB line protocol and a relational data model. Users can leverage both paradigms and perform SQL JOINs to correlate "schemaless" data with relational data by timestamp.
For storage, InfluxDB uses Time-Structured Merge (TSM) Trees, where data is stored in a columnar format, and the storage engine stores differences (or deltas) between values in a series. InfluxDB uses a time-series index for indexing to keep queries fast as cardinality grows. Still, the efficiency of this index has its limitations, explored in more detail below.
QuestDB uses columnar data structures but indexes data in vector-based append-only column files. Sorting of out-of-order data occurs in a staging area in memory, while sorted and persisted data is merged later using an append model. The main motivation for merging with persisted data in this way is to keep the storage engine performant on both read and write operations.
InfluxDB has a shard group concept as a partitioning strategy, allowing for grouping data by time. Users can provide a shard group duration which defines how large a shard will be and can enable common operations such as retention periods for data (deleting data older than X days, for example):
QuestDB has similar functionality to partition tables by time, and users may specify a partition size. When tables are partitioned by time, table metadata is defined once per table, and column files are partitioned at the filesystem level into directories per partition:
Both QuestDB and InfluxDB have ways to partition data by time and employ a retention strategy. The difference in partitioning for each system is the broader terminology and the fact that QuestDB does not need to create separate TSM files on disk per partition.
The emergence of NoSQL as a popular paradigm has effectively split databases into two categories: SQL and NoSQL.
InfluxDB originally started with a language similar to SQL called InfluxQL, which balanced some aspects of SQL with custom syntax. InfluxDB eventually adopted Flux as a query language to interact with data.
QuestDB embraces SQL as the primary query language so that there is no need for learning custom query languages. SQL is a good choice for time-series databases; it's easy to understand, most developers are familiar with it already, and SQL skills are simple to apply across different systems. As reported by the Stack Overflow developer survey of 2022, it's the third most popular language developers use. It proves to be a long-standing choice for quickly asking questions about the characteristics of your data.
Flux enables you to work with InfluxDB more efficiently, but it's difficult to read, and it's harder to learn and onboard new users. From users' perspective, learning a new custom query language is inconvenient for accessibility regardless of their engineering background.
Consider that the Flux query above can be written in SQL as follows:
Let's compare how QuestDB and InfluxDB operate in terms of performance for both ingestion and querying. As usual, we use the industry standard Time Series Benchmark Suite (TSBS) as the benchmark tool. Unfortunately, TSBS upstream does not support InfluxDB v2. Hence, we use our own TSBS fork.
The hardware we use for the benchmark is the following:
- c6a.12xlarge EC2 instance with 48 vCPU and 96 GB RAM
- 500GB gp3 volume configured for the maximum settings (16,000 IOPS and 1,000 MB/s throughput)
The software side of the benchmark is the following:
- Ubuntu 22.04
- InfluxDB 2.7.1 with the default configuration
- QuestDB 7.1.3 with the default configuration
To benchmark ingestion, we used commands like the following one:
Here, we use a
cpu-only scenario with two days of CPU data for various numbers
of simulated hosts. The number of hosts is set to 100, 1K, 100K, and 10M to
demonstrate the effects of high data cardinality
on ingestion performance. We use ten client connections for the benchmark in
The chart below summarizes the results for ingestion performance:
QuestDB hits maximum ingestion throughput of 1.69M rows/sec versus 524K rows/sec for InfluxDB. For 10M unique devices, QuestDB sustains 663K rows/sec, versus 66K rows/sec for InfluxDB. The latter illustrates the effects of high cardinality on InfluxDB's ingestion performance.
Cardinality typically refers to the number of unique elements in a set's size. In a time-series database context, high cardinality boils down to many indexed columns in a table, each containing many unique values.
High cardinality is a known problem area for InfluxDB, and this is likely because of the system architecture and storage engine. InfluxDB stores each time series in its own Time-Structured Merge tree, a log-structured merge (LSM) tree-like persistent data structure, so reading and writing the data becomes more expensive when the total number of time series becomes high. In QuestDB, the storage model is completely different from LSM trees or B-trees and instead uses data stored in densely ordered vectors on disk. As such, QuestDB deals with high cardinality datasets better.
While QuestDB outperforms InfluxDB for ingestion, query performance is equally essential for time-series data analysis.
As part of the standard TSBS benchmark, we compare both databases for a few types of queries, which are popular for time-series data:
All queries were run for two weeks of 4K emulated host data. Here are some sample commands to generate and run the queries:
Again, we use ten concurrent client connections for the comparison. We measure how much time each query takes in milliseconds:
Two conclusions from the benchmark:
- InfluxDB performs better for queries that access the data for a single or a
few nodes, such as
- QuestDB outperforms InfluxDB significantly for more complicated queries touching the data for many or all nodes.
QuestDB's underperformance for queries on a single or a few nodes can be explained with index-based data access. Since there are many node measurements (think, time series) stored in the table, fetching values that belong to a concrete node assumes a random disk access pattern. Of course, this stands true only for so-called cold query execution, i.e. a situation when the data is not in the OS page cache. Once the data is in the cache, disk I/O doesn't matter anymore. We plan to improve cold execution for such queries significantly with the sub-partitioning feature. This way, multiple symbol column values (think, nodes) will be stored in smaller column files, leading to a sequential access pattern and, hence, better performance for queries accessing only a single or a few symbols.
For more complicated queries, QuestDB's better performance lies in its query engine: it takes advantage of a columnar data layout, SIMD instructions, and multi-threaded processing.
QuestDB is a younger database. However, QuestDB builds on the immensely popular SQL language and is compatible with the Postgres wire protocol. It also borrows from InfluxDB by implementing InfluxDB line protocol over TCP, a convenient option for fast data ingestion. In addition, QuestDB has a vibrant and supportive community: users get to interact with the engineering team building QuestDB.
InfluxDB has an impressive ecosystem and has been around for a long time. As an older product, InfluxDB is the most popular time-series database.
We have compared InfluxDB and QuestDB, assessing performance and user experience. QuestDB ingests data 3-10 times faster, and many benchmarked queries perform better.
We believe in the simplicity of SQL to unlock insights from time-series data. Some of the SQL extensions that QuestDB offers, such as SAMPLE BY, can significantly reduce the complexity of queries and improve developer user experience.
If you are interested in how QuestDB performs compared to other OLAP and time-series databases, check out our benchmark articles.
The best way to determine the best fit is to make it real with your own data. QuestDB Cloud offers $200 in free credit. Or, if you would rather play around with sample data first, checkout our live Web Console.