The Time-Series Benchmark Suite, or TSBS, is a well-known benchmarking tool for anyone who's into time-series databases. Like any other benchmark, it isn't ideal, but it does a decent job of representing common workloads for a time-series database. Timescale created TSBS based on another benchmark tool from the InfluxDB team. Here at QuestDB, we've been using TSBS to measure and optimize ingestion and query performance for a few years already. Today we'd like to introduce our own TSBS fork, share the story of the optimizations we made, and explain why we ended up with a fork instead of contributing to the upstream repo (spoiler: we did that too).
We've been using TSBS to measure and compare ingestion throughput for a few years. Considering that QuestDB implements InfluxDB Line Protocol (ILP) over TCP, it's not a secret that our initial implementation of the QuestDB TSBS module was based on the InfluxDB one. We had a great start. However, we eventually discovered that the optimizations in the database had little to no effect on the TSBS results.
The most obvious explanation would be that all optimizations we were adding were worthless, but what if TSBS was the bottleneck?
Luckily that was simple to test.
A practical way to optimize any client-side library, such as a database
driver or a benchmark tool, is to swap the server for a no-op implementation
and then profile the library. In our case, swapping QuestDB's ILP server is as
simple as running socat with the following command:
This command runs a socat TCP proxy server that consumes the data sent over the
socket and sends it to /dev/null. The only thing this no-op server does is
network I/O, so it should make all client-side bottlenecks much more obvious.
With the socat server, we got the following result with four workers:
1.47M rows/s isn't awful, but we can probably do better. The next step was to profile the loader and look for bottlenecks. But how does one profile TSBS?
TSBS is written in Go, which has a great built-in profiler called pprof.
While it's capable of collecting different kinds of profiles, it usually makes
sense to start with the CPU profile. After collecting the profile, we got the
following CPU profile graph:
The NextItem calls were doing buffered disk I/O, which is inevitable
when running a TSBS benchmark. First, TSBS generates an input file with ILP
messages, and then a separate loader program sends the data to the target
database.
Two remaining significant call sites from the graph were the Append function
from the QuestDB loader and the reflect.Select function from the standard
library.
The reflect.Select function is essential for the so-called flow
control code used in the TSBS loader. It's not easy to swap it with something
faster. Therefore, we decided to check the Append code first.
Here is what the function looked like:
If you're not familiar with Go, no worries. The above code writes the read line
to the downstream buffer. It also validates the line for a valid ILP message,
i.e. a string of the "csv-tags csv-fields timestamp" format. This check is
expensive as it involves byte slice-to-string conversion in addition to a couple
While they don't sound scary, these steps were done on the hot path for each ILP message. Per-row validation is nice to have, but it's something we can live without when optimizing a benchmark tool. So, we ended up with relaxed validation, done for a single row per batch of rows (10K rows by default).
With this version of the
Append function we got the following:
OK, we jumped from 1.47M to 2.86M rows/s, nearly a 2x improvement. We won't bother you with the updated code as it's less compact (and uglier) than the original, but if you'd like to read it, here is the pull request to check.
This level of TSBS performance was sufficient for our needs until the recent introduction of WAL tables in QuestDB 7.0.1. WAL stands for Write-Ahead Log, a persistent data structure holding the complete history of table changes. WAL is the foundation of our upcoming replication mechanism. Additionally, its concurrent writer design improved ingestion throughput significantly.
Long story short, 2.86M rows/s wasn't enough anymore and we needed one more
round of optimization. Remember that
reflect.Select call site?
Yes, we got rid of it.
This time it was enough to disable the TSBS loader's flow control. This allowed us to use a simpler loader implementation for the QuestDB module. You can check the changes here.
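To give a feel for the kind of change involved, here is a rough sketch contrasting a reflect.Select wait over a dynamic set of channels with a plain fan-out over one shared channel. It is an illustration only and does not reproduce the TSBS loader: recvAny and fanOut are hypothetical names, and the workers just sum integers instead of sending batches.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// recvAny waits on a dynamic set of channels using reflect.Select.
// The select cases are built through reflection on every call.
func recvAny(chans []chan int) (idx int, val int, ok bool) {
	cases := make([]reflect.SelectCase, len(chans))
	for i, ch := range chans {
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(ch),
		}
	}
	i, v, ok := reflect.Select(cases)
	if !ok {
		return i, 0, false // the selected channel was closed
	}
	return i, int(v.Int()), true
}

// fanOut pushes batches to workers over one shared channel with no
// acknowledgements, and returns the sum of everything processed.
func fanOut(batches []int, workers int) int {
	work := make(chan int, workers)
	var (
		mu    sync.Mutex
		total int
		wg    sync.WaitGroup
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range work {
				mu.Lock()
				total += b
				mu.Unlock()
			}
		}()
	}
	for _, b := range batches {
		work <- b
	}
	close(work)
	wg.Wait()
	return total
}

func main() {
	chans := []chan int{make(chan int, 1), make(chan int, 1)}
	chans[1] <- 7
	i, v, ok := recvAny(chans)
	fmt.Println(i, v, ok)
	fmt.Println(fanOut([]int{1, 2, 3, 4}, 2))
}
```

The first function pays for reflection on each wait, while the second relies on ordinary channel operations that the Go runtime handles directly.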
That's what we got with the no-op server:
Yes, that's one more jump, from 2.86M to 4.3M rows/s. Again, we're happy with the TSBS performance, at least for the near future.
Our TSBS optimization story is over, but we haven't explained why we ended up with our own TSBS fork. It's time to fix that.
As you may have noticed, the pull request we mentioned before is still pending. Same story with a bugfix around query generation we created recently. Of course, we're not alone - at the time of writing, there are 25 pending PRs open against the TSBS repo.
Since it's an open-source project, it's totally fine. Since it's an open-source project with a permissive license, we can easily fix the problem by forking it. That's precisely what we did.
So far, our fork includes the above-mentioned optimizations and the bug fix. We also merged the InfluxDB v2 support patch since we wanted to be able to run the benchmarks for this database.
We're also open to any contributions and promise to maintain our TSBS fork actively. If you have a patch you want to see in TSBS, don't hesitate to open a PR.
As usual, we encourage you to try the latest QuestDB release and share your feedback with our Slack Community. You can also play with our live demo to see how fast it executes your queries. And, of course, contributions to our open-source database on GitHub are more than welcome.