OLAP vs Time-Series Databases: The SQL Perspective
When I started developing software, the database world was all about OLTP – Online Transactional Processing. These were the days of databases primarily handling the CRUD operations: Create, Read, Update, and Delete. SQL was king, universally used despite the subtle variations across platforms.
Back then, only large corporations could afford Online Analytical Processing (OLAP). They used expensive Data Warehouses to ingest and analyze historical data, which was rarely updated and typically deleted in bulk. Since there was no standard language for OLAP, each system often invented its own.
But things changed in the past 10 years. The database landscape exploded; almost every OLAP system adopted SQL as its query language, computation became much cheaper, and now almost every database is either open source or offers a generous free tier. So today the question isn't whether my OLAP database supports SQL or whether I can afford it. Instead, it's about whether it supports my business use case and how its performance stacks up relative to the money invested in infrastructure.
In this article, I'll show you how SQL is used across various OLAP databases for time-series analytics. You'll see how some databases have adapted their SQL extensions specifically for time-series scenarios. We're going to dive into query comparisons in QuestDB, Timescale, DuckDB, and ClickHouse, and I'm also including PostgreSQL in the mix to offer a perspective on how it compares to these more specialized databases.
Time-series queries in a nutshell
Time-series analytics involve handling large data volumes with some typical patterns:
- Recent individual rows may provide specific insights, whereas older data is typically more useful when aggregated to reveal broader trends.
- Filtering by time intervals is a common requirement, and it's often necessary to compare the same intervals over different date ranges.
- Resampling data at various time resolutions is a frequent task. It's about readjusting the timeframe of data, aligning points within the same or across different tables, and addressing any data gaps.
Let's see how some common time-series queries can be executed on the different databases.
Latest Record Query
Imagine you're ingesting financial tick data, tracking various trading symbols across multiple markets and time zones. A typical challenge is retrieving the most recent row for each symbol, a quintessential time-series query.
Here's how QuestDB handles it:
```sql
-- QuestDB implements the LATEST ON ... PARTITION BY extension
SELECT * FROM trades
LATEST ON timestamp PARTITION BY symbol, side;
```
Interestingly, only QuestDB has a specific SQL extension for this use case. In other databases, we can employ a Window Function to implement a row counter for each symbol and side, filtering only the first one. An alternative might be using a GROUP BY clause with the first_value aggregation function, but this approach becomes less readable with more columns. Notably, the standard SQL queries consume more resources than QuestDB's optimized implementation, which fetches only the most recent row for each unique combination of the provided columns.
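As an illustration, here is a minimal sketch of the window-function approach (PostgreSQL-style syntax; it assumes the same trades table with timestamp, symbol, and side columns):

```sql
-- A window-function sketch for engines without LATEST ON:
-- number the rows per (symbol, side), newest first, and keep row 1
SELECT *
FROM (
  SELECT *,
         row_number() OVER (
           PARTITION BY symbol, side
           ORDER BY timestamp DESC
         ) AS rn
  FROM trades
) ranked
WHERE rn = 1;
```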
Time-Interval Filtering
A fundamental aspect of exploring time-series data is applying time filters. Typically, analysts start by examining a broad time interval and then progressively focus on more granular timestamps, or sometimes they do the reverse. It's also often crucial to examine consistent time slices across different date ranges. Let's look at five queries around this topic.
```sql
-- QuestDB offers the IN extension for time-intervals
SELECT * FROM trades WHERE timestamp IN '2023';       -- whole year
SELECT * FROM trades WHERE timestamp IN '2023-12';    -- whole month
SELECT * FROM trades WHERE timestamp IN '2023-12-20'; -- whole day
-- The whole day, extending 15s into the next day
SELECT * FROM trades WHERE timestamp IN '2023-12-20;15s';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * FROM trades WHERE timestamp IN '2023-09-20T23:59:58;4s;-1d;7';
```
You can see how having native extensions for time-range filters can simplify your SQL. In databases without specific extensions, we might have used time_diff and interval functions, but the resulting SQL would still be more complex than just using a specialized operator.
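For comparison, here is roughly what the fourth query above (the whole day, extending 15 seconds into the next) would look like with plain timestamp comparisons:

```sql
-- Plain-SQL equivalent of WHERE timestamp IN '2023-12-20;15s'
SELECT * FROM trades
WHERE timestamp >= TIMESTAMP '2023-12-20 00:00:00'
  AND timestamp <  TIMESTAMP '2023-12-21 00:00:15';
```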
Joining Tables by Approximate Time
In analytics, we often work with denormalized tables, but joining them with other tables can still yield valuable results. Joining on common columns is straightforward; joining based on time is far more interesting, and also more complicated, because events seldom occur at precisely the same microsecond. Issues like clock calibration discrepancies or network latencies can lead to slightly different timestamps in different tables. Moreover, one table might log data every second while another does so every 15 minutes, yet you still need to join them to construct a coherent snapshot of a specific moment in time.
Most of the analytical databases we are examining support the ASOF JOIN. For a given row in one table, this join type finds a matching row in another table that occurred at the exact same moment or, if none is available, the closest preceding entry. For databases lacking native ASOF JOIN support, a similar result can be achieved using a lateral join.
```sql
-- QuestDB provides ASOF JOIN. Since the table definition includes
-- the designated timestamp column, no time condition is needed
SELECT t.*, n.*
FROM trades t ASOF JOIN news n ON (symbol);
```
In this scenario, QuestDB, DuckDB, and ClickHouse stand out for their developer-friendliness, extending SQL to accommodate this common requirement. However, there's a nuance with ClickHouse: its ASOF JOIN does not allow using the timestamp column as the sole join condition. This limitation works fine for our example but might pose challenges if you need to match all rows in one table with those in another based solely on approximate timestamps.
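For reference, here is a sketch of the lateral-join fallback mentioned above, in PostgreSQL syntax; it assumes a news table with symbol and timestamp columns:

```sql
-- For each trade, pick the most recent news row
-- at or before the trade's timestamp
SELECT t.*, n.*
FROM trades t
LEFT JOIN LATERAL (
  SELECT *
  FROM news
  WHERE news.symbol = t.symbol
    AND news.timestamp <= t.timestamp
  ORDER BY news.timestamp DESC
  LIMIT 1
) n ON true;
```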
Time-Interval Grouping/Downsampling
Running aggregations over specific time intervals is another fundamental query type in time-series analytics. For instance, analysts might ask:
- How many rows are there per second?
- Are there any trends in trading volume per minute over the last few hours?
- What's the average order amount per country and category on a monthly basis over the last year?
Let's calculate the total price and volume in our trading table at 15-minute intervals.
```sql
-- QuestDB implements the SAMPLE BY extension,
-- which accepts granularity from years to microseconds.
-- GROUP BY and ORDER BY are implicit and not needed
SELECT
  timestamp, symbol,
  sum(price) AS price,
  sum(amount) AS volume
FROM trades
SAMPLE BY 15m;
```
All databases include some support for classifying a timestamp into a time bucket. Most of them then use the standard GROUP BY and ORDER BY so you can execute aggregations. One thing I like about QuestDB's SAMPLE BY extension is that grouping and sorting by timestamp are implicit, making my SQL a bit easier to write.
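For comparison, here is a sketch of the same 15-minute aggregation using Timescale's time_bucket function, with the explicit grouping and sorting that SAMPLE BY makes implicit:

```sql
-- TimescaleDB: bucket timestamps into 15-minute intervals,
-- with explicit GROUP BY and ORDER BY
SELECT time_bucket('15 minutes', timestamp) AS bucket,
       symbol,
       sum(price) AS price,
       sum(amount) AS volume
FROM trades
GROUP BY bucket, symbol
ORDER BY bucket;
```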
Time-Interval Downsampling with Linear Interpolation
Wouldn't life be great if our data always arrived at regular, predictable intervals? Reality often presents us with irregular time intervals, complete with gaps. In such cases, it's crucial not only to identify these gaps but also to consider interpolating results at these points using linear interpolation. This approach ensures a uniform dataset without gaps, which is critical when training machine learning models or displaying results on a business dashboard.
```sql
-- QuestDB SAMPLE BY accepts FILL with different strategies,
-- including LINEAR, PREVious row value, NULL, or a literal value
SELECT
  timestamp,
  sum(price) AS price,
  sum(amount) AS volume
FROM trades
SAMPLE BY 1s FILL(LINEAR);
```
QuestDB and Timescale both offer extensions for linear interpolation, leading to queries quite similar to those in the previous section, a convenient consistency. On the other hand, DuckDB, ClickHouse, and PostgreSQL, not being specifically designed for time-series, lack such extensions. In PostgreSQL and DuckDB, one must first create a virtual table of regular timestamps, then perform a join, and use Window Functions for the interpolation calculations. ClickHouse does provide some native interpolation capabilities, but these are limited to values based on the previous row only, so more complex calculations still require Window Functions.
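As an illustration, here is a sketch of the Timescale variant using time_bucket_gapfill and interpolate; note that gap-filling requires a bounded time range, so the WHERE clause below uses hypothetical dates:

```sql
-- TimescaleDB: 1-second buckets with linear interpolation over gaps
SELECT time_bucket_gapfill('1 second', timestamp) AS bucket,
       interpolate(sum(price)) AS price,
       interpolate(sum(amount)) AS volume
FROM trades
WHERE timestamp >= '2023-12-20 00:00:00'
  AND timestamp <  '2023-12-20 00:01:00'
GROUP BY bucket
ORDER BY bucket;
```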
Conclusion
As we have explored, SQL is not a one-size-fits-all language; its effectiveness can vary significantly depending on the database and its specific extensions. A database like QuestDB, with tailored SQL extensions for particular use cases, notably improves the developer experience by reducing friction. This ease of use translates into greater productivity, less frustration, and more time for analysts and developers to engage in interactive exploration. Such an environment fosters deeper insights and, ultimately, can lead to more informed business decisions.
Moreover, QuestDB's SQL extensions are not just about syntax convenience; they are backed by optimizations specifically designed for the unique characteristics and volumes of time-series data. This focus on efficiency is a key factor in QuestDB's consistent top performance in industry benchmarks.
Closing Notes
For those already familiar with SQL, adapting to QuestDB's extensions should be straightforward, especially once you grasp the fundamental concepts underpinning QuestDB. If you're new to SQL and eager to dive into time-series analytics, starting with QuestDB offers a smoother learning curve. You'll spend less time grappling with complex syntax and more time acquiring practical skills.
You can try QuestDB's SQL extensions on our public demo or by downloading QuestDB Open Source.