OLAP vs Time-Series Databases: The SQL Perspective

QuestDB is the world's fast growing time-series database. It offers premium ingestion throughput, enhanced SQL analytics that can power through analysis, and cost-saving hardware efficiency. It's open source and integrates with many tools and languages.

When I started developing software, the database world was all about OLTP – Online Transactional Processing. These were the days of databases primarily handling the CRUD operations: Create, Read, Update, and Delete. SQL was king, universally used despite the subtle variations across platforms.

An artistic representation of SQL extensions for time-series for questdb, duckdb, clickhouse, timescale, and postgreSQL.

Back then, only large corporations could afford Online Analytical Processing (OLAP). They used expensive Data Warehouses to ingest and analyze historical data, which was rarely updated and typically deleted in bulk. Since there was no standard language for OLAP, each system often invented its own.

But things changed in the past 10 years. The database landscape exploded; every OLAP system adopted SQL as its query language, computation became much cheaper, and now almost every database is either open source or offers a generous free-tier. So, today the question isn't whether my OLAP database supports SQL or if I can afford it. Instead, it's about whether it supports my business use case and how its performance stacks up relative to the money invested in infrastructure.

In this article, I'll show you how SQL is used across various OLAP databases for time-series analytics. You'll see how some databases have adapted their SQL extensions specifically for time-series scenarios. We're going to dive into query comparisons in QuestDB, TimeScale, DuckDB, and ClickHouse, and I’m also including PostgreSQL in the mix to offer a perspective on how it compares to these more specialized databases.

Time-series queries in a nutshell

Time-series analytics involve handling large data volumes with some typical patterns:

  • Recent individual rows may provide specific insights, whereas older data is typically more useful when aggregated to reveal broader trends.
  • Filtering by time intervals is a common requirement, and it's often necessary to compare the same intervals over different date ranges.
  • Resampling data at various time resolutions is a frequent task. It's about readjusting the timeframe of data, aligning points within the same or across different tables, and addressing any data gaps.

Let's see how some common time-series queries can be executed on the different databases.

Latest Record Query

Imagine you're ingesting financial tick data, tracking various trading symbols across multiple markets and time zones. A typical challenge is retrieving the most recent row for each symbol, a quintessential time-series query.

Click on the tabs below to see the difference across different database engines.

-- QuestDB implements the LATEST ON ... PARTITION BY extension
SELECT * FROM trades
LATEST ON timestamp PARTITION BY symbol, side;

Interestingly, only QuestDB has a specific SQL extension for this use case. In other databases, we can employ a Window Function to implement a row counter for each symbol and side, filtering only the first one. An alternative might be using a GROUP BY clause with the first_value aggregation function. However, this approach becomes less readable with more columns. Notably, the standard SQL queries consume more resources than QuestDB's optimized implementation, which fetches only the most recent row for each unique combination of provided columns.

Time-Interval Filtering

A fundamental aspect of exploring time-series data is applying time filters. Typically, analysts start by examining a broad time interval and then progressively focus on more granular timestamps—or sometimes, they do the reverse, It's also often crucial to examine consistent time slices across different date ranges. Let's see five queries around this topic.

-- QuestDB offers the IN extension for time-intervals
SELECT * FROM trades WHERE timestamp in '2023'; -- whole year
SELECT * FROM trades WHERE timestamp in '2023-12'; -- whole month
SELECT * FROM trades WHERE timestamp in '2023-12-20'; -- whole day
-- The whole day, extending 15s into the next day
SELECT * FROM trades WHERE timestamp in '2023-12-20;15s';
-- For the past 7 days, 2 seconds before and after midnight
SELECT * from trades WHERE timestamp in '2023-09-20T23:59:58;4s;-1d;7'

You can see how having native extensions for time-range filters can simplify your SQL. In databases without specific extensions, we might have utilized time_diff and interval functions, but the resulting SQL would still be more complex than just using a specialized operator.

Joining Tables by Approximate Time

In analytics, while we often work with denormalized tables, joining them with other tables can yield insightful results. Joining on common columns is straightforward, but joining based on time can be far more insightful. However, this is complicated by the fact that events seldom occur at precisely the same microsecond. Issues like clock calibration discrepancies or network latencies can lead to slightly different timestamps in different tables. Moreover, there might be scenarios where one table logs data every second, and another does so every 15 minutes, yet you still need to join them to construct a coherent snapshot of a specific moment in time.

Most of the analytical databases we are examining support the ASOF JOIN. This join type, for a given row in one table, finds a matching row in another table that occurred at the same exact moment or, if not available, the closest preceding entry. For databases lacking native ASOF JOIN support, a similar result can be achieved using a lateral join.

-- QuestDB provides ASOF JOIN. Since table definition includes
-- the designated timestamp column, no time condition is needed
SELECT t.*, n.*
FROM trades ASOF JOIN news ON (symbol);

In this scenario, QuestDB, DuckDB, and Clickhouse stand out for their developer-friendliness, extending SQL to accommodate this common requirement. However, there's a nuance with Clickhouse: its ASOF JOIN does not allow using the timestamp column as the sole condition. This limitation works fine for our example but might pose challenges if you need to match all rows in one table with those in another based solely on approximate timestamps.

Time Interval Grouping/Downsampling

Running aggregations over specific time intervals is another fundamental query type in time-series analytics. For instance, analysts might ask:

  • How many rows are there per second?
  • Are there any trends in trading volume per minute over the last few hours?
  • What's the average order amount per country and category on a monthly basis over the last year?

Let's calculate the total price and volume in our trading table at 15 minutes intervals.

-- QuestDB implements the SAMPLE BY extension
-- which accepts granularity from years to microseconds.
-- GROUP BY and ORDER BY are implicit and not needed
SELECT
timestamp, symbol,
sum(price) AS price,
sum(amount) AS volume
FROM trades
SAMPLE BY 15m;

All databases include some support for classifying a timestamp into a time bucket. Most of them then use the standard GROUP BY and ORDER BY so you can execute aggregations. One thing I like about the QuestDB SAMPLE BY extension is that grouping and sorting by timestamp are implicit, making my SQL a bit easier to write.

Time Interval Downsampling with linear interpolation

Wouldn't life be great if our data always arrived at regular predictable intervals? Reality often presents us with irregular time intervals, complete with gaps In such cases, it's crucial not only to identify these gaps but also to consider interpolating results at these points using linear interpolation. This approach ensures a uniform dataset without gaps, which is critical when training machine learning models or displaying results on a business dashboard.

-- QuestDB SAMPLE BY accepts FILL with different strategies,
-- including LINEAR, PREVious row value, NULL, or Literal value
SELECT
timestamp,
sum(price) AS price,
sum(amount) AS volume
FROM trades
SAMPLE BY 1s FILL(LINEAR);

QuestDB and Timescale both offer extensions for linear interpolation, leading to queries that are quite similar to those in the previous section—a convenient consistency. On the other hand, DuckDB, Clickhouse, and PostgreSQL, not being specifically designed for time-series, lack such extensions. For PostgreSQL and DuckDB, one must first create a time-series virtual table, then perform a join, and use Window Functions for the interpolation calculations. Clickhouse does provide some native interpolation capabilities, but these are limited to values based on the previous row only, necessitating the use of Window Functions for more complex calculations.

Conclusion

As we have explored, SQL is not a one-size-fits-all language; its effectiveness can vary significantly depending on the database and its specific extensions. A database like QuestDB, with tailored SQL extensions for particular use cases, notably improves the developer experience by reducing friction. This ease of use translates into greater productivity, less frustration, and more time for analysts and developers to engage in interactive exploration. Such an environment fosters deeper insights and, ultimately, can lead to more informed business decisions.

Moreover, QuestDB's SQL extensions are not just about syntax convenience; they are backed by optimizations specifically designed for the unique characteristics and volumes of time-series data. This focus on efficiency is a key factor in QuestDB's consistent top performance in industry benchmarks.

Closing Notes

For those already familiar with SQL, adapting to QuestDB's extensions should be straightforward, especially once you grasp the fundamental concepts underpinning QuestDB. If you're new to SQL and eager to dive into time-series analytics, starting with QuestDB offers a smoother learning curve. You'll spend less time grappling with complex syntax and more time acquiring practical skills.

You can try QuestDB's SQL extensions at our public demo or downloading QuestDB Open Source .

RedditHackerNewsX
Subscribe to our newsletters for the latest. Secure and never shared or sold.