Interval Scan
When a query includes a condition on the designated timestamp, QuestDB performs an Interval Scan.
#
How Interval Scan worksThis process involves:
- Analyzing the condition: QuestDB examines the query to identify the conditions applied to the designated timestamp.
- Extracting a list of timestamp intervals: Based on the condition, QuestDB determines the specific intervals of time that need to be scanned.
- Performing a binary search for each interval's scan boundaries in the designated timestamp column: For each identified interval, QuestDB uses a binary search to quickly find the start and end points of the interval in the timestamp column. A binary search is a fast search algorithm that finds the position of a target value within a sorted array, which in this case is a sorted timestamp column.
- Scanning table data only within the found boundaries: QuestDB then scans only the rows of the table that fall within these boundaries, significantly reducing the amount of data that needs to be processed.
The Interval Scan is possible because tables with a designated timestamp store data in timestamp order. This allows QuestDB to efficiently skip over data that falls outside the relevant intervals. However, it's important to note that Interval Scan does not apply to the results of sub-queries, as the data returned from a sub-query is not guaranteed to be in timestamp order.
#
EXPLAIN Interval ScanYou can determine whether an Interval Scan is used to execute a query using the EXPLAIN command.
For example, consider the trades
table with a timestamp
designated
timestamp. The following query:
Produces this query plan:
QUERY PLAN |
---|
DataFrame |
Row forward scan |
Interval forward scan on: trades |
intervals: [("2023-01-20T00:00:00.000000Z","2023-01-20T23:59:59.999999Z")] |
The query optimizer reduces scanning to a single interval related to the
2023-01-20
day.
#
ExamplesThe following three queries all produce the same Interval Scan plan because
they all specify the same time range for the timestamp
column, just in
different ways:
The Interval Scan plan looks like this:
QUERY PLAN |
---|
DataFrame |
Row forward scan |
Interval forward scan on: trades |
intervals: [("2023-01-20T00:00:00.000000Z","2023-01-20T23:59:59.999999Z")] |
If need to scan more than one interval, you can use the timestamp IN operator:
This query results in an Interval Scan plan that includes two intervals:
QUERY PLAN |
---|
DataFrame |
Row forward scan |
Interval forward scan on: trades |
intervals: [(2023-01-01T00:00:00.000000Z,2023-01-02T23:59:59.999999Z), (2024-01-01T00:00:00.000000Z,2024-01-02T23:59:59.999999Z)] |
The table scan is limited to these two intervals:
<2023-01-01T00:00:00.000000Z,2023-01-02T23:59:59.999999Z>
<2024-01-01T00:00:00.000000Z,2024-01-02T23:59:59.999999Z>
If a table doesn't have a designated timestamp, you can declare one using the
timestamp(columnName)
function.
For example, the following query results in a full scan with an Async Filter, which is a process that scans the entire table without taking advantage of the designated timestamp:
However, if you declare a designated timestamp:
It results in an Interval Forward Scan.
Note that declaring a designated timestamp only works if the data is truly ordered. For example, if data are sorted in ascending order by the timestamp. Otherwise the result is undefined, meaning that the query may not return the expected results.