EXPLAIN keyword
EXPLAIN
displays the execution plan of an INSERT
, SELECT
, or UPDATE
statement.
Syntax
Description
A query execution plan shows how a statement will be implemented: which table is
going to be accessed and how, what join method are employed, and which
predicates are JIT-compiled etc. EXPLAIN
output is a tree of nodes containing
properties and subnodes (aka child nodes).
In a plan such as:
QUERY PLAN |
---|
Async JIT Filter |
filter: 100 < l |
workers: 1 |
DataFrame |
Row forward scan |
Frame forward scan on: tab |
there are:
- 4 nodes:
- Async JIT Filter
- DataFrame
- Row forward scan
- Frame forward scan
- 2 properties (both belong to Async JIT Filter node):
- filter
- workers
For simplicity, some nodes have special properties shown on the same line as
type; for example, Filter filter: b.age=10
or Limit lo: 10
.
The following list contains some plan node types:
Async Filter
- a parallelized filter that evaluates expressions with Java code. In certain scenarios, it also implements theLIMIT
keyword.Async JIT Filter
- a parallelized filter that evaluates expressions with Just-In-Time-compiled filter. In certain scenarios, it also implements theLIMIT
keyword.Interval forward
- scans one or more table data ranges based on the designated timestamp predicates. Scan endpoints are found via a binary search on timestamp column.CachedWindow
- container for window functions that copies data to memory and sorts it, e.g. row_number()Window
- container for window functions optimized for frames ordered by designated timestamp. Instead of copying the underlying dataset to memory it buffers just enough per-partition values to compute function result.Count
- returns the count of records in subnode.Cursor-order scan
- scans table records using row ids taken from an index, in index order - first all row ids linked to index value A, then B, etc.DataFrame
- full or partial table scan. It contains two children:- row cursor - which iterates over rows inside a frame (e.g.
Row forward scan
). - frame cursor - which iterates over table partitions or partition chunks
(e.g.
Frame forward scan
).
- row cursor - which iterates over rows inside a frame (e.g.
Filter
- standalone (non-JIT-compiled, non-parallelized) filter.Frame forward/backward scan
- scans table partitions in a specified direction.GroupBy
- group by with or without key(s). Ifvectorized
field showstrue
, then the node is parallelized and uses vectorized calculations.Hash
- subnode of this node is used to build a hash table that is later looked up (usually in aJOIN
clause but also applies toEXCEPT
orINTERSECT
).Index forward/backward scan
- scans all row ids associated with a givensymbol
value from start to finish or vice versa.Limit
- standalone node implementing theLIMIT
keyword. Other nodes can implementLIMIT
internally, e.g. theSort
node.Row forward/backward scan
- scans data frame (usually partitioned) records in a specified direction.Sort
- sorts data. If low or hi property is specified, then the sort buffer size is limited and a number of rows are skipped after sorting.SampleBy
-SAMPLE BY
keyword implementation. If thefill
is not shown, it meansfill(none)
.Selected Record
- used to reorder or rename columns. It does not do any significant processing on its own.Table-order scan
- scans table records using row ids taken from an index in table (physical) order - from the lowest to highest row id.VirtualRecord
- adds expressions to a subnode's columns.
Other node types should be easy to link to SQL and database concepts, e.g.
Except
, Hash Join
or Lt Join
.
Many nodes, especially join and sort, have 'light' and 'heavy' variants, e.g.
Hash Join Light
and Hash Join
. The former is used when child node(s) support
efficient random access lookups (e.g. DataFrame
) so storing row id in the
buffer is enough; otherwise, the whole record needs to be copied and the 'heavy'
factory is used.
Examples
To illustrate how EXPLAIN
works, consider the trades
table
in the QuestDB demo instance:
CREATE TABLE trades (
symbol SYMBOL CAPACITY 256 CACHE,
side SYMBOL CAPACITY 256 CACHE,
price DOUBLE,
amount DOUBLE,
timestamp TIMESTAMP
) TIMESTAMP (timestamp) PARTITION BY DAY
Using EXPLAIN
for the plan for SELECT
The following query highlight the plan for ORDER BY
for the table:
EXPLAIN SELECT * FROM trades ORDER BY ts DESC;
QUERY PLAN |
---|
DataFrame |
Row backward scan |
Frame backward scan on: trades |
The plan shows that no sort is required and the result is produced by scanning
the table backward. The scanning direction is possible because the data in the
trades
table is stored in timestamp order.
Now, let's check the plan for trades
with a simple filter:
EXPLAIN SELECT * FROM trades WHERE amount > 100.0;
QUERY PLAN |
---|
Async JIT Filter |
filter: 100.0 < amount |
workers: 1 |
DataFrame |
Row forward scan |
Frame forward scan on: trades |
In this example, the plan shows that the trades
table undergoes a full scan
(DataFrame
and subnodes) and the data is processed by the parallelized
JIT-compiled filter.
Using EXPLAIN
for the plan for CREATE
and INSERT
Apart from SELECT
, EXPLAIN
also works on CREATE
and INSERT
statements.
Single-row inserts are straightforward. The examples in this section show the
plan for more complicated CREATE
and INSERT
queries.
EXPLAIN CREATE TABLE trades AS
(
SELECT
rnd_symbol('a', 'b') symbol,
rnd_symbol('Buy', 'Sell') side,
rnd_double() price,
rnd_double() amount,
x::timestamp timestamp
FROM long_sequence(10)
) TIMESTAMP(timestamp) PARTITION BY DAY;
QUERY PLAN |
---|
Create table: trades |
VirtualRecord |
functions: [rnd_symbol([a,b]),rnd_symbol([Buy,Sell]),rnd_double(),rnd_double(),x::timestamp] |
long_sequence count: 10 |
The plan above shows that the data is fetched from a long_sequence
cursor,
with random data generating functions called in VirtualRecord
.
The same applies to the following query:
EXPLAIN INSERT INTO trades
SELECT
rnd_symbol('a', 'b') symbol,
rnd_symbol('Buy', 'Sell') side,
rnd_double() price,
rnd_double() amount,
x::timestamp timestamp
FROM long_sequence(10);
QUERY PLAN |
---|
Insert into table: trades |
VirtualRecord |
functions: [rnd_symbol([a,b]),rnd_symbol([Buy,Sell]),rnd_double(),rnd_double(),x::timestamp] |
long_sequence count: 10 |
Of course, statements could be much more complex than that. Consider the
following UPDATE
query:
EXPLAIN UPDATE trades SET amount = 0 WHERE timestamp IN '2022-11-11';
QUERY PLAN |
---|
Update table: trades |
VirtualRecord |
functions: [0] |
DataFrame |
Row forward scan |
Interval forward scan on: trades |
intervals: [static=[1668124800000000,1668211199999999] |
The important bit here is Interval forward scan
. It means that the table is
forward scanned only between points designated by the
timestamp IN '2022-11-11'
predicate, that is between
2022-11-11 00:00:00,000000
and 2022-11-11 23:59:59,999999
(shown as raw
epoch micro values in the plan above). VirtualRecord
is only used to pass 0
constant for each row coming from DataFrame
.
Limitations:
To minimize resource usage, the EXPLAIN
command does not execute the
statement, to avoid paying a potentially large upfront cost for certain queries
(especially those involving hash join or sort).
EXPLAIN
provides a useful indication of the query execution, but it does not
guarantee to show the actual execution plan. This is because elements determined
during query runtime are missing.
While EXPLAIN
shows the number of workers that could be used by a parallelized
node it is only the upper limit. Depending on the data volume and system load, a
query can use fewer workers.
Under the hood, the plan nodes are called Factories
. Most plan nodes can be
mapped to implementation by adding the RecordCursorFactory
or
FrameCursorFactory
suffix, e.g.
DataFrame
->DataFrameRecordCursorFactory
Async JIT Filter
->AsyncJitFilteredRecordCursorFactory
SampleByFillNoneNotKeyed
->SampleByFillNoneNotKeyedRecordCursorFactory
while some are a bit harder to identify, e.g.GroupByRecord vectorized: false
->io.questdb.griffin.engine.groupby.GroupByRecordCursorFactory
GroupByRecord vectorized: true
->io.questdb.griffin.engine.groupby.vect.GroupByRecordCursorFactory
Other classes can be identified by searching for the node name in the toPlan()
methods.
See also
This section includes links to additional information such as tutorials: