Server configuration may be applied when ingesting data over InfluxDB Line Protocol (ILP) to give users control over how the system processes and commits late-arriving data for optimum throughput.
As of version 6.0, QuestDB supports out-of-order (O3) data ingestion. The skew and latency of out-of-order data are likely to be relatively constant, so users may configure ingestion based on the characteristics of the data.
Most real-time out-of-order data patterns are caused by the delivery mechanism and hardware jitter, and therefore the timestamp distribution will be contained by some boundary.
If any new timestamp value is highly likely to arrive within 10 seconds of the previously received value, the boundary for this data is 10 seconds, and we name this boundary the lag.
When the order of timestamp values follows this pattern, it will be recognized by our out-of-order algorithm and prioritized using an optimized processing path. A commit of this data re-orders uncommitted rows and then commits all rows up to the boundary; the remaining rows stay in memory to be committed later.
Commit parameters allow for specifying that commits of out-of-order data should occur when:
- they fall outside the window of time for which they are expected to be out-of-order, or
- the row count passes a certain threshold.
Commit parameters are user-configurable for ingestion using InfluxDB line protocol only. This is because commits over PostgreSQL wire protocol are invoked client-side, and commits via the REST API occur either row-by-row or after a CSV import completes.
The following server configuration parameters are user-configurable:
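At minimum, these include the following (a partial sketch; default values and any additional parameters are listed in the server configuration documentation):

- `cairo.max.uncommitted.rows` - the maximum number of uncommitted rows per table
- `cairo.commit.lag` - the expected maximum lateness of incoming timestamps, in milliseconds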
These parameters are enforced so that a commit occurs if any one of the conditions is met; in other words, out-of-order commits are triggered either by the age of out-of-order records or by record count.
An out-of-order commit will occur:

- if records haven't been committed for the duration of the commit timer, or
- when the number of uncommitted rows reaches `cairo.max.uncommitted.rows`.

If a commit occurs due to `cairo.max.uncommitted.rows` being reached, then `cairo.commit.lag` will be applied.
The defaults for the out-of-order algorithm are optimized for real-world usage and should cover most patterns for timestamp arrival. The default configuration is as follows:
Users should modify out-of-order parameters if there is a known or expected pattern for:
- The length of time by which most records are late
- The frequency of incoming records and the expected throughput
For optimal ingestion performance, the number of commits of out-of-order data should be minimized. For this reason, if throughput is low and timestamps are expected to be consistently delayed by up to thirty seconds, the following configuration settings can be applied.
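A sketch of the corresponding `server.conf` entries (the commit lag matches the thirty-second delay and is expressed in milliseconds; the row-count threshold is illustrative rather than a recommended value):

```ini
# Records are expected to arrive up to 30 seconds late (milliseconds)
cairo.commit.lag=30000
# Low throughput: a small uncommitted-row threshold keeps commit latency bounded
cairo.max.uncommitted.rows=500
```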
For high-throughput scenarios, a lower commit timer and a larger number of uncommitted rows may be more appropriate. The following settings assume a throughput of ten thousand records per second with a likely maximum of 1 second lateness for timestamp values.
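As a sketch (the uncommitted-row threshold of 10,000 is derived from the stated throughput and is illustrative):

```ini
# Records may be up to 1 second late (milliseconds)
cairo.commit.lag=1000
# At ~10,000 records per second, commit roughly once per second of data
cairo.max.uncommitted.rows=10000
```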
These settings may be applied via the server configuration file.
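The sketch below reuses the high-throughput values; `server.conf` is located in the `conf` directory under the QuestDB root directory:

```ini
cairo.commit.lag=1000
cairo.max.uncommitted.rows=10000
```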
As with other server configuration parameters, these settings may be passed as environment variables:
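QuestDB derives environment variable names by adding the `QDB_` prefix, upper-casing the key, and replacing dots with underscores, so the two settings map to:

```shell
QDB_CAIRO_COMMIT_LAG
QDB_CAIRO_MAX_UNCOMMITTED_ROWS
```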
To set this configuration for the current shell:
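A sketch for a POSIX shell, reusing the high-throughput values:

```shell
export QDB_CAIRO_COMMIT_LAG=1000
export QDB_CAIRO_MAX_UNCOMMITTED_ROWS=10000
```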
Passing the environment variables via Docker is done using the `-e` flag.
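For instance, a sketch using the official `questdb/questdb` image (ports and tag are illustrative):

```shell
docker run -p 9000:9000 -p 9009:9009 \
  -e QDB_CAIRO_COMMIT_LAG=1000 \
  -e QDB_CAIRO_MAX_UNCOMMITTED_ROWS=10000 \
  questdb/questdb
```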
It's possible to set out-of-order values per table when creating a new table as part of the `PARTITION BY` clause. Configuration is passed using the `WITH` keyword with the following two parameters:
- `maxUncommittedRows` - equivalent to `cairo.max.uncommitted.rows`
- `commitLag` - equivalent to `cairo.commit.lag`
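A sketch of a `CREATE TABLE` statement using both parameters (table and column names are illustrative; `commitLag` accepts a duration such as `240s`):

```sql
CREATE TABLE trades (
    ts TIMESTAMP,
    instrument SYMBOL,
    price DOUBLE,
    quantity DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY
WITH maxUncommittedRows=250000, commitLag=240s;
```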
Checking the values per-table may be done using the `tables()` function.
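For example (a sketch; the metadata column names may vary between QuestDB versions):

```sql
SELECT name, maxUncommittedRows, commitLag FROM tables();
```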
The values can also be changed for each existing table.
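A sketch using `ALTER TABLE ... SET PARAM` (the table name is illustrative):

```sql
ALTER TABLE trades SET PARAM maxUncommittedRows = 10000;
ALTER TABLE trades SET PARAM commitLag = 20s;
```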
The `INSERT` keyword may be passed parameters for handling the expected lag of out-of-order records and a batch size for the number of rows to process and insert at once. The following query shows an `INSERT AS SELECT` operation with lag and batch size applied.
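A sketch of such a statement, assuming an ordered target table `trades` and a staging table `unordered_trades`; the exact keyword for the lag parameter (`commitLag` here) and its unit format can differ between QuestDB versions:

```sql
INSERT batch 100000 commitLag 180s INTO trades
SELECT ts, instrument, quantity, price
FROM unordered_trades;
```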
For more information on using
INSERT statements with parameters, see the
INSERT parameters documentation.
Using the lag and batch size parameters during
INSERT AS SELECT statements is
a convenient strategy to load and order large datasets from CSV in bulk. This
strategy along with an example workflow is described in the
importing data guide.