Server configuration may be applied when ingesting data over InfluxDB Line Protocol (ILP) to give users control over how the system processes and commits late-arriving data for optimum throughput.
As of version 6.0, QuestDB supports out-of-order (O3) data ingestion. The skew and latency of out-of-order data are likely to be relatively constant, so users may configure ingestion based on the characteristics of the data.
Most real-time out-of-order data patterns are caused by the delivery mechanism and hardware jitter, so the timestamp distribution is contained by some boundary.
If any new timestamp value has a high probability of arriving within 10 seconds of the previously received value, the boundary for this data is 10 seconds, and we call this boundary the lag.
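As a rough illustration of how a lag boundary could be estimated from a sample of arrivals, the sketch below measures how far each timestamp sits behind the highest timestamp seen before it and picks a value that covers most rows. The function name `estimate_lag` and the `coverage` parameter are hypothetical and not part of QuestDB; the nearest-rank percentile is only one reasonable way to express "high probability".

```python
import math


def estimate_lag(timestamps, coverage=0.99):
    """Return a lag (same unit as the input) covering `coverage` of arrivals.

    The lateness of a row is how far its timestamp sits behind the highest
    timestamp seen before it arrived; in-order rows have lateness 0.
    """
    if not timestamps:
        raise ValueError("need at least one timestamp")
    lateness = []
    max_seen = timestamps[0]
    for ts in timestamps:
        lateness.append(max(0, max_seen - ts))
        max_seen = max(max_seen, ts)
    lateness.sort()
    # Nearest-rank percentile: smallest lateness that covers the requested share.
    rank = max(1, math.ceil(coverage * len(lateness)))
    return lateness[rank - 1]


# Timestamps in arrival order (seconds); some rows arrive late.
arrivals = [100, 103, 101, 108, 104, 110, 109, 115, 112, 120]
print(estimate_lag(arrivals, coverage=1.0))  # worst-case lateness in this sample
```

A value slightly above the estimate leaves headroom for jitter that the sample did not capture.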
When the order of timestamp values follows this pattern, it is recognized by our out-of-order algorithm and prioritized using an optimized processing path. A commit of this data re-orders uncommitted rows and then commits all rows up to the boundary; the remaining rows stay in memory to be committed later.
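The commit behavior described above can be pictured with a small sketch. It assumes the boundary is the highest buffered timestamp minus the lag, which is an interpretation for illustration rather than a description of QuestDB internals; `commit_with_lag` is a hypothetical name.

```python
import bisect


def commit_with_lag(uncommitted, lag):
    """Split buffered timestamps into (committed, remaining).

    Rows are sorted first. Everything at or below `max_ts - lag` is
    committed; newer rows stay buffered because late rows may still
    arrive among them.
    """
    if not uncommitted:
        return [], []
    ordered = sorted(uncommitted)
    boundary = ordered[-1] - lag  # assumed definition of the boundary
    split = bisect.bisect_right(ordered, boundary)
    return ordered[:split], ordered[split:]


committed, remaining = commit_with_lag([100, 103, 101, 108, 104, 115, 112], lag=10)
print(committed)  # rows safe to write in timestamp order
print(remaining)  # rows kept in memory for a later commit
```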
Commit parameters specify that commits of out-of-order data should occur when:
- the rows fall outside the window of time in which they are expected to be out-of-order, or
- the row count passes a certain threshold.
Commit parameters are user-configurable only for ingestion over InfluxDB line protocol. This is because commits over Postgres wire protocol are invoked client-side, and commits via REST API occur either row-by-row or after a CSV import is complete.
The following server configuration parameters are user-configurable:
These parameters are enforced so that a commit occurs if any one of these conditions is met; out-of-order commits are therefore triggered either by the age of out-of-order records or by record count.
An out-of-order commit will occur:
- if records haven't been committed for
If a commit occurs due to cairo.max.uncommitted.rows being reached, then cairo.commit.lag will be applied.
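The "any one condition" rule can be sketched as a small decision function. The names `CommitPolicy`, `max_idle_ms`, and `commit_reason` are hypothetical, and treating the row threshold as inclusive is an assumption; only `cairo.max.uncommitted.rows` and `cairo.commit.lag` come from the text above.

```python
from dataclasses import dataclass


@dataclass
class CommitPolicy:
    max_uncommitted_rows: int  # mirrors cairo.max.uncommitted.rows
    max_idle_ms: int           # hypothetical name for the time-based threshold


def commit_reason(policy, uncommitted_rows, ms_since_last_commit):
    """Return why a commit should fire now, or None if it should not."""
    if uncommitted_rows == 0:
        return None  # nothing buffered, nothing to commit
    if uncommitted_rows >= policy.max_uncommitted_rows:
        # Per the text, cairo.commit.lag is applied to commits of this kind.
        return "row_count"
    if ms_since_last_commit >= policy.max_idle_ms:
        return "age"
    return None


policy = CommitPolicy(max_uncommitted_rows=1000, max_idle_ms=5000)
print(commit_reason(policy, uncommitted_rows=1000, ms_since_last_commit=10))
print(commit_reason(policy, uncommitted_rows=10, ms_since_last_commit=5000))
```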
The defaults for the out-of-order algorithm are optimized for real-world usage and should cover most patterns for timestamp arrival. The default configuration is as follows:
Users should modify out-of-order parameters if there is a known or expected pattern for:
- The length of time by which most records are late
- The frequency of incoming records and the expected throughput
For optimal ingestion performance, the number of commits of out-of-order data should be minimized. For this reason, if throughput is low and timestamps are expected to be consistently delayed by up to thirty seconds, the following configuration settings can be applied:
For high-throughput scenarios, a lower commit timer and a larger number of uncommitted rows may be more appropriate. The following settings assume a throughput of ten thousand records per second with a likely maximum of 1 second lateness for timestamp values:
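To make the arithmetic behind such a scenario explicit, the sketch below derives candidate values from a throughput and a lateness estimate. It assumes `cairo.commit.lag` is expressed in milliseconds and uses a simple heuristic (rows expected to arrive within one lag window) for the row threshold; both are assumptions for illustration, not an official sizing rule, and `suggest_settings` is a hypothetical helper.

```python
def suggest_settings(rows_per_second, max_lateness_seconds):
    """Derive candidate out-of-order settings from workload characteristics."""
    # Assumption: the lag setting is expressed in milliseconds.
    lag_ms = int(max_lateness_seconds * 1000)
    # Heuristic: buffer roughly the rows that arrive within one lag window.
    rows = max(1, int(rows_per_second * max_lateness_seconds))
    return {
        "cairo.commit.lag": lag_ms,
        "cairo.max.uncommitted.rows": rows,
    }


# Ten thousand records per second, at most about 1 second late.
high_throughput = suggest_settings(rows_per_second=10_000, max_lateness_seconds=1)
for key, value in high_throughput.items():
    print(f"{key}={value}")
```

Measured ingestion behavior should take precedence over any value derived this way.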
These settings may be applied via the server configuration file:
As with other server configuration parameters, these settings may be passed as environment variables:
To set this configuration for the current shell:
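As a sketch of how the configuration keys relate to environment variable names, the helper below applies the convention of a `QDB_` prefix, upper case, and underscores in place of dots, then renders `export` lines for a shell. The helper names and the numeric values are illustrative only, and the lag is assumed to be in milliseconds.

```python
def to_env_name(key):
    """Map a server configuration key to its environment variable name."""
    return "QDB_" + key.replace(".", "_").upper()


def export_lines(settings):
    """Render shell `export` statements for a dict of configuration keys."""
    return [f"export {to_env_name(key)}={value}" for key, value in settings.items()]


# Illustrative values only; the lag is assumed to be in milliseconds.
settings = {"cairo.commit.lag": 1000, "cairo.max.uncommitted.rows": 10000}
for line in export_lines(settings):
    print(line)
```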
Passing the environment variables via Docker is done using the
It's possible to set out-of-order values per table when creating a new table as part of the PARTITION BY clause. Configuration is passed using the keyword with the following two parameters:
maxUncommittedRows- equivalent to
commitLag- equivalent to
Checking the values per-table may be done using the
The values can be changed per table with:
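The statements involved might be assembled as in the sketch below. The SQL shapes (`WITH maxUncommittedRows=..., commitLag=...` on table creation and `ALTER TABLE ... SET PARAM`) reflect QuestDB 6.x syntax as best understood here and should be checked against the SQL reference for the version in use; the table name, columns, and values are hypothetical.

```python
ALLOWED_PARAMS = ("maxUncommittedRows", "commitLag")


def create_table_sql(table, max_uncommitted_rows, commit_lag):
    """Build a CREATE TABLE statement with per-table out-of-order settings."""
    return (
        f"CREATE TABLE {table} (ts TIMESTAMP, value DOUBLE) "
        f"timestamp(ts) PARTITION BY DAY "
        f"WITH maxUncommittedRows={max_uncommitted_rows}, commitLag={commit_lag}"
    )


def alter_param_sql(table, param, value):
    """Build an ALTER TABLE statement that changes one per-table setting."""
    if param not in ALLOWED_PARAMS:
        raise ValueError(f"unsupported parameter: {param}")
    return f"ALTER TABLE {table} SET PARAM {param} = {value}"


# Hypothetical table name and values, for illustration only.
print(create_table_sql("my_table", 250000, "240s"))
print(alter_param_sql("my_table", "maxUncommittedRows", 10000))
print(alter_param_sql("my_table", "commitLag", "20s"))
```

The resulting strings can be run through any SQL client connected to the server.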
It's also possible to set maxUncommittedRows via REST API when importing data via the /imp endpoint. The following example imports a file which contains out-of-order records. The parameters must be provided for commit lag and max uncommitted rows to have
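A request URL for such an import might be assembled as in the following sketch. The query parameter names (`name`, `timestamp`, `partitionBy`, `commitLag`, `maxUncommittedRows`) are assumptions based on the /imp endpoint in QuestDB 6.x, and the host, table, column, and values are hypothetical; the unit expected for `commitLag` should be confirmed in the REST API reference.

```python
from urllib.parse import urlencode


def import_url(base, table, timestamp_column, partition_by,
               commit_lag, max_uncommitted_rows):
    """Build an /imp URL carrying out-of-order settings as query parameters."""
    params = {
        "name": table,
        "timestamp": timestamp_column,  # designated timestamp column
        "partitionBy": partition_by,
        "commitLag": commit_lag,        # unit as defined by the REST API docs
        "maxUncommittedRows": max_uncommitted_rows,
    }
    return f"{base}/imp?{urlencode(params)}"


# Hypothetical host, table, and column names.
url = import_url("http://localhost:9000", "weather", "ts", "DAY", 120000000, 10000)
print(url)
```

The CSV file itself would then be sent as multipart form data in a POST request to this URL.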
The INSERT keyword may be passed parameters for handling the expected lag of out-of-order records, and a batch size can be specified for the number of rows to process and insert at once. The following query shows an INSERT AS SELECT operation with lag and batch size applied:
Using the lag and batch size parameters during INSERT AS SELECT statements is a convenient strategy to load and order large datasets from CSV in bulk. This strategy along with an example workflow is described in the importing data guide.