SAMPLE BY keyword

SAMPLE BY is used on time series data to summarize large datasets into aggregates of homogeneous time chunks as part of a SELECT statement. Users performing SAMPLE BY queries on datasets with missing data may make use of the FILL keyword to specify a fill behavior.

note

To use SAMPLE BY, one column needs to be designated as timestamp. Find out more in the designated timestamp section.

Syntax#

Flow chart showing the syntax of the SAMPLE BY keyword

Where SAMPLE_SIZE is the unit of time by which you wish to aggregate your results, and n is the number of time chunks that will be summarized together.

Sample calculation#

The time range of each group sampled by time is an absolute value, in other words, sampling by one day is a 24 hour range which is not bound to actual calendar dates.

Considering the following example which samples from a sensors table over the last 24 hours:

SELECT ts, count()
FROM sensors
WHERE ts > dateadd('d', -1, now())
SAMPLE BY 1d

The WHERE clause has narrowed down results to those which have a timestamp greater than 24 hours from now. If the table has rows for sensor readings ingested yesterday and today, this query will return two rows. The 24 hour range for the sampled group starts at the first-returned timestamp:

tscount
2021-02-03T00:32:35.000000Z1000
2021-02-04T00:32:35.000000Z200

Examples#

Assume the following table

tsbuysellquantityprice
ts1Bq1p1
ts2Sq2p2
ts3Sq3p3
............
tsnBqnpn

The following will return the number of trades per hour:

trades - hourly interval
SELECT ts, count()
FROM TRADES
SAMPLE BY 1h;

The following will return the trade volume in 30 minute intervals

trades - 30 minute interval
SELECT ts, sum(quantity*price)
FROM TRADES
SAMPLE BY 30m;

The following will return the average trade notional (where notional is = q * p) by day:

trades - daily interval
SELECT ts, avg(quantity*price)
FROM TRADES
SAMPLE BY 1d;