SAMPLE BY keyword

SAMPLE BY is used on time series data to summarise large datasets into aggregates of homogeneous time chunks as part of a SELECT statement.

note

To use SAMPLE BY, one column needs to be designated as timestamp. Find out more in the designated timestamp section.

Syntax#

Flow chart showing the syntax of the SAMPLE BY keyword

Where SAMPLE_SIZE is the unit of time by which you wish to aggregate your results, and n is the number of time chunks that will be summarised together.

Examples#

Assume the following table

timestampbuysellquantityprice
ts1Bq1p1
ts2Sq2p2
ts3Sq3p3
............
tsnBqnpn

The following will return the number of trades per hour:

trades - hourly interval
SELECT timestamp, count()
FROM TRADES
SAMPLE BY 1h;

The following will return the trade volume in 30 minute intervals

trades - 30 minute interval
SELECT timestamp, sum(quantity*price)
FROM TRADES
SAMPLE BY 30m;

The following will return the average trade notional (where notional is = q * p) by day:

trades - daily interval
SELECT timestamp, avg(quantity*price)
FROM TRADES
SAMPLE BY 1d;