SAMPLE BY is used on time series data to summarize large datasets into
aggregates of homogeneous time chunks as part of a
SELECT statement. Users performing
queries on datasets with missing data may make use of the
FILL keyword to specify a fill behavior.
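As a rough illustration of FILL, the query below averages a reading per hour and carries the previous value into any hour that has no rows. The sensors table is the one used in the example further down; the column names ts and temperature are assumptions made for this sketch.

```sql
-- Hypothetical columns: ts (designated timestamp) and temperature.
SELECT ts, avg(temperature)
FROM sensors
SAMPLE BY 1h FILL(PREV);
```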
To use SAMPLE BY, one column needs to be designated as the
timestamp. Find out
more in the designated timestamp section.
SAMPLE_SIZE is the unit of time by which you wish to aggregate your results, and
n is the number of time chunks that will be summarized together.
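For instance, with n set to 15 and the unit set to minutes, the clause could look like the sketch below. The table my_table and the column ts are placeholders, not names defined on this page.

```sql
-- n = 15, unit = m (minutes): one output row per 15-minute chunk.
-- my_table and ts are hypothetical names used only for illustration.
SELECT ts, count()
FROM my_table
SAMPLE BY 15m;
```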
The time range of each sampled group is an absolute duration. In other words, sampling by one day covers a 24-hour range that is not bound to calendar dates.
Consider the following example, which samples from a
sensors table over the
last 24 hours:
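A query along these lines would fit that description. The column name ts, and the use of dateadd and now() to express "the last 24 hours", are assumptions of this sketch.

```sql
-- Assumes ts is the designated timestamp of the sensors table.
SELECT ts, count()
FROM sensors
WHERE ts > dateadd('d', -1, now())  -- keep only rows newer than 24 hours ago
SAMPLE BY 1d;
```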
The WHERE clause narrows the results to rows with a timestamp
within the last 24 hours. If the table has rows for sensor readings
ingested yesterday and today, this query will return two rows, because the 24-hour range
for each sampled group starts at the first returned timestamp rather than at midnight.
Assume the following table:
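One possible shape for such a table is sketched below. The table name trades and the timestamp column ts are assumptions; q and p are taken to mean quantity and price, based on the notional formula used in the last example.

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE trades (
  ts TIMESTAMP,  -- assumed name for the designated timestamp
  q DOUBLE,      -- quantity (assumed meaning)
  p DOUBLE       -- price (assumed meaning)
) TIMESTAMP(ts);
```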
The following will return the number of trades per hour:
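A minimal sketch, assuming a table named trades whose designated timestamp column is called ts:

```sql
-- One row per hour, with the number of trades in that hour.
SELECT ts, count()
FROM trades
SAMPLE BY 1h;
```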
The following will return the trade volume in 30-minute intervals:
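A minimal sketch, assuming a table named trades with a designated timestamp ts and a quantity column q, and taking "volume" to mean the sum of q:

```sql
-- One row per 30-minute chunk, with the summed quantity traded.
SELECT ts, sum(q)
FROM trades
SAMPLE BY 30m;
```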
The following will return the average trade notional (where notional = q * p) by day:
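A minimal sketch, assuming a table named trades with a designated timestamp ts and numeric columns q and p:

```sql
-- One row per 24-hour chunk, with the mean of q * p over the trades in it.
SELECT ts, avg(q * p)
FROM trades
SAMPLE BY 1d;
```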