# Aggregate functions

This page describes the available functions to assist with performing aggregate calculations.

## approx_percentile#

`approx_percentile(value, percentile, precision)` calculates the approximate value for the given non-negative column and percentile using the HdrHistogram algorithm.

### Arguments#

• `value` is any numeric non-negative value.
• `percentile` is a `double` value between 0.0 and 1.0, inclusive.
• `precision` is an optional `int` value between 0 and 5, inclusive. This is the number of significant decimal digits to which the histogram will maintain value resolution and separation. For example, when the input column contains integer values between 0 and 3,600,000,000 and the precision is set to 3, value quantization within the range will be no larger than 1/1,000th (or 0.1%) of any value. In this example, the function tracks and analyzes the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).

### Return value#

Return value type is `double`.

### Examples#

Approximate percentile
SELECT approx_percentile(latency, 0.99) FROM request_logs;
approx_percentile
101.5

## avg#

`avg(value)` calculates simple average of values ignoring missing data (e.g `null` values).

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is `double`.

### Examples#

Average transaction amount
SELECT avg(amount) FROM transactions;
avg
22.4
Average transaction amount by payment_type
SELECT payment_type, avg(amount) FROM transactions;
payment_typeavg
cash22.1
card27.4
null18.02

## count#

• `count()` or `count(*)` - counts the number of rows irrespective of underlying data.
• `count(column_name)` - counts the number of non-null values in a given column.

### Arguments#

• `count()` does not require arguments.
• `count(column_name)` - supports the following data types:
• `double`
• `float`
• `integer`
• `character`
• `short`
• `byte`
• `timestamp`
• `date`
• `long`
• `long256`
• `geohash`
• `string`
• `symbol`

### Return value#

Return value type is `long`.

### Examples#

Count of rows in the `transactions` table:

SELECT count() FROM transactions;
count
100

Count of rows in the `transactions` table aggregated by the `payment_type` value:

SELECT payment_type, count() FROM transactions;
payment_typecount
cash25
card70
null5

Count non-null transaction amounts:

SELECT count(amount) FROM transactions;
count
95

Count non-null transaction amounts by `payment_type`:

SELECT payment_type, count(amount) FROM transactions;
payment_typecount
cash24
card67
null4
##### note

`null` values are aggregated with `count()`, but not with `count(column_name)`

## count_distinct#

`count_distinct(column_name)` - counts distinct values in `string`, `symbol`, `long256`, `long`, or `int` columns.

### Return value#

Return value type is `long`.

### Examples#

• Count of distinct sides in the transactions table. Side column can either be `BUY` or `SELL` or `null`
SELECT count_distinct(side) FROM transactions;
count_distinct
2
• Count of distinct counterparties in the transactions table aggregated by `payment_type` value.
SELECT payment_type, count_distinct(counterparty) FROM transactions;
payment_typecount_distinct
cash3
card23
null5
##### note

`null` values are not counted in the `count_distinct` function.

## first/last#

• `first(column_name)` - returns the first value of a column.
• `last(column_name)` - returns the last value of a column.

Supported column datatype: `double`, `float`, `integer`, `IPv4`, `character`, `short`, `byte`, `timestamp`, `date`, `long`, `geohash`, `symbol`, `uuid`.

If a table has a designated timestamp, then the first row is always the row with the lowest timestamp (oldest) and the last row is always the one with the highest (latest) timestamp. For a table without a designated timestamp column, `first` returns the first row and `last` returns the last inserted row, regardless of any timestamp column.

### Return value#

Return value type is `string`.

### Examples#

Given a table `sensors`, which has a designated timestamp column:

device_idtemperaturets
arduino-01122021-06-02T14:33:19.970258Z
arduino-02102021-06-02T14:33:21.703934Z
arduino-03182021-06-02T14:33:23.707013Z

The following query returns oldest value for the `device_id` column:

SELECT first(device_id) FROM sensors;
first
arduino-01

The following query returns the latest symbol value for the `device_id` column:

SELECT last(device_id) FROM sensors;
last
arduino-03

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table `sensors_unordered`:

device_idtemperaturets
arduino-01122021-06-02T14:33:19.970258Z
arduino-03182021-06-02T14:33:23.707013Z
arduino-02102021-06-02T14:33:21.703934Z

The following query returns the first record for the `device_id` column:

SELECT first(device_id) FROM sensors_unordered;
first
arduino-01

The following query returns the last record for the `device_id` column:

SELECT last(device_id) FROM sensors_unordered;
last
arduino-02

## first_not_null#

• `first_not_null(column_name)` - returns the first non-null value of a column.

Supported column datatype: `double`, `float`, `integer`, `IPv4`, `char`, `short`, `byte`, `timestamp`, `date`, `long`, `geohash`, `symbol` and `uuid`.

If a table has a designated timestamp, then the first non-null row is always the row with the lowest timestamp (oldest). For a table without a designated timestamp column, `first_not_null` returns the first non-null row, regardless of any timestamp column.

### Return value#

Return value type is `string`.

### Examples#

Given a table `sensors`, which has a designated timestamp column:

device_idtemperaturets
null122021-06-02T14:33:19.970258Z
arduino-02102021-06-02T14:33:21.703934Z
arduino-03182021-06-02T14:33:23.707013Z

The following query returns oldest non-null value for the device_id column:

SELECT first_not_null(device_id) FROM sensors;
first_not_null
arduino-02

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table `sensors_unordered`:

device_idtemperaturets
null122021-06-02T14:33:19.970258Z
arduino-03182021-06-02T14:33:23.707013Z
arduino-02102021-06-02T14:33:21.703934Z

The following query returns the first non-null record for the device_id column:

SELECT first_not_null(device_id) FROM sensors_unordered;
first_not_null
arduino-03

## last_not_null#

• `last_not_null(column_name)` - returns the last non-null value of a column.

Supported column datatype: `double`, `float`, `integer`, `IPv4`, `char`, `short`, `byte`, `timestamp`, `date`, `long`, `geohash`, `symbol` and `uuid`.

If a table has a designated timestamp, then the last non-null row is always the row with the highest timestamp (most recent). For a table without a designated timestamp column, `last_not_null` returns the last non-null row, regardless of any timestamp column.

### Return value#

Return value type is `string`.

### Examples#

Given a table `sensors`, which has a designated timestamp column:

device_idtemperaturets
null122021-06-02T14:33:19.970258Z
arduino-02102021-06-02T14:33:21.703934Z
arduino-03182021-06-02T14:33:23.707013Z

The following query returns most recent non-null value for the device_id column:

SELECT last_not_null(device_id) FROM sensors;
last_not_null
arduino-03

Without selecting a designated timestamp column, the table may be unordered and the query may return different result. Given an unordered table `sensors_unordered`:

device_idtemperaturets
null122021-06-02T14:33:19.970258Z
arduino-03182021-06-02T14:33:23.707013Z
arduino-02102021-06-02T14:33:21.703934Z

The following query returns the last non-null record for the `device_id` column:

SELECT last_not_null(device_id) FROM sensors_unordered;
last_not_null
arduino-02

## haversine_dist_deg#

`haversine_dist_deg(lat, lon, ts)` - calculates the traveled distance for a series of latitude and longitude points.

### Arguments#

• `lat` is the latitude expressed as degrees in decimal format (`double`)
• `lon` is the longitude expressed as degrees in decimal format (`double`)
• `ts` is the `timestamp` for the data point

### Return value#

Return value type is `double`.

### Examples#

Calculate the aggregate traveled distance for each car_id
SELECT car_id, haversine_dist_deg(lat, lon, k)
FROM table rides

## ksum#

`ksum(value)` - adds values ignoring missing data (e.g `null` values). Values are added using the

Kahan compensated sum algorithm. This is only beneficial for floating-point values such as `float` or `double`.

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is the same as the type of the argument.

### Examples#

SELECT ksum(a)
FROM (SELECT rnd_double() a FROM long_sequence(100));
ksum
52.79143968514029

## max#

`max(value)` - returns the highest value ignoring missing data (e.g `null` values).

### Arguments#

• `value` is any numeric or string value

### Return value#

Return value type is the same as the type of the argument.

### Examples#

Highest transaction amount
SELECT max(amount) FROM transactions;
max
55.3
Highest transaction amount by payment_type
SELECT payment_type, max(amount) FROM transactions;
payment_typeamount
cash31.5
card55.3
null29.2

## min#

`min(value)` - returns the lowest value ignoring missing data (e.g `null` values).

### Arguments#

• `value` is any numeric or string value

### Return value#

Return value type is the same as the type of the argument.

### Examples#

Lowest transaction amount
SELECT min(amount) FROM transactions;
min
12.5
Lowest transaction amount, by payment_type
SELECT payment_type, min(amount) FROM transactions;
payment_typemin
cash12.5
card15.3
null22.2

## nsum#

`nsum(value)` - adds values ignoring missing data (e.g `null` values). Values are added using the Neumaier sum algorithm. This is only beneficial for floating-point values such as `float` or `double`.

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is `double`.

### Examples#

SELECT nsum(a)
FROM (SELECT rnd_double() a FROM long_sequence(100));
nsum
49.5442334742831

## stddev / stddev_samp#

`stddev_samp(value)` - Calculates the sample standard deviation of a set of values, ignoring missing data (e.g., null values). The sample standard deviation is a measure of the amount of variation or dispersion in a sample of a population. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

`stddev` is an alias for `stddev_samp`.

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is `double`.

### Examples#

SELECT stddev_samp(x)
FROM (SELECT x FROM long_sequence(100));
stddev_samp
29.011491975882

## stddev_pop#

`stddev_pop(value)` - Calculates the population standard deviation of a set of values. The population standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is `double`.

### Examples#

SELECT stddev_pop(x)
FROM (SELECT x FROM long_sequence(100));
stddev_samp
28.86607004772212

## string_agg#

`string_agg(value, delimiter)` - Concatenates the given string values into a single string with the delimiter used as a value separator.

### Arguments#

• `value` is a `string` value.
• `delimiter` is a `char` value.

### Return value#

Return value type is `string`.

### Examples#

SELECT string_agg(x::string, ',')
FROM (SELECT x FROM long_sequence(5));
string_agg
1,2,3,4,5

## sum#

`sum(value)` - adds values ignoring missing data (e.g `null` values).

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is the same as the type of the argument.

### Examples#

Sum all quantities in the transactions table
SELECT sum(quantity) FROM transactions;
sum
100
Sum all quantities in the transactions table, aggregated by item
SELECT item, sum(quantity) FROM transactions;
itemcount
apple53
orange47

### Overflow#

`sum` does not perform overflow check. To avoid overflow, you can cast the argument to wider type.

Cast as long to avoid overflow
SELECT sum(cast(a AS LONG)) FROM table;

## variance / var_samp#

`var_samp(value)` - Calculates the sample variance of a set of values. The sample variance is a measure of the amount of variation or dispersion of a set of values in a sample from a population. A low variance indicates that the values tend to be very close to the mean, while a high variance indicates that the values are spread out over a wider range.

`variance()` is an alias for `var_samp`.

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is `double`.

### Examples#

SELECT var_samp(x)
FROM (SELECT x FROM long_sequence(100));
stddev_samp
841.666666666666

## var_pop#

`var_pop(value)` - Calculates the population variance of a set of values. The population variance is a measure of the amount of variation or dispersion of a set of values. A low variance indicates that the values tend to be very close to the mean, while a high variance indicates that the values are spread out over a wider range.

### Arguments#

• `value` is any numeric value.

### Return value#

Return value type is `double`.

### Examples#

SELECT var_pop(x)
FROM (SELECT x FROM long_sequence(100));
stddev_samp
833.25

## vwap#

`vwap(price, quantity)` - Calculates the volume-weighted average price (VWAP) based on the given price and quantity columns. This is a handy replacement for the `sum(price * quantity) / sum(quantity)` expression.

### Arguments#

• `price` is any numeric price value.
• `quantity` is any numeric quantity value.

### Return value#

Return value type is `double`.

### Examples#

SELECT vwap(x, x)
FROM (SELECT x FROM long_sequence(100));
vwap
67