Prometheus monitoring and alerting

Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects and stores metrics as time-series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

Users can monitor the internal status of a QuestDB instance via an HTTP endpoint exposed by QuestDB on port 9003. This document describes how to enable metrics on this endpoint, how to configure Prometheus to scrape metrics from a QuestDB instance, and how to enable alerting from QuestDB to Prometheus Alertmanager.

Prerequisites

- Prometheus installed locally, or available via Docker
- A running QuestDB instance (the examples below use the QuestDB Docker image)

Scraping Prometheus metrics from QuestDB

QuestDB exposes a /metrics HTTP endpoint on port 9003 that serves Prometheus metrics. Before metrics can be queried, they must be enabled via the metrics.enabled key in the server configuration:

/path/to/server.conf
metrics.enabled=true

When running QuestDB via Docker, port 9003 must be exposed and the metrics configuration can be enabled via the QDB_METRICS_ENABLED environment variable:

Docker
docker run \
-e QDB_METRICS_ENABLED=TRUE \
-p 8812:8812 -p 9000:9000 -p 9003:9003 -p 9009:9009 \
-v "$(pwd):/var/lib/questdb" \
questdb/questdb:7.4.0

To verify that metrics are being exposed correctly by QuestDB, navigate to http://<questdb_ip>:9003/metrics in a browser, where <questdb_ip> is the IP address of the instance, or run a curl command such as the following:

Given QuestDB running at 127.0.0.1
curl http://127.0.0.1:9003/metrics
# TYPE questdb_json_queries_total counter
questdb_json_queries_total 0
# TYPE questdb_memory_tag_MMAP_DEFAULT gauge
questdb_memory_tag_MMAP_DEFAULT 77872
# TYPE questdb_memory_malloc_count gauge
questdb_memory_malloc_count 659
# ...

To configure Prometheus to scrape these metrics, provide the QuestDB instance IP and port 9003 as a target. The following example configuration file questdb.yml assumes there is a running QuestDB instance on localhost (127.0.0.1) with port 9003 available:

questdb.yml
global:
  scrape_interval: 5s
  external_labels:
    monitor: 'questdb'
scrape_configs:
  - job_name: 'questdb'
    scrape_interval: 5s
    static_configs:
      - targets: ['127.0.0.1:9003']

Start Prometheus and pass this configuration on launch:

prometheus --config.file=questdb.yml
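If you prefer to run Prometheus itself in Docker, a roughly equivalent command is the following, which mounts the configuration file into the image's default config path. Note that a containerized Prometheus cannot reach QuestDB via 127.0.0.1, so the target in questdb.yml would then need to point at the host machine instead (for example host.docker.internal on Docker Desktop):

Docker
docker run -p 9090:9090 \
-v "$(pwd)/questdb.yml:/etc/prometheus/prometheus.yml" \
prom/prometheus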

Prometheus should be available on 0.0.0.0:9090 and navigating to http://0.0.0.0:9090/targets should show that QuestDB is being scraped successfully:

Prometheus targets tab showing a QuestDB instance status

In the graphing tab of Prometheus (http://0.0.0.0:9090/graph), autocomplete can be used to graph QuestDB-specific metrics, which are all prefixed with questdb_:

Prometheus graphing tab showing QuestDB instance metrics on a chart
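For example, the per-second rate of REST API queries can be charted with a PromQL expression like the one below; the 5-minute window is an arbitrary choice for this sketch:

rate(questdb_json_queries_total[5m])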

The following metrics are available:

Metric | Type | Description
questdb_commits_total | counter | Number of total commits of all types (in-order and out-of-order) executed on the database tables.
questdb_o3_commits_total | counter | Number of total out-of-order (O3) commits executed on the database tables.
questdb_committed_rows_total | counter | Number of total rows committed to the database tables.
questdb_physically_written_rows_total | counter | Number of total rows physically written to disk. Greater than committed_rows_total with out-of-order ingestion. Write amplification is questdb_physically_written_rows_total / questdb_committed_rows_total.
questdb_rollbacks_total | counter | Number of total rollbacks executed on the database tables.
questdb_json_queries_total | counter | Number of total REST API queries, including retries.
questdb_json_queries_completed | counter | Number of successfully executed REST API queries.
questdb_unhandled_errors_total | counter | Number of total unhandled errors that occurred in the database. Such errors usually mean a critical service degradation in one of the database subsystems.
questdb_jvm_major_gc_count | counter | Number of times major JVM garbage collection was triggered.
questdb_jvm_major_gc_time | counter | Total time spent on major JVM garbage collection in milliseconds.
questdb_jvm_minor_gc_count | counter | Number of times a minor JVM garbage collection pause was triggered.
questdb_jvm_minor_gc_time | counter | Total time spent on minor JVM garbage collection pauses in milliseconds.
questdb_jvm_unknown_gc_count | counter | Number of times JVM garbage collection of an unknown type was triggered. Non-zero values of this metric may be observed only on some non-mainstream JVM implementations.
questdb_jvm_unknown_gc_time | counter | Total time spent on JVM garbage collection of an unknown type in milliseconds. Non-zero values of this metric may be observed only on some non-mainstream JVM implementations.
questdb_memory_tag_MMAP_DEFAULT | gauge | Amount of memory allocated for mmapped files.
questdb_memory_tag_NATIVE_DEFAULT | gauge | Amount of allocated untagged native memory.
questdb_memory_tag_MMAP_O3 | gauge | Amount of memory allocated for O3 mmapped files.
questdb_memory_tag_NATIVE_O3 | gauge | Amount of memory allocated for O3.
questdb_memory_tag_NATIVE_RECORD_CHAIN | gauge | Amount of memory allocated for SQL record chains.
questdb_memory_tag_MMAP_TABLE_WRITER | gauge | Amount of memory allocated for table writer mmapped files.
questdb_memory_tag_NATIVE_TREE_CHAIN | gauge | Amount of memory allocated for SQL tree chains.
questdb_memory_tag_MMAP_TABLE_READER | gauge | Amount of memory allocated for table reader mmapped files.
questdb_memory_tag_NATIVE_COMPACT_MAP | gauge | Amount of memory allocated for SQL compact maps.
questdb_memory_tag_NATIVE_FAST_MAP | gauge | Amount of memory allocated for SQL fast maps.
questdb_memory_tag_NATIVE_LONG_LIST | gauge | Amount of memory allocated for long lists.
questdb_memory_tag_NATIVE_HTTP_CONN | gauge | Amount of memory allocated for HTTP connections.
questdb_memory_tag_NATIVE_PGW_CONN | gauge | Amount of memory allocated for PostgreSQL Wire Protocol connections.
questdb_memory_tag_MMAP_INDEX_READER | gauge | Amount of memory allocated for index reader mmapped files.
questdb_memory_tag_MMAP_INDEX_WRITER | gauge | Amount of memory allocated for index writer mmapped files.
questdb_memory_tag_MMAP_INDEX_SLIDER | gauge | Amount of memory allocated for indexed column view mmapped files.
questdb_memory_tag_NATIVE_REPL | gauge | Amount of memory mapped for replication tasks.
questdb_memory_free_count | gauge | Number of times native memory was freed.
questdb_memory_mem_used | gauge | Current amount of allocated native memory.
questdb_memory_malloc_count | gauge | Number of times native memory was allocated.
questdb_memory_realloc_count | gauge | Number of times native memory was reallocated.
questdb_memory_rss | gauge | Resident Set Size (Linux/Unix) / Working Set Size (Windows).
questdb_memory_jvm_free | gauge | Current amount of free Java heap memory in bytes.
questdb_memory_jvm_total | gauge | Current size of the Java heap in bytes.
questdb_memory_jvm_max | gauge | Maximum amount of Java heap memory that can be allocated, in bytes.
questdb_http_connections | gauge | Number of currently active HTTP connections.
questdb_json_queries_cached | gauge | Number of currently cached REST API queries.
questdb_line_tcp_connections | gauge | Number of currently active InfluxDB Line Protocol TCP connections.
questdb_pg_wire_connections | gauge | Number of currently active PostgreSQL Wire Protocol connections.
questdb_pg_wire_select_queries_cached | gauge | Number of currently cached PostgreSQL Wire Protocol SELECT queries.
questdb_pg_wire_update_queries_cached | gauge | Number of currently cached PostgreSQL Wire Protocol UPDATE queries.

All of the above metrics are volatile, i.e. they are reset when the database restarts and reflect activity since the current database start.
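Because the counters restart from zero together with the database, PromQL functions such as rate() and increase(), which compensate for counter resets, are a natural way to query them. As a sketch, the write amplification ratio mentioned in the table above and the number of rows committed over the last hour could be charted with:

questdb_physically_written_rows_total / questdb_committed_rows_total
increase(questdb_committed_rows_total[1h])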

Configuring Prometheus Alertmanager

note

Full details on logging configurations can be found within the Logging & Metrics documentation.

QuestDB includes a log writer that sends any message logged at critical level (by default) to Prometheus Alertmanager over a TCP/IP socket connection. To configure this writer, add it to the writers config alongside other log writers.

Alertmanager may be started via Docker with the following command:

docker run -p 127.0.0.1:9093:9093 --name alertmanager quay.io/prometheus/alertmanager
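The default configuration bundled with the Alertmanager image is sufficient for this walkthrough. If alerts should be forwarded onwards, a minimal alertmanager.yml along the following lines can be mounted into the container at /etc/alertmanager/alertmanager.yml; the webhook URL here is a placeholder, not a real endpoint:

alertmanager.yml
route:
  receiver: 'questdb-webhook'
receivers:
  - name: 'questdb-webhook'
    webhook_configs:
      - url: 'http://example.internal:5001/alerts'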

To discover the IP address of this container, run the following command which specifies alertmanager as the container name:

docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' alertmanager

To run QuestDB and point it towards Alertmanager for alerting, first create a file ./conf/log.conf with the following contents. In this case, 172.17.0.2 is the IP address of the Alertmanager Docker container discovered by running the docker inspect command above.

./conf/log.conf
# Which writers to enable
writers=stdout,alert
# stdout
w.stdout.class=io.questdb.log.LogConsoleWriter
w.stdout.level=INFO
# Prometheus Alerting
w.alert.class=io.questdb.log.LogAlertSocketWriter
w.alert.level=CRITICAL
w.alert.alertTargets=172.17.0.2:9093

Start up QuestDB in Docker using the following command:

docker run \
-p 9000:9000 -p 8812:8812 -p 9009:9009 -p 9003:9003 \
-v "$(pwd):/var/lib/questdb" \
questdb/questdb:6.1.3

When alerts are successfully triggered, QuestDB logs will indicate the sent and received status:

2021-12-14T18:42:54.222967Z I i.q.l.LogAlertSocketWriter Sending: ...
2021-12-14T18:42:54.223377Z I i.q.l.LogAlertSocketWriter Received [0] 172.17.0.2:9093: {"status":"success"}
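As an additional check on the Alertmanager side, its v2 HTTP API can be queried for the alerts it has received, assuming the container port is published on 127.0.0.1:9093 as above:

curl http://127.0.0.1:9093/api/v2/alerts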

โญ Something missing? Page not helpful? Please suggest an edit on GitHub.