Database replication

QuestDB Enterprise supports high availability through primary-replica replication and point-in-time recovery.

This document will walk you through setup for database replication.

For a full list of configuration and tuning options, see the Configuration section.

If the cluster is already running, enabling replication requires minimal steps:

  1. Create and configure object storage for Write Ahead Log (WAL) files in AWS or Azure
  2. Enable a primary node and upload WAL files to the object storage
  3. Take a data Snapshot of the primary node
  4. Configure a replica node or and restore via snapshot or allow sync via WAL files

If the cluster is new and not already running:

  1. Create and configure object storage for Write Ahead Log (WAL) files in AWS or Azure
  2. Enable a primary node and upload WAL files to the object storage
  3. Enable one or more replica nodes

Setup object storage#

Choose where you intend to store replication data:

  1. Azure blob storage
  2. Amazon S3

Our goal is to build a string value for the replication.object.store key within server.conf.

Azure Blob Storage#

Setup storage in Azure and retrieve values for:

  • STORE_ACCOUNT
  • BLOB_CONTAINER
  • STORE_KEY

First, follow Azure documentation to create a Storage Account.

There are some important considerations.

For appropriate balance, be sure to:

  • Select a geographical location close to the primary QuestDB node to reduce the network latency
  • Choose optimal redundancy and performance options according to Microsoft
  • Disable blob versioning

Keep your STORE_ACCOUNT value.

Next, set up Lifecycle Management for the blobs produced by replication. There are considerations to ensure cost-effective WAL file storage. For further information, see the object store expiration policy section.

After that, create a Blob Container to be the root of your replicated data blobs.

It will will soon be referenced in the BLOB_CONTAINER variable.

Finally, save the Account Key. It will be used to configure the QuestDB primary node as STORE_KEY.

In total, from Azure you will have retrieved:

  • STORE_ACCOUNT
  • BLOB_CONTAINER
  • STORE_KEY

The value provided to replicate.object.store is thus:

azblob::endpoint=https://${STORE_ACCOUNT}.blob.core.windows.net;container={BLOB_CONTAINER};root=${DB_INSTANCE_NAME};account_name=${STORE_ACCOUNT};account_key=${STORE_KEY}

The value of DB_INSTANCE_NAME can be any unique alphanumeric string, which includes dashes -.

Be sure to use the same name across all the primary and replica nodes within the replication cluster.

With your values, skip to the Setup database replication section.

Amazon AWS S3#

Our goal is to setup AWS S3 storage and retrieve:

  • BUCKET_NAME
  • AWS_REGION
  • AWS_ACCESS_KEY (Optional)
  • AWS_SECRET_ACCESS_KEY (Optional)

First, create an S3 bucket as described in AWS documentation. The name is our BUCKET_NAME & AWS_REGION. Prepare your AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY if needed, depending on how you manage AWS credentials.

There are some important considerations.

For appropriate balance, be sure to:

  • Select a geographical location close to the primary QuestDB node to reduce the network latency
  • Choose optimal redundancy and performance options according to Amazon
  • Disable blob versioning

Finally, set up bucket lifecycle configuration policy to clean up WAL files after a period of time. There are considerations to ensure that the storage of the WAL files remains cost-effective. For deeper background, see the object storage expiration policy section.

We have now prepared the following:

  • BUCKET_NAME
  • AWS_REGION
  • AWS_ACCESS_KEY (Optional)
  • AWS_SECRET_ACCESS_KEY (Optional)

And created a value for replication.object.store:

s3::bucket=${BUCKET_NAME};root=${DB_INSTANCE_NAME};region=${AWS_REGION};access_key_id=${AWS_ACCESS_KEY};secret_access_key=${AWS_SECRET_ACCESS_KEY}

The value of DB_INSTANCE_NAME can be any unique alphanumeric string, which includes dashes -.

Be sure to use the same name across all the primary and replica nodes within the replication cluster.

With your values, continue to the Setup database replication section.

Setup database replication#

Set the following changes in their respective server.conf files:

  1. Enable a primary node to upload to object storage

  2. Set replica(s) to download from object storage

Set up a primary node#

SettingDescription
replication.roleSet to primary .
replication.object.storeCreated based on provider specifications. The result of the above setup object storage sections.
cairo.snapshot.instance.idUnique UUID of the primary node

After the node is configured for replication, restart QuestDB.

At this point, create a database snapshot.

Frequent snapshots can alter the effectiveness of your replication strategy.

To help you determine the right snapshot, see the snapshot schedule section.

Now that a primary is configured, next setup a replica - or two, or three - or more!

Set up replica node(s)#

Create a new QuestDB instance.

Set server.conf properties:

SettingValue
replication.roleSet to replica.
replication.object.storeThe same string used in the primary node
cairo.snapshot.instance.idUnique UUID of the replica node
note

Please do not copy server.conf files from the primary node when creating the replica. Setting the same replication.object.store stream on 2 nodes and enabling 2 nodes to act as primary will break the replication setup.

After the blank replica database is created, restore the db directory folder from the snapshot taken from the primary node. Then start the replica node. The replica will download changes and will catch up with the primary node.

This concludes a walkthrough of basic replication.

For full configuration details, see the next section.

To learn more about the roadmap, architecture and topology types, see the Replication concept page.

Configuration#

The following presents all available configuration and tuning options.

All replication configuration is kept in the same server.conf file as all other database settings.

note

These settings can be sensitive - especially within Azure.

Consider using environment variables.

For example, to specify the object store setting from an environment variable specify:

export QDB_REPLICATION_OBJECT_STORE="azblob::DefaultEndPointsProtocol..."

Once settings are changed, stop and restart the database.

Note that replication is performed by the database process itself.

There is no need to start external agents, register cron jobs or similar.

Core replication settings#

SettingDescription
replication.roleDefaults to none for stand-alone instances. To enable replication set to one of: primary, replica
replication.object.storeA configuration string that allows connecting to an object store. The format is scheme::key1=value;key2=value2;…. The various keys and values are detailed in a later section. Ignored if replication is disabled.

Tuning settings#

SettingDescription
replication.requests.max.concurrentA limit to the number of concurrent object store requests. The default is 0 for unlimited.
replication.requests.retry.attemptsMaximum number of times to retry a failed object store request before logging an error and reattempting later after a delay.
replication.requests.retry.intervalSeconds delay between replication.requests.retry.attempts
replication.primary.compression.threadsMax number of threads used to perform file compression operations before uploading to the object store. The default value is calculated as half the number of CPU cores.
replication.primary.compression.levelZstd compression level. Defaults to 1. Valid values are from 1 to 22.
replication.replica.poll.secPolling rate of a replica instance to check for the availability of new changes.

Additional settings#

Replication is implemented using a set of worker threads for IO operations.

These settings alter resource usage.

The defaults should be appropriate in most cases.

SettingDefault ValueDescription
native.async.io.threads=NcpuCountThe number of async (network) io threads used for replication (and in the future cold storage). The default should be appropriate for most use cases.
native.max.blocking.threads=NcpuCount * 4Maximum number of threads for parallel blocking disk IO read/write operations for replication (and other). These threads are ephemeral: They are spawned per need and shut down after a short duration if no longer in use. These are not cpu-bound threads, hence the relative large number. The default should be appropriate for most use cases.

Snapshot schedule and object store expiration policy#

Replication files are typically read by replica nodes shortly after upload from the primary node. After initial access, these files are rarely used unless a new replica node starts. To optimize costs, we suggest moving files to cooler storage tiers using cloud expiration policies after 1-7 days. These tiers are more cost-effective for long-term storage of infrequently accessed files.

We recommend:

  1. Set up periodic primary node snapshots on a 1-7 day interval
  2. Keep Write Ahead Log (WAL) files in the object store for at least 30 days

Taking snapshots every 7 days and storing WAL files for 30 days allows database restoration within 23 days. Extending WAL storage to 60 days increases this to 53 days.

Ensure snapshot intervals are shorter than WAL expiration for successful data restoration. Shorter intervals also speed up database rebuilding after a failure. For instance, weekly snapshots take 7 times longer to restore than daily ones due to the computational and IO demands of applying WAL files.

For systems with high daily data injection, daily snapshots are recommended. Infrequent snapshots or long snapshot periods, such as 60 days with 30-day WAL expiration, may prevent successful database restoration.


Something missing? Page not helpful? Please suggest an edit on GitHub.