ALTER TABLE RESUME WAL

Restarts transactions of a WAL table after recovery from errors.

Syntax

Flow chart showing the syntax of the ALTER TABLE keyword Flow chart showing the syntax of ALTER TABLE with RESUME WAL keyword

Description

sequencerTxn is the unique txn identification that the Sequencer issues to transactions.

When sequencerTxn is not specified, the operation resumes the WAL apply job from the next uncommitted transaction, including the failed one.

When sequencerTxn is not specified, the operation resumes the WAL apply job from the provided sequencerTxn number explicitly.

ALTER TABLE RESUME WAL is used to restart WAL table transactions after resolving errors. When transactions are stopped, the suspended status from the wal_tables() function is marked as true, and the sequencerTxn value indicates the last successful commit in the Sequencer. Once the error is resolved, ALTER TABLE RESUME WAL restarts the suspended WAL transactions from the failed transaction. Alternatively, an optional sequencerTxn value can be provided to skip the failed transaction.

Examples

Using the wal_tables() function to investigate the table status:

List all tables
wal_tables();
namesuspendedwriterTxnsequencerTxn
tradestrue35

The table trades is suspended. The last successful commit in the table is 3.

The following query restarts transactions from the failed transaction, 4:

ALTER TABLE trades RESUME WAL;

Alternatively, specifying the sequencerTxn to skip the failed commit (4 in this case):

ALTER TABLE trades RESUME WAL FROM TRANSACTION 5;

-- This is equivalent to

ALTER TABLE trades RESUME WAL FROM TXN 5;

Diagnosing corrupted WAL transactions

note

If you have data deduplication enabled on your tables and you have access to the original events (for instance, they're stored in Apache Kafka, or other replayable source), you may reingest the data after skipping the problematic transactions.

Sometimes a table may get suspended due to full disk or kernel limits. In this case, an entire WAL segment may be corrupted. This means that there will be multiple transactions that rely on the corrupted segment, and finding the transaction number to resume from may be difficult.

When you run RESUME WAL on such suspended table, you may see an error like this:

2024-07-10T01:01:01.131720Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=trades~3, error=could not open read-only [file=/home/my_user/.questdb/db/trades~3/wal45/101/_event], errno=2]

In such a case, you should try skipping all transactions that rely on the corrupted WAL segment. To do that, first you need to find the last applied transaction number for the trades table:

SELECT writerTxn
FROM wal_tables()
WHERE name = 'trades';
writerTxn
1223

Next, query the problematic transaction number:

SELECT max(sequencertxn)
FROM wal_transactions('trades')
WHERE sequencertxn > 1223
AND walId = 45
AND segmentId = 101;

Here, 1223 stands for the last applied transaction number, 45 stands for the WAL ID that may be seen in the error log above (trades~3/wal45), and 101 stands for the WAL segment ID from the log (trades~3/wal45/101).

max
1242

Since the last problematic transaction is 1242, you can resume the table from transaction 1243:

ALTER TABLE trades RESUME WAL FROM TXN 1243;

Note that in rare cases, subsequent transactions may also have corrupted WAL segments, so you may have to repeat this process.