ALTER TABLE RESUME WAL
Restarts transactions of a WAL table after recovery from errors.
Syntax
Description
sequencerTxn
is the unique txn
identification that the Sequencer issues to
transactions.
When sequencerTxn
is not specified, the operation resumes the WAL apply job
from the next uncommitted transaction, including the failed one.
When sequencerTxn
is not specified, the operation resumes the WAL apply job
from the provided sequencerTxn
number explicitly.
ALTER TABLE RESUME WAL
is used to restart WAL table transactions after
resolving errors. When transactions are stopped, the suspended
status from the
wal_tables()
function is marked
as true
, and the sequencerTxn
value indicates the last successful commit in
the Sequencer. Once the error is resolved, ALTER TABLE RESUME WAL
restarts the
suspended WAL transactions from the failed transaction. Alternatively, an
optional sequencerTxn
value can be provided to skip the failed transaction.
Examples
Using the wal_tables()
function
to investigate the table status:
wal_tables();
name | suspended | writerTxn | sequencerTxn |
---|---|---|---|
trades | true | 3 | 5 |
The table trades
is suspended. The last successful commit in the table is
3
.
The following query restarts transactions from the failed transaction, 4
:
ALTER TABLE trades RESUME WAL;
Alternatively, specifying the sequencerTxn
to skip the failed commit (4
in
this case):
ALTER TABLE trades RESUME WAL FROM TRANSACTION 5;
-- This is equivalent to
ALTER TABLE trades RESUME WAL FROM TXN 5;
Diagnosing corrupted WAL transactions
If you have data deduplication enabled on your tables and you have access to the original events (for instance, they're stored in Apache Kafka, or other replayable source), you may reingest the data after skipping the problematic transactions.
Sometimes a table may get suspended due to full disk or kernel limits. In this case, an entire WAL segment may be corrupted. This means that there will be multiple transactions that rely on the corrupted segment, and finding the transaction number to resume from may be difficult.
When you run RESUME WAL on such suspended table, you may see an error like this:
2024-07-10T01:01:01.131720Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=trades~3, error=could not open read-only [file=/home/my_user/.questdb/db/trades~3/wal45/101/_event], errno=2]
In such a case, you should try skipping all transactions that rely on the corrupted WAL segment. To do that, first you need to find the last applied transaction number for the trades
table:
SELECT writerTxn
FROM wal_tables()
WHERE name = 'trades';
writerTxn |
---|
1223 |
Next, query the problematic transaction number:
SELECT max(sequencertxn)
FROM wal_transactions('trades')
WHERE sequencertxn > 1223
AND walId = 45
AND segmentId = 101;
Here, 1223
stands for the last applied transaction number, 45
stands for the WAL ID that may be seen in the error log above (trades~3/wal45
), and 101
stands for the WAL segment ID from the log (trades~3/wal45/101
).
max |
---|
1242 |
Since the last problematic transaction is 1242
, you can resume the table from transaction 1243
:
ALTER TABLE trades RESUME WAL FROM TXN 1243;
Note that in rare cases, subsequent transactions may also have corrupted WAL segments, so you may have to repeat this process.