How to roll back a node state to a specific checkpoint?

Restoring a node to a previous checkpoint makes the node discard all data that was stored in the database after the checkpoint was created. This includes:

  • All collected data (data stored in the database by data integration routes)
  • All changes made to dashboards
  • All changes made to applications, models, ...

This procedure is recommended when, for example, you attempted to upgrade Decision Insight and the upgrade attempt was unsuccessful. Reverting the node enables you to get back to the most recent stable version of your environment.

To restore a node to the state it was in when a particular checkpoint was created, follow these steps:

Stop the node

Stop the node using one of the following methods:

  • Type exit 0 on the local console (if started using or tnd-start.bat).
  • Log on to the node shell remotely and type exit 0.
  • Stop the service (if the node runs as a service).
  • Kill the Java process (no checkpoint is taken in this case, which is acceptable because the aim is to restore a previous checkpoint).

Remove all newer checkpoints

When the deployment starts, it uses the most recent checkpoint and loads its content. As a consequence, you must delete all checkpoints whose transaction time is greater than that of the checkpoint to restore.

To manually delete the checkpoints:

  1. Open the <node directory>/var/data/titanium-temporal directory, and in all subdirectories of this directory:
    • Open the checkpoint subdirectory.
    • Delete all directories ttXXXXX where XXXXX is greater than the transaction time of the checkpoint to restore.
  2. Open the <node directory>/var/data/calcium/checkpoint directory (if it exists).
    • Delete all directories ttXXXXX where XXXXX is greater than the transaction time of the checkpoint to restore.
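
The manual steps above can be sketched as a small script. This is only an illustration, not a tool shipped with the product: the function names are hypothetical, and it assumes that the XXXXX part of each ttXXXXX directory name is a plain decimal number that can be compared as an integer. It lists the matching directories by default and deletes them only when dry_run is set to False.

```python
import re
import shutil
from pathlib import Path

# Assumption: checkpoint directories are named "tt" followed by decimal digits.
TT_DIR = re.compile(r"^tt(\d+)$")


def checkpoint_roots(node_dir):
    """Yield every checkpoint directory named in the manual steps."""
    data = Path(node_dir) / "var" / "data"
    temporal = data / "titanium-temporal"
    if temporal.is_dir():
        for sub in sorted(temporal.iterdir()):
            if (sub / "checkpoint").is_dir():
                yield sub / "checkpoint"
    calcium = data / "calcium" / "checkpoint"
    if calcium.is_dir():  # this directory may not exist
        yield calcium


def newer_checkpoints(node_dir, restore_tt):
    """Return ttXXXXX directories whose transaction time exceeds restore_tt."""
    found = []
    for root in checkpoint_roots(node_dir):
        for entry in sorted(root.iterdir()):
            match = TT_DIR.match(entry.name)
            if entry.is_dir() and match and int(match.group(1)) > restore_tt:
                found.append(entry)
    return found


def delete_newer_checkpoints(node_dir, restore_tt, dry_run=True):
    """Print the newer checkpoints and delete them unless dry_run is True."""
    targets = newer_checkpoints(node_dir, restore_tt)
    for path in targets:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            shutil.rmtree(path)
    return targets
```

Run it only while the node is stopped, and review the dry-run output before deleting anything. For example, delete_newer_checkpoints("<node directory>", 12345) lists the directories that a later call with dry_run=False would remove, where 12345 stands for the transaction time of the checkpoint to restore.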

If you only know the description of the checkpoint you want to restore and you cannot start your node, you can find each checkpoint's description in <node directory>/var/data/calcium/checkpoint/ttXXXXX/metadata, under comment.
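
To look through the comments of all checkpoints at once, a sketch such as the following may help. The format of the metadata file is not described here, so the code makes a loud assumption: that the file can be read as text and that the description appears on a line containing the word comment. The function name is hypothetical.

```python
from pathlib import Path


def checkpoint_comments(node_dir):
    """Map each ttXXXXX directory name to its metadata lines that mention 'comment'."""
    root = Path(node_dir) / "var" / "data" / "calcium" / "checkpoint"
    result = {}
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        metadata = entry / "metadata"
        if entry.name.startswith("tt") and metadata.is_file():
            # Assumption: metadata is text-readable; undecodable bytes are replaced.
            text = metadata.read_text(encoding="utf-8", errors="replace")
            result[entry.name] = [
                line.strip()
                for line in text.splitlines()
                if "comment" in line.lower()
            ]
    return result
```

If the metadata file turns out not to be line-oriented text, open it directly instead and search for comment by hand.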

Start the node

The node will start and load the content of the most recent checkpoint (i.e. the checkpoint to restore). It may move files that are now unused to corrupted directories.

When the node is correctly started, delete the corrupted directories:

  • Open the <node directory>/var/data/titanium-temporal directory, and in all subdirectories of this directory:
    • Delete the corrupted subdirectory
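
The cleanup above can also be sketched as a script. It assumes that each subdirectory to remove is literally named corrupted, as the step above suggests, and the function names are hypothetical. As a precaution it only lists the directories unless dry_run is set to False.

```python
import shutil
from pathlib import Path


def corrupted_dirs(node_dir):
    """Return the 'corrupted' directory of each titanium-temporal subdirectory."""
    temporal = Path(node_dir) / "var" / "data" / "titanium-temporal"
    if not temporal.is_dir():
        return []
    return [
        sub / "corrupted"
        for sub in sorted(temporal.iterdir())
        if (sub / "corrupted").is_dir()  # assumption: the directory is named "corrupted"
    ]


def delete_corrupted(node_dir, dry_run=True):
    """Print the corrupted directories and delete them unless dry_run is True."""
    targets = corrupted_dirs(node_dir)
    for path in targets:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            shutil.rmtree(path)
    return targets
```

Run it only after confirming that the node started correctly from the restored checkpoint.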
