Switch backup to main

This page explains how to switch the roles of the backup node and the main node in your HA cluster.

When an HA node starts, or when the main node fails, an operator (the cluster manager) is responsible for switching the backup and main roles.

A REST API is provided to read and change the HA status. It is secured by an access token passed in the Authorization header; the token value is set by the property com.systar.electron.ha.token.

Resource          Verb  Description       Parameters
/rest/ha/status   GET   Get HA status     -
/rest/ha/status   POST  Change HA status  change: the new status, either MAIN or BACKUP
When an HA node switches from backup to main, it resumes the previous main node's work, reprocessing less than 2 minutes of data.

When an HA node switches from main to backup, the node process exits (return code 0).
The cluster manager must restart the node after this operation. Alternatively, the DI nodes can be installed as a service configured to restart automatically.
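If you supervise nodes with a script rather than a service manager, the restart requirement can be sketched as a wrapper loop. The function name, its arguments and the restart limit are hypothetical; the only behavior taken from this page is that a node exits with return code 0 after switching from main to backup.

```shell
# Minimal supervisor sketch (hypothetical; a service manager with an
# automatic-restart setting is the more robust option).
# Usage: run_supervised MAX_RESTARTS START_COMMAND [ARGS...]
# Re-runs START_COMMAND each time it exits with code 0, the code a node
# returns after switching from main to backup, up to MAX_RESTARTS times.
# Any non-zero exit code is returned to the caller for investigation.
run_supervised() {
    max_restarts=$1
    shift
    restarts=0
    while :; do
        node_rc=0
        "$@" || node_rc=$?
        if [ "$node_rc" -ne 0 ]; then
            echo "node exited with code $node_rc; not restarting" >&2
            return "$node_rc"
        fi
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "restart limit ($max_restarts) reached" >&2
            return 0
        fi
        restarts=$((restarts + 1))
        echo "node exited with code 0; restarting (restart $restarts)" >&2
    done
}
```

For example, `run_supervised 100 ./start-node.sh` would rerun a hypothetical start script after each switch to backup. Stopping on non-zero codes is a design choice: a crash is left to the operator instead of being restarted silently.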


When to switch

  1. At start-up: an HA node always starts as a backup to avoid a split-brain situation, so one node must be switched to main.
  2. When the main node fails.

The following table lists potential causes of main node failure, how each is detected, and how to remediate it.

Failure                           Detection            Remediation
Hard disk failure                 Hardware monitoring  Switch to backup.
Network failure                   Hardware monitoring  If the backup and the load balancer are on another network layer, switch to backup.
                                                       Warning: make sure the old main node is shut down.
Software failure (not due to DI)  Process monitoring   Switch to backup.
Software failure (due to DI)      Computing heartbeat  Switch to backup.
Manual / Maintenance              -                    Switch to backup if you need to perform operations on the main node computer.
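How failures are detected depends on your monitoring stack. As one illustration, assuming the cluster manager polls the main node's status endpoint, a probe could tolerate transient errors before declaring the main node failed. The function name, the 5-second timeout, the HA_TOKEN variable and port 8080 are assumptions, and a failed probe alone does not prove the node is down (see the network failure row).

```shell
# Hypothetical liveness probe run by a cluster manager (illustrative only;
# it does not replace hardware, process or DI heartbeat monitoring).
# Assumptions: HA_TOKEN holds the access token and the node listens on 8080.
# Usage: ha_main_alive HOST MAX_FAILURES INTERVAL_SECONDS
# Returns 0 as soon as HOST reports MAIN, 1 after MAX_FAILURES
# consecutive probes that time out, fail, or report another status.
ha_main_alive() {
    host=$1 max_failures=$2 interval=$3
    failures=0
    while [ "$failures" -lt "$max_failures" ]; do
        # Treat a curl error (timeout, refused connection) as an empty answer.
        status=$(curl -s --max-time 5 -H "Authorization: token $HA_TOKEN" \
            -X GET "http://$host:8080/rest/ha/status") || status=""
        if [ "$status" = "MAIN" ]; then
            return 0
        fi
        failures=$((failures + 1))
        sleep "$interval"
    done
    return 1
}
```

A non-zero result is a signal to investigate and, if confirmed, to follow the steps to switch after a failure; it should not trigger a promotion on its own.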

Steps to start the HA cluster

1. Start the main node

All HA nodes start as backup to avoid split-brain problems.

2. Switch the node to main

Use the REST API to turn the node into the main node.

# The HA status endpoint only responds once the node has started.
# Retry until it returns "BACKUP".
curl -H "Authorization: token myAccessToken" -X GET "http://node2host:8080/rest/ha/status"
# Output: BACKUP

# Change the status
curl -H "Authorization: token myAccessToken" -X POST "http://node2host:8080/rest/ha/status?change=MAIN"

# Read the status again
curl -H "Authorization: token myAccessToken" -X GET "http://node2host:8080/rest/ha/status"
# Output: MAIN
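The retry-then-switch sequence above can be scripted. This sketch assumes that the HA_TOKEN variable holds the access token and that the node listens on port 8080, as in the example; the function names and the retry counts (30 attempts 2 seconds apart, then 10 attempts 1 second apart) are illustrative.

```shell
# Wait until the node on host $1 reports status $2, polling up to $3 times
# with $4 seconds between attempts (hypothetical helper).
ha_wait_for_status() {
    host=$1 expected=$2 attempts=$3 interval=$4
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # The endpoint does not answer until the node has started.
        status=$(curl -s -H "Authorization: token $HA_TOKEN" \
            -X GET "http://$host:8080/rest/ha/status") || status=""
        [ "$status" = "$expected" ] && return 0
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Step 2 as a script: wait for BACKUP, request MAIN, then confirm MAIN.
ha_promote() {
    host=$1
    ha_wait_for_status "$host" BACKUP 30 2 ||
        { echo "$host never reported BACKUP" >&2; return 1; }
    curl -s --fail -H "Authorization: token $HA_TOKEN" \
        -X POST "http://$host:8080/rest/ha/status?change=MAIN" >/dev/null ||
        return 1
    ha_wait_for_status "$host" MAIN 10 1
}
```

For example, `HA_TOKEN=myAccessToken ha_promote node2host` returns 0 once the node reports MAIN, and 1 if either wait runs out of attempts.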

3. Start the consumer nodes

The backup and replica nodes can now connect to the main node.

Steps to switch after a failure

1. Passivate the failing main node

If the failing main node is still reachable, use the REST API to turn it into a backup node.

# Read the status
curl -H "Authorization: token myAccessToken" -X GET "http://node1host:8080/rest/ha/status"
# Output: MAIN

# Change the status (the node exits after switching to backup)
curl -H "Authorization: token myAccessToken" -X POST "http://node1host:8080/rest/ha/status?change=BACKUP"

# Read the status again once the node has been restarted
curl -H "Authorization: token myAccessToken" -X GET "http://node1host:8080/rest/ha/status"
# Output: BACKUP

If it is not reachable, make sure the main node is shut down.

Never run two active main nodes at the same time: both would consume data integration randomly, leading to database corruption (split brain).
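Step 1 can be scripted so that a node that cannot be demoted is reported instead of silently ignored. This is a sketch only: the function name, the HA_TOKEN variable, port 8080 and the 5-second timeout are assumptions. Because the node exits right after switching to backup, it may close the connection before answering; in that case curl reports an error even though the switch happened, so confirm the status after restarting the node.

```shell
# Hypothetical helper for "passivate the failing main node".
# Returns 0 if the node on host $1 accepted the switch to BACKUP,
# 1 if it could not be reached or answered with an HTTP error.
ha_passivate() {
    host=$1
    if curl -s --fail --max-time 5 -H "Authorization: token $HA_TOKEN" \
        -X POST "http://$host:8080/rest/ha/status?change=BACKUP" >/dev/null; then
        echo "$host accepted BACKUP; it will exit and must be restarted" >&2
        return 0
    fi
    # No positive answer: the REST API cannot be relied on to demote this node.
    echo "$host did not accept the change: make sure it is shut down" >&2
    return 1
}
```

A return code of 1 means the old main must be shut down by other means (power, process manager) before the backup is promoted.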

2. Turn the backup into the main node

Use the REST API to turn the backup node into the main node.

# Read the status
curl -H "Authorization: token myAccessToken" -X GET "http://node2host:8080/rest/ha/status"
# Output: BACKUP

# Change the status
curl -H "Authorization: token myAccessToken" -X POST "http://node2host:8080/rest/ha/status?change=MAIN"

# Read the status again
curl -H "Authorization: token myAccessToken" -X GET "http://node2host:8080/rest/ha/status"
# Output: MAIN
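Before promoting the backup, a script can at least check that the old main no longer reports MAIN. This is a hypothetical guard, assuming the HA_TOKEN variable and port 8080 as in the examples on this page. It cannot replace making sure the old main is shut down: a node that is unreachable from the cluster manager may still be running on another network segment.

```shell
# Hypothetical split-brain guard: promote the backup on host $2 only if
# the old main on host $1 does not answer MAIN.
ha_failover() {
    old_main=$1 backup=$2
    # An unreachable old main yields an empty status here.
    old_status=$(curl -s --max-time 5 -H "Authorization: token $HA_TOKEN" \
        -X GET "http://$old_main:8080/rest/ha/status") || old_status=""
    if [ "$old_status" = "MAIN" ]; then
        echo "refusing to promote $backup: $old_main still reports MAIN" >&2
        return 1
    fi
    curl -s --fail -H "Authorization: token $HA_TOKEN" \
        -X POST "http://$backup:8080/rest/ha/status?change=MAIN" >/dev/null
}
```

For example, `ha_failover node1host node2host` refuses to act while node1host still answers MAIN, which is the case where step 1 must be completed first.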

3. Redirect the replicas to the new main node

The replica nodes must now synchronize with the new main node. On each replica node in the cluster, update the following property:

com.systar.electron.host: host name or IP address of the new main node, acting as primary
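If your replicas read com.systar.electron.host from a key=value properties file, the update can be scripted. The file layout is an assumption (this page does not say where the property is set), the function name is hypothetical, and only the `=` separator is handled.

```shell
# Hypothetical helper: set com.systar.electron.host=$2 in properties file $1.
set_main_host() {
    file=$1 new_host=$2
    key_re='^com\.systar\.electron\.host[[:space:]]*='
    if grep -q "$key_re" "$file"; then
        # Rewrite the existing entry, copying back so the file keeps its permissions.
        tmp=$(mktemp) || return 1
        sed "s|${key_re}.*|com.systar.electron.host=${new_host}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rc=$?
        rm -f "$tmp"
        return "$rc"
    fi
    # No entry yet: append one.
    printf 'com.systar.electron.host=%s\n' "$new_host" >> "$file"
}
```

For example, `set_main_host /path/to/replica.properties node2host` (placeholder path) points a replica at node2host; the replica still has to be restarted to pick up the change.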

4. Restart the consumer nodes

All replicas (and, optionally, the new backup) must be restarted to take the new configuration into account. This happens automatically after exit for nodes configured to restart automatically.
