High availability clustering with keepalived

The API Gateway Appliance uses the keepalived userspace daemon to provide health checks and failover for cluster nodes in a server pool. keepalived implements the Virtual Router Redundancy Protocol (VRRP version 2) to handle failover, and provides a virtual IP address for the server pool. The daemon ensures that the API Gateway remains reachable on a specified IP address, even if one of the servers in a cluster (or the API Gateway process on one of the servers) fails.

You can use keepalived to configure multiple servers in a cluster, but only one of the servers is active and listens on the virtual IP address at any given time. There is no load balancing among the servers in a cluster.

You can use the Keepalived page in the Web Administration Interface (WAI) to configure a cluster and start up keepalived. You can view the status of the keepalived process (whether it is running), and key information about the current keepalived configuration. You can start, stop, and reload the keepalived process, and view any log messages related to the process. You can also edit the configuration file and load a stored master or backup configuration on the server.

keepalived configuration page

Start keepalived on system bootup

The keepalived service is disabled by default on the appliance. To start the service automatically on system bootup, you must change the default in the WAI Bootup and Shutdown page. Select the check box next to keepalived, and click the Start On Boot button.

Start keepalived on boot

Alternatively, you can enable the service on the command line:

  1. Log in to the appliance using the default administrator account (user name admin), then use su - to switch to the root user. You can log in locally or using SSH. For more information, see Connect to consoles and user interfaces.
  2. Run the following command:
# chkconfig keepalived on

By default, keepalived performs a health check on the API Gateway every 120 seconds. To check more frequently, lower the interval value in the chk_vshell section of the configuration file.
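
The following is a minimal sketch of lowering the interval from the command line. It works on a temporary sample file rather than the real configuration, and it assumes that the chk_vshell section is a standard keepalived vrrp_script block. The script path and the new value of 30 seconds are illustrative, not appliance defaults.

```shell
# Create a sample file standing in for the keepalived configuration file.
# The script path and values are illustrative, not the appliance defaults.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
vrrp_script chk_vshell {
    script "/path/to/healthcheck.sh"
    interval 120
}
EOF

# Change the interval only inside the chk_vshell block (from its opening
# line to the first closing brace at the start of a line).
sed '/^vrrp_script chk_vshell/,/^}/ s/interval[[:space:]]\{1,\}[0-9]\{1,\}/interval 30/' \
    "$conf" > "$conf.new" && mv "$conf.new" "$conf"

# Read the value back to confirm the change.
new_interval="$(awk '$1 == "interval" { print $2 }' "$conf")"
echo "interval is now $new_interval seconds"
```

After changing the real configuration file, reload the keepalived process from the Keepalived page so that the new interval takes effect.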

Configure the firewall for keepalived

For keepalived to work, the firewall must allow packets with a destination address of 224.0.0.18 (the VRRP multicast address) and an IP protocol number of 112 (VRRP). This is configured on the appliance by default.
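
If you want to confirm that such a rule is present, the following sketch filters firewall rules in the format printed by iptables -S. It assumes the appliance firewall is managed with iptables. The function name has_vrrp_accept_rule and the sample rules are hypothetical and are used only so that the example runs without root access.

```shell
# Succeeds if the rules read from standard input contain an ACCEPT rule
# for destination 224.0.0.18 and protocol 112 (shown by name as "vrrp").
has_vrrp_accept_rule() {
    grep -F -- '224.0.0.18' | grep -E -- '-p (112|vrrp)' | grep -Fq -- '-j ACCEPT'
}

# Sample rules for illustration only, not the appliance rule set.
sample_rules='-P INPUT DROP
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -d 224.0.0.18/32 -p vrrp -j ACCEPT'

if printf '%s\n' "$sample_rules" | has_vrrp_accept_rule; then
    vrrp_allowed=yes
else
    vrrp_allowed=no
fi
echo "VRRP allowed: $vrrp_allowed"
```

On the appliance, when logged in as root, you could pipe the live rules into the same function, for example with iptables -S INPUT | has_vrrp_accept_rule.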

For more details, see Configure the Linux firewall.

Debug keepalived

To debug keepalived, check the /var/log/messages file for any errors. Common problems arise from incorrect or non-matching entries in the configuration files. Check the values of the following settings in the configuration files:

  • virtual_router_id
  • virtual_ipaddress
  • auth_pass
  • priority
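
As a rough way to spot non-matching entries, the sketch below compares the settings that normally have to be identical on every node (virtual_router_id, auth_pass, and virtual_ipaddress). The priority value is expected to differ between nodes, because the node with the highest priority becomes the master. The two sample files, the instance name VI_1, and the helper name shared_settings are hypothetical. On a real cluster, you would first copy the configuration file from each node.

```shell
workdir="$(mktemp -d)"

# Sample master configuration for illustration only.
cat > "$workdir/master.conf" <<'EOF'
vrrp_instance VI_1 {
    state MASTER
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        auth_pass example1
    }
    virtual_ipaddress {
        192.0.2.10
    }
}
EOF

# Derive a sample backup configuration: only state and priority differ.
sed -e 's/state MASTER/state BACKUP/' -e 's/priority 101/priority 100/' \
    "$workdir/master.conf" > "$workdir/backup.conf"

# Print only the settings that must match across nodes.
shared_settings() {
    awk '
        $1 == "virtual_router_id" || $1 == "auth_pass" { print $1, $2 }
        /virtual_ipaddress[[:space:]]*\{/ { in_vip = 1; next }
        in_vip && /\}/ { in_vip = 0; next }
        in_vip && NF { print "virtual_ipaddress", $1 }
    ' "$1"
}

if [ "$(shared_settings "$workdir/master.conf")" = "$(shared_settings "$workdir/backup.conf")" ]; then
    shared_match=yes
else
    shared_match=no
fi
echo "shared settings match: $shared_match"
```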

You should also check that you can reach the Healthcheck URL shown in the keepalived Status table. For example, you can log in to the appliance directly, and run the curl command against this URL.
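
A minimal sketch of such a check follows. The function name check_healthcheck_url is hypothetical, and the URL you pass to it must be the one from your own Status table. The function treats any HTTP error response, connection failure, or timeout of more than 5 seconds as a failed check.

```shell
# Report whether a health check URL answers successfully.
# Returns 0 on success, 1 on failure, and 2 if no URL is given.
check_healthcheck_url() {
    url="$1"
    if [ -z "$url" ]; then
        echo "usage: check_healthcheck_url URL" >&2
        return 2
    fi
    # --fail makes curl exit with an error on HTTP responses of 400 or above.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
        echo "healthcheck OK: $url"
    else
        echo "healthcheck FAILED: $url" >&2
        return 1
    fi
}
```

To use it, call the function with the Healthcheck URL from the Status table as its only argument.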

To check the keepalived traffic reaching the system, run the following tcpdump command (when logged in as root on the appliance):

# tcpdump -envi ethGb1 host 224.0.0.18

This should show VRRP packets sent by hosts in the cluster. If no traffic appears, check the firewall on each system in the cluster, and check the status of the keepalived service.
