High Availability and Disaster Recovery deployment options

Why activate HA/DR

Use High Availability (HA) or Disaster Recovery (DR) to resume Axway Decision Insight (DI) activity as fast as possible after a failure of your DI main node. A DI node failure is typically caused by a hard disk or network failure.

In case of a software failure, contact support as soon as possible, because the same problem may also occur on the DI backup node.

Who is eligible

To transfer data consumption to another DI node, the data integration must:

  • Use States in routes to store its progress.
  • Be runnable from either the main or the backup node.
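The following Python sketch illustrates why progress has to live in a State: because the consumption position is persisted after each batch, a route started on the backup node resumes where the main node stopped instead of starting over. `FileStateStore` and `run_integration` are hypothetical names, and a JSON file stands in for a DI State; this is an illustration of the principle, not the Decision Insight API.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class FileStateStore:
    """Hypothetical stand-in for a DI State: a small key/value record that
    both the main and backup nodes can read (e.g. on replicated storage)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self, key: str, default: int = 0) -> int:
        if not self.path.exists():
            return default
        return json.loads(self.path.read_text()).get(key, default)

    def save(self, key: str, value: int) -> None:
        data = json.loads(self.path.read_text()) if self.path.exists() else {}
        data[key] = value
        self.path.write_text(json.dumps(data))


def run_integration(records: list[str], state: FileStateStore,
                    batch: int = 2, max_batches: int | None = None) -> list[str]:
    """Consume records from the stored position, saving progress after each
    batch so that another node can resume where this one stopped."""
    processed: list[str] = []
    position = state.load("position")
    batches = 0
    while position < len(records):
        if max_batches is not None and batches >= max_batches:
            break  # simulate the node stopping (e.g. a crash) mid-stream
        chunk = records[position:position + batch]
        processed.extend(chunk)  # placeholder for the real data absorption
        position += len(chunk)
        state.save("position", position)
        batches += 1
    return processed


# Usage: the "main" node stops after one batch; the "backup" node resumes.
with tempfile.TemporaryDirectory() as tmp:
    state = FileStateStore(Path(tmp) / "state.json")
    records = ["r1", "r2", "r3", "r4", "r5"]
    on_main = run_integration(records, state, max_batches=1)
    on_backup = run_integration(records, state)
    print(on_main, on_backup)  # ['r1', 'r2'] ['r3', 'r4', 'r5']
```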

To meet these requirements, you can deploy using the Integration cluster pattern.

Supported deployments

In this setup, a single node of the application is active: the HA main node. An HA backup node, connected to the main node for data replication, is ready to be switched to main if the main node fails.

To set up this deployment, follow the Install a HA cluster procedure.

Data replication

The backup node continuously downloads data from the main node so that it remains a clone of it. However, active features such as computing and data integration are deactivated on the backup node.

Failover

The backup node can switch to main with less than 2 minutes of data to reprocess.

When implementing a DR strategy, it is up to you to:

  • Route end users to the active node, for example using a load balancer.
  • Decide when to switch to the backup node, then follow the detailed procedure in Switch backup to main.
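A load balancer usually selects the active node through a health check. The Python sketch below assumes a hypothetical `probe` function that reports each node's role ("main", "backup", or None when the node is unreachable); it is not a Decision Insight endpoint, and real load balancers express this logic in their own health-check configuration.

```python
from __future__ import annotations

from typing import Callable, Optional


def pick_active_node(nodes: list[str],
                     probe: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the single node reporting the 'main' role, or None.

    probe(node) is a hypothetical health check returning 'main', 'backup',
    or None when the node is unreachable."""
    mains = [node for node in nodes if probe(node) == "main"]
    if len(mains) == 1:
        return mains[0]
    # No main: the switch has not happened yet, nothing to route to.
    # Several mains: ambiguous, so refuse to route rather than split users.
    return None


roles = {"node-a": "main", "node-b": "backup"}
before = pick_active_node(["node-a", "node-b"], roles.get)
roles = {"node-a": None, "node-b": "main"}  # node-a down, node-b switched to main
after = pick_active_node(["node-a", "node-b"], roles.get)
print(before, after)  # node-a node-b
```

Refusing to route when two nodes both claim the main role avoids sending users to two diverging instances.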

With SAN storage

This setup is typically deployed on virtualized servers, and sometimes on physical servers, that use a SAN (Storage Area Network) as their storage subsystem. Servers connect to the SAN through dedicated networking mechanisms such as:

  • Fibre Channel
  • iSCSI

When deploying over a SAN, note that Decision Insight has the same storage requirements (I/O and disk space) as a database management system. For performance reasons, it should therefore be placed on Tier 0 or Tier 1 of the SAN.

Disk replication

When using a SAN, disk replication within a single datacenter is typically handled by the SAN itself, which covers the disk replication requirements for HA.

If multi-site SAN replication is in place, it also covers the disk replication requirements for DR. If it is not available and DR is required, you can set up a specific replication that regularly copies the configuration and database of the active node to the passive node, for example with a cron job.
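A cron-driven replication could look like the following Python sketch. The directory layout (`conf` and `db` under a node root), the file names, and the crontab line are hypothetical; adapt them to your installation. The sketch also assumes the source directories are not being written to during the copy (for example, the copy is taken from a backup or a stopped node), because copying a live database can produce an inconsistent replica.

```python
from __future__ import annotations

import shutil
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: each node keeps its configuration and database under
# <root>/conf and <root>/db. Adjust to your actual installation.
REPLICATED_DIRS = ("conf", "db")


def replicate(source_root: Path, target_root: Path) -> list[Path]:
    """Copy each replicated directory from the active node to the passive node.

    Each directory is first copied to a staging location, so a failed copy
    leaves the previous replica in place."""
    copied = []
    for name in REPLICATED_DIRS:
        staging = target_root / (name + ".staging")
        final = target_root / name
        if staging.exists():
            shutil.rmtree(staging)  # leftover from an interrupted run
        shutil.copytree(source_root / name, staging)
        if final.exists():
            shutil.rmtree(final)
        staging.rename(final)
        copied.append(final)
    return copied


# Hypothetical crontab entry running this script every hour:
#   0 * * * * python3 /path/to/replicate.py /path/to/active /path/to/passive
if __name__ == "__main__" and len(sys.argv) == 3:
    replicate(Path(sys.argv[1]), Path(sys.argv[2]))
else:
    # Local demo with throwaway directories standing in for the two nodes.
    with tempfile.TemporaryDirectory() as tmp:
        active, passive = Path(tmp) / "active", Path(tmp) / "passive"
        for name in REPLICATED_DIRS:
            (active / name).mkdir(parents=True)
        (active / "conf" / "example.properties").write_text("key=value")
        replicate(active, passive)
        replica_conf = (passive / "conf" / "example.properties").read_text()
        replicated = sorted(p.name for p in passive.iterdir())
    print(replica_conf, replicated)  # key=value ['conf', 'db']
```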

Failover

The backup node can be started from the last checkpoint; it will have to reprocess up to 30 minutes of data.
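The amount of data to reprocess is the time elapsed between the last checkpoint and the failure. The Python sketch below assumes, for illustration only, checkpoints taken every 30 minutes, which is one way to read the "up to 30 minutes" bound; `reprocess_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def reprocess_window(failure: datetime, checkpoints: list) -> timedelta:
    """Data that a backup started from the last checkpoint must reprocess:
    everything between that checkpoint and the failure."""
    # Raises ValueError if no checkpoint precedes the failure.
    last = max(c for c in checkpoints if c <= failure)
    return failure - last


# Hypothetical 30-minute checkpoint schedule: 10:00, 10:30, 11:00, 11:30.
checkpoints = [datetime(2024, 1, 1, 10, 0) + timedelta(minutes=30 * i) for i in range(4)]
failure = datetime(2024, 1, 1, 10, 59)
window = reprocess_window(failure, checkpoints)
print(window)  # 0:29:00
```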

With a virtualized deployment supporting HA

When the HA mechanisms of the virtualization system are available and deployed (for example VMware vMotion, VMware HA, or VMware FT), HA failover typically relies on them for all hardware failures.

Depending on whether the virtualization system can perform failover across multiple sites, DR can either rely on those same mechanisms or require a specific procedure (see With a physical deployment).


With a virtualized deployment without HA

The failover mechanism is similar to the one described in With a physical deployment.

Alternatively, a manual or automated procedure can launch the virtual machine on another node instead of starting the application inside a passive virtual machine.


With a physical deployment

To perform failover between nodes, the deployment relies on external mechanisms, typically a clustered service manager that ensures a given service runs once and only once, and controls which instance should be running at any given moment.

The same principle applies to failover to a DR site. It is not unusual to automate failover for HA while relying on a manual procedure for DR.
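The once-and-only-once guarantee is commonly built on a time-limited lease: a node may run the service only while it holds an unexpired lease, which it must renew periodically. The following toy, single-process Python sketch (the `LeaseManager` class is hypothetical and not part of any cluster manager's API) shows how the backup can take over only after the main node stops renewing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseManager:
    """Toy stand-in for a clustered service manager's lock: at most one node
    holds the lease, and only the holder may run the service."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.lease: Optional[Lease] = None

    def try_acquire(self, node: str, now: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already ours."""
        free = self.lease is None or self.lease.expires_at <= now
        if free or self.lease.holder == node:
            self.lease = Lease(node, now + self.duration)
            return True
        return False

    def should_run(self, node: str, now: float) -> bool:
        return (self.lease is not None and self.lease.holder == node
                and self.lease.expires_at > now)


manager = LeaseManager(duration=10.0)
events = [
    manager.try_acquire("main", now=0.0),     # main starts the service
    manager.try_acquire("main", now=5.0),     # main renews its lease
    manager.try_acquire("backup", now=8.0),   # refused: main still holds it
    manager.try_acquire("backup", now=16.0),  # main stopped renewing; lease expired
]
print(events)  # [True, True, False, True]
```

Production cluster managers additionally use fencing (for example STONITH in Pacemaker) so that a node which lost its lease but is still running cannot keep working in parallel with the new holder.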

