Sizing prerequisites


To ensure message resiliency and prevent data loss, Decision Insight Messaging System requires at least three nodes, each deployed on a different server.


To properly size your deployment, consider the following question:


  • How many nodes of the cluster can go offline (or fail) before losing the service?

Your answer will define the fault tolerance coefficient that is used to define the size of your cluster, the default replication factor, the minimal synchronized messaging servers and the number of partitions.

Size of your cluster

  • Define the minimal number of orchestrators / messaging servers that should be part of your cluster.
    You can define it using the following formula: fault tolerance coefficient x 2 + 1
     
  • For a fault tolerance coefficient of 1, you'll have: 1 x 2 + 1 = 3 (at least 3 servers in your cluster).

Default replication factor

  • Define the number of time a piece of data is replicated in the cluster.
    You can define it using the following formula: fault tolerance coefficient x 2 +  1
     
  • For a fault tolerance coefficient of 1, you'll have : 1 x 2 + 1 = 3
    This indicates that any message should be present 3 times in the cluster when everything is fine (all nodes are up).

Minimal number of synchronized messaging servers

  • Define the minimal number of times a message should be duplicated over the cluster before the message is considered safe.
    You can define this number using the following formula: fault tolerance coefficient + 1
     
  • For a fault tolerance coefficient of 1, you'll have: 1 + 1 = 2
    This indicates that when you write a new message, it should be copied on at least two servers before being accepted.

Number of partitions

  • Define the number of physical partitions per topic used to store messages.
    This value impacts the degree of parallelism inside a consumer group over a single topic (no more consumers can process messages than the number of available partitions).

  • You can define the minimal value using the following formula: fault tolerance coefficient x 2 + 1.
    For instance, a fault tolerance coefficient of 1 will require 3 = 1 x 2 + 1 partitions: your messages will be split over three partitions so no more than three consumers can process them.

  • A rough formula for picking the number of partitions is based on throughput (message per second for instance):
    Let’s say your target throughput is t.
    You measure the throughout that you can achieve on a single partition for production: call it 
    p.
    You measure the throughout that you can achieve on a single partition for consumption: call it 
    c.  
    Then, you need to have at least 
    max(t/p, t/c) partitions.

Related Links