Determine if a node is up & running

This page provides guidelines to help system administrators determine whether a node is working as expected.

Ensure that the process is up & running

As a service on Linux

The service is launched and controlled as follows:

  • The service manager first launches a jsvc process that runs as root and whose parent process has PID 1. This is the controller process.
  • The controller process launches a child process, also named jsvc, that runs as the application user. This is the controlled process and the actual Decision Insight node.

The controller process:

  • Starts and stops the controlled process based on requests from the service manager.
  • Acts as a watchdog; it relaunches the controlled process if it crashes.

For more information, see Manage a node under Linux or the public documentation about jsvc.
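
As an illustration of the controller/controlled pair described above, the following Python sketch looks for a jsvc process whose parent is also a jsvc process. It is a minimal example and not part of the product: the function names are hypothetical, and it assumes that both processes appear with the command name jsvc in the output of ps.

```python
import subprocess
from typing import List, NamedTuple, Optional, Tuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    user: str
    comm: str


def list_processes() -> str:
    """Return one line per process: pid, ppid, user, command name (no header)."""
    # A separate -o option per column keeps the empty headers portable.
    cmd = ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "user=", "-o", "comm="]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_ps(output: str) -> List[Proc]:
    """Parse the text returned by list_processes() into Proc records."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4:
            pid, ppid, user, comm = parts
            procs.append(Proc(int(pid), int(ppid), user, comm.strip()))
    return procs


def find_node_pair(procs: List[Proc]) -> Optional[Tuple[Proc, Proc]]:
    """Return (controller, controlled) if a jsvc process has a jsvc child."""
    jsvc = {p.pid: p for p in procs if p.comm == "jsvc"}
    for child in jsvc.values():
        parent = jsvc.get(child.ppid)
        if parent is not None:
            return parent, child
    return None
```

A monitoring script could call find_node_pair(parse_ps(list_processes())) and raise an alert when the result is None. If only the controller is found, the controlled process may be between a crash and its relaunch.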

As a service on Windows

The service is managed by Windows Service Manager, which directly launches the controlled process:

  • The service manager launches a child process, named procrun, that runs as the application user.

The Windows Service Manager starts and stops the controlled process. However, if the controlled process crashes, it is not restarted and the service is marked as Stopped.

To monitor the status of the Windows service, you can either set up the "Recovery Actions" for the service or use an external system monitoring tool.

For more information, see Manage a node under Windows or the public documentation about procrun.
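
If the external monitoring tool can run scripts, one possible approach is to read the service state from the output of the Windows sc query command. The Python sketch below is an assumption-laden example: the function names are hypothetical, the service name depends on your installation, and the parsing assumes the usual "STATE : <code> <NAME>" line printed by sc query.

```python
import re
import subprocess
from typing import Optional

# Matches a line such as "        STATE              : 4  RUNNING".
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output: str) -> Optional[str]:
    """Extract the state name (RUNNING, STOPPED, ...) from `sc query` output."""
    match = _STATE_RE.search(sc_output)
    return match.group(1) if match else None


def query_service_state(service_name: str) -> Optional[str]:
    """Run `sc query <service_name>` and return its state, or None if unknown."""
    result = subprocess.run(
        ["sc", "query", service_name], capture_output=True, text=True
    )
    return parse_service_state(result.stdout)
```

Any state other than RUNNING, or a None result, can then be reported as an alert, since a crashed controlled process leaves the service marked as Stopped.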

As a standalone process

If the node is not installed as a service and is managed by a third-party tool, monitor its presence with a tool of your choice. For more information, see How to identify the node process ID?

Ensure that the node is responding and operating properly

Using an HTTP request

To ensure that the node is still responding to requests and available to end users, you can configure a heartbeat using an HTTP request from the system monitoring tool of your choice.

The typical URL used for this heartbeat is <base URL>/heartbeat/node.

To check whether a node is in the STARTED state, use <base URL>/heartbeat/node/readiness.
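
The following Python sketch shows how a monitoring script could poll these two URLs. It uses only the standard library. The function names are hypothetical, and treating any 2xx response as healthy is an assumption, because this page does not specify the status codes returned by the endpoints.

```python
import urllib.error
import urllib.request


def heartbeat_url(base_url: str, readiness: bool = False) -> str:
    """Build the heartbeat URL, or the readiness URL, from the node base URL."""
    url = base_url.rstrip("/") + "/heartbeat/node"
    return url + "/readiness" if readiness else url


def node_responds(base_url: str, readiness: bool = False, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(
            heartbeat_url(base_url, readiness), timeout=timeout
        ) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response.
        return False
```

For example, node_responds("https://node.example.com:8080") checks the heartbeat, and node_responds("https://node.example.com:8080", readiness=True) checks readiness; the host and port shown here are placeholders.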

Using a Data Integration route

To ensure that the node is operational, you can configure a heartbeat from the node toward an external system monitoring tool.

The aim is for the heartbeat to get triggered by an activity at the application level, for example, the computing of an indicator.

To configure the heartbeat:

  1. Configure a data integration event, based on a rhythmic indicator from the Global entity of the application.
  2. Configure a data integration route, triggered by the event, that performs a heartbeat call to the monitoring system (using a communication mechanism depending on the target tool).
    This route is triggered for each value computed for the indicator, so heartbeats should arrive at the rhythm of the selected indicator (e.g. a 1-minute indicator should send an event about every minute).

If the heartbeat occurrences drift (e.g. arrive later and later compared to the expected one-minute rhythm), the deployment is overloaded and its analysis is running late.

If the heartbeat stops occurring, the deployment is either very slow or stuck.
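
On the monitoring side, the two symptoms above can be told apart by comparing heartbeat arrival times with the expected rhythm. The Python sketch below is only an illustration: the function name and the tolerance and missing_after thresholds are hypothetical and should be tuned for your deployment.

```python
from typing import Sequence


def classify_heartbeats(
    arrivals: Sequence[float],
    period: float,
    now: float,
    tolerance: float = 0.5,
    missing_after: float = 3.0,
) -> str:
    """Classify heartbeat health from arrival timestamps given in seconds.

    arrivals: timestamps of received heartbeats, oldest first.
    period: expected interval between heartbeats (60 for a 1-minute indicator).
    now: current timestamp, in the same unit as arrivals.
    """
    # No heartbeat for several periods: the deployment is very slow or stuck.
    if not arrivals or now - arrivals[-1] > missing_after * period:
        return "missing"
    # Lag of the latest heartbeat against an ideal schedule that starts
    # at the first arrival and ticks once per period.
    expected_last = arrivals[0] + (len(arrivals) - 1) * period
    lag = arrivals[-1] - expected_last
    if lag > tolerance * period:
        return "drifting"
    return "ok"
```

With a 1-minute indicator, arrivals at 0, 75, 150 and 225 seconds accumulate a lag of 45 seconds after three periods and are reported as "drifting", while arrivals at 0, 60, 120 and 180 seconds are reported as "ok".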

To set up such a mechanism, follow Create an attribute computing heartbeat.
