Health Check Service

Configuration

The Health Check service configuration is stored in the domain.xml file. An example is shown below:

<health-check-service-configuration enabled="true">
    <garbage-collector-checker unit="SECONDS" name="GC" time="5" enabled="true"></garbage-collector-checker>
    <cpu-usage-checker unit="SECONDS" name="CPU" time="5" enabled="true">
        <property name="threshold-critical" value="90"></property>
        <property name="threshold-warning" value="70"></property>
        <property name="threshold-good" value="0"></property>
    </cpu-usage-checker>
    <machine-memory-usage-checker enabled="true" unit="SECONDS" name="MMEM" time="5">
        <property name="threshold-critical" value="95"></property>
        <property name="threshold-warning" value="60"></property>
        <property name="threshold-good" value="20"></property>
    </machine-memory-usage-checker>
    <heap-memory-usage-checker unit="SECONDS" name="HEAP" time="5" enabled="true">
        <property name="threshold-critical" value="92"></property>
        <property name="threshold-warning" value="75"></property>
        <property name="threshold-good" value="15"></property>
    </heap-memory-usage-checker>
</health-check-service-configuration>

The main configuration element is <health-check-service-configuration>, which sits directly under the parent config element, for example <config name="server-config">. It has a single attribute, enabled, which can be set to true or false to turn the entire Health Check service on or off.
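
As a rough illustration of that placement, the fragment below shows the element nested inside a config element, with the whole service switched off. The sibling elements that a real domain.xml contains are left out, and the single checker is only there to show that checker definitions can stay in place. Because the outer enabled attribute controls the entire service, this checker would not be expected to run even though its own flag is true.

```xml
<config name="server-config">
    <!-- other children of this config element are omitted in this sketch -->
    <health-check-service-configuration enabled="false">
        <!-- the checker definition is kept, but the service as a whole is off -->
        <garbage-collector-checker unit="SECONDS" name="GC" time="5" enabled="true"></garbage-collector-checker>
    </health-check-service-configuration>
</config>
```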

The List of Available Checkers

<health-check-service-configuration> can contain a variety of checkers, each of which monitors a specific metric:

Checker Description
<cpu-usage-checker> Calculates the CPU usage and prints out the percentage along with the usage time.
<garbage-collector-checker> Calculates and prints out how many times GC is executed with its elapsed time.
<machine-memory-usage-checker> Calculates the machine memory usage and prints out the percentage along with the total and used physical memory size.
<heap-memory-usage-checker> Calculates the heap memory usage and prints out the percentage along with initial and committed heap sizes.
<hogging-threads-checker> Identifies the threads that are hogging the CPU.
<connection-pool-checker> Calculates the ratio of free to used connections for all JDBC connection pools and prints the percentage of used connections for each active pool.

All checkers share the following base attributes, which must be specified:

Attribute Description
enabled Enables or disables the specified checker.
name Name of the checker that will be printed in the log messages for tracing.
unit The time unit, which can be one of: NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, MINUTES, HOURS, DAYS.
time The interval, as an integer in the given unit, at which the checker is executed.
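
To make the four base attributes concrete, here is a minimal sketch of a single checker that uses nothing else. The name value "GC-30S" is a made-up label chosen for this illustration, not a required or default name.

```xml
<health-check-service-configuration enabled="true">
    <!-- enabled: this checker is switched on
         name:    label printed in the log messages (hypothetical value)
         unit and time: run the checker once every 30 seconds -->
    <garbage-collector-checker enabled="true" name="GC-30S" unit="SECONDS" time="30"></garbage-collector-checker>
</health-check-service-configuration>
```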

Threshold configurations

Just like with the healthcheck-configure-service-threshold asadmin command, there are threshold configurations for the following checkers:

  • <cpu-usage-checker>
  • <machine-memory-usage-checker>
  • <heap-memory-usage-checker>
  • <connection-pool-checker>

The threshold configurations are specified for 3 different levels: CRITICAL, WARNING and GOOD. By default their values are 80, 50 and 0 respectively. A sample configuration for the cpu-usage-checker is given as follows:

<cpu-usage-checker enabled="true" unit="SECONDS" name="CPU" time="3">
    <property name="threshold-critical" value="90"></property>
    <property name="threshold-warning" value="70"></property>
    <property name="threshold-good" value="0"></property>
</cpu-usage-checker>

Keep in mind that all three threshold values must be provided. Otherwise the configuration will not work correctly and will cause a startup error.
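
The same layout should carry over to the other threshold-capable checkers. The sketch below applies it to the connection pool checker, on the assumption that it accepts the same three property elements as the CPU sample above. The name "CONPOOL" and the 10-second interval are illustrative choices, and the values are simply the stated defaults (80, 50 and 0) written out in full.

```xml
<connection-pool-checker enabled="true" unit="SECONDS" name="CONPOOL" time="10">
    <!-- all three levels are listed, since omitting any of them is reported to cause a startup error -->
    <property name="threshold-critical" value="80"></property>
    <property name="threshold-warning" value="50"></property>
    <property name="threshold-good" value="0"></property>
</connection-pool-checker>
```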

Checkers with Customized Configuration

<garbage-collector-checker>

Beyond the base attributes, there are no configurable options for the Garbage Collection checker. The checker is either on or off.

<hogging-threads-checker>

The Hogging Threads checker offers the following two configuration options, which the sample below sets as attributes on the element:

  • threshold-percentage: Defines the minimum percentage at which a thread is counted as hogging the CPU. The percentage is calculated as the ratio of the thread's elapsed CPU time to the checker execution interval. Its default value is 95.
  • retry-count: The number of times a thread must be counted as hogging before health check messages about it are reported to the user. Its default value is 3.

A sample of this configuration could be:

<health-check-service-configuration enabled="true">
    <hogging-threads-checker unit="MINUTES" time="1" enabled="true" threshold-percentage="65" retry-count="10"></hogging-threads-checker>
</health-check-service-configuration>
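
To show how the percentage relates to the interval, the following sketch writes out the documented defaults (95 and 3) explicitly and works through the arithmetic in comments. The 20-second interval is an arbitrary choice for this example, and the comments reflect one reading of the descriptions above rather than confirmed behaviour.

```xml
<health-check-service-configuration enabled="true">
    <!-- Interval: 20 seconds.
         Threshold: 95 percent of 20 s = 19 s of CPU time.
         On this reading, a thread that uses at least 19 s of CPU time within
         one 20 s interval (19 / 20 = 95%) is counted as hogging, and it is
         reported once it has been counted 3 times. -->
    <hogging-threads-checker unit="SECONDS" time="20" enabled="true" threshold-percentage="95" retry-count="3"></hogging-threads-checker>
</health-check-service-configuration>
```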
