LiveCycle ES: OutOfMemoryError: SystemFailure Proctor : memory has remained chronically below 1.048.576 bytes

Issue

 The following exception appears repeatedly in the LiveCycle server log files after the server has been running for some time:

17.04.10 12:04:22:174 CEST 000000bc WorkPolicyEva W com.adobe.idp.dsc.workmanager.workload.policy.WorkPolicyEvaluationBackground
Task run Policies violated for statistics on 'adobews__1730804381:wm_default'.
Cause: com.adobe.idp.dsc.workmanager.workload.policy.WorkPolicyViolationException: Policy Violation. Will attempt to recover in 60000 ms

17.04.10 12:04:49:402 CEST 000002bc ExceptionUtil E CNTR0020E: EJB encountered an undefined exception calling the method "getObjectType"
for Bean "BeanId(dms_lc_was#adobe-pof.jar#adobe_POFDataDictionaryLocalEJB, null)".
Exception details: java.lang.OutOfMemoryError: SystemFailure Proctor : memory has remained chronically below 1.048.576 bytes
(out of a maximum of 1.073.741.824 ) for 15 sec.
 at com.gemstone.gemfire.SystemFailure.runProctor(SystemFailure.java:573)
 at com.gemstone.gemfire.SystemFailure$4.run(SystemFailure.java:536)
 at java.lang.Thread.run(Thread.java:811)

After these errors have repeated in the log for some time, the LiveCycle server usually stops processing new work items and goes into a standby mode, waiting for the garbage collector to clean up the Java heap. So far, this problem has been reported only on WebSphere.

Explanation

LiveCycle has a specific threshold for memory usage which, when reached, results in a call to the garbage collector to clean up the heap.  LiveCycle allows a certain amount of time for the garbage collector to react and clean the heap. If this does not happen in the specified time window, then LiveCycle goes into standby mode to prevent data loss due to a memory deficit.

Two subsystems are relevant to this out-of-memory condition: Adobe Work Manager and the LiveCycle cache subsystem. Both use a similar approach to identifying a potential or imminent memory deficit.

1. Adobe Work Manager

Adobe Work Manager periodically checks the apparent available memory as a percentage of the maximum allowable heap size. It uses this check as a metric of overall system load, which it then uses to plan the execution of background tasks. When it perceives a memory deficit, the Work Manager generates the exception above and refrains from scheduling background tasks. Once a garbage collection resolves the memory deficit, the Work Manager resumes normal operation.
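The exact check is internal to Adobe Work Manager, but the following minimal Java sketch illustrates the kind of measurement described above; the class name and the 10% threshold are invented here purely for illustration.

public class MemoryLoadCheck {
    // Hypothetical threshold: treat less than 10% apparent free heap as a deficit.
    private static final double FREE_PERCENT_THRESHOLD = 10.0;

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max  = rt.maxMemory();                      // maximum allowable heap size
        long used = rt.totalMemory() - rt.freeMemory();  // heap currently in use
        double freePercent = 100.0 * (max - used) / max; // apparent available memory

        if (freePercent < FREE_PERCENT_THRESHOLD) {
            // Work Manager would log the policy violation shown above and stop
            // scheduling background tasks until a garbage collection frees memory.
            System.out.println("Apparent memory deficit: only " + freePercent + "% of the heap is free");
        }
    }
}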

The Work Manager's memory-availability checking can safely be disabled, and LiveCycle will continue to work as expected. To configure Work Manager to stop using memory availability as a factor in its scheduling, set the following JVM property:

-Dadobe.workmanager.memory-control.enabled=false

2. LiveCycle cache subsystem

The LiveCycle cache subsystem also periodically checks the apparent available memory in the same way as Work Manager. When the cache subsystem perceives a critical memory deficit, it enters an emergency shutdown mode and stops operating. Consequently, LiveCycle functions that depend on the cache subsystem no longer operate properly. There is no way to recover other than restarting the applications.

The critical logic within the cache subsystem senses available memory using Runtime.freeMemory(). It depends on being able to trigger a garbage collection via System.gc() within the configured wait period defined for assessing the memory deficit. Reviewing the circumstances under which System.gc() is called, this subsystem falls within the best practices described by IBM: a GC is requested only when memory is genuinely reaching capacity and the JVM would, of its own accord, soon schedule a GC.

When explicit GC is disabled via -Xdisableexplicitgc, the System.gc() trigger is ignored and the cache subsystem must wait for a GC to occur through the JVM's own actions. When the system is lightly loaded or idle, memory grows slowly, so the interval between free memory dropping below the cache subsystem's tolerance and memory actually being exhausted (when the JVM would finally collect) can exceed the period defined for waiting for deficit resolution. The cache subsystem then interprets this interval as a critical memory deficit and triggers an emergency shutdown.

The default configuration of the cache subsystem is:

• chronic_memory_threshold — 1 MB (1,048,576 bytes)

• MEMORY_MAX_WAIT —  15 seconds
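The following simplified sketch is not GemFire's actual implementation; the class name is invented, and the constants reflect the defaults listed above. It only illustrates the detection behavior described above, including why a disabled System.gc() can leave a deficit unresolved for longer than MEMORY_MAX_WAIT.

public class ProctorSketch {
    static final long CHRONIC_MEMORY_THRESHOLD = 1048576L; // default: 1 MB
    static final long MEMORY_POLL_INTERVAL_MS  = 1000L;    // default poll: once per second
    static final long MEMORY_MAX_WAIT_MS       = 15000L;   // default tolerance: 15 seconds

    public static void main(String[] args) throws InterruptedException {
        long deficitSince = -1; // -1 means no deficit is currently observed
        while (true) {
            long free = Runtime.getRuntime().freeMemory();
            if (free < CHRONIC_MEMORY_THRESHOLD) {
                if (deficitSince < 0) {
                    deficitSince = System.currentTimeMillis();
                    System.gc(); // ignored when the JVM runs with -Xdisableexplicitgc
                } else if (System.currentTimeMillis() - deficitSince > MEMORY_MAX_WAIT_MS) {
                    // The real proctor declares a system failure at this point.
                    throw new OutOfMemoryError("memory has remained chronically below "
                            + CHRONIC_MEMORY_THRESHOLD + " bytes");
                }
            } else {
                deficitSince = -1; // memory recovered, reset the timer
            }
            Thread.sleep(MEMORY_POLL_INTERVAL_MS);
        }
    }
}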

Solution

Adobe recommends preventing these emergency shutdowns by leaving System.gc() enabled (that is, not specifying -Xdisableexplicitgc), allowing the cache subsystem to operate as designed.

Adobe recognizes that this solution isn't feasible for all customers and has identified the following work-around configuration. If you cannot enable explicit garbage collection by removing the -Xdisableexplicitgc parameter, apply the three parameters described below.

-Dgemfire.SystemFailure.chronic_memory_threshold=0 -- zero bytes

With this parameter set to 0, the cache subsystem does not attempt to react to any memory deficit under any circumstances. This setting alleviates the problem reported above, but it also disables the cache subsystem's ability to react to an actual severe memory deficit. The likelihood of an actual system OOM is low, and the impact of such a situation extends well beyond the cache subsystem into other LiveCycle and customer applications, so the recovery activities of this one subsystem are unlikely to be critical to the overall state of system health. Adobe considers it reasonable to disable them.

-Dgemfire.SystemFailure.MEMORY_POLL_INTERVAL=360 -- six minutes or 1/10th hour
-Dgemfire.SystemFailure.MEMORY_MAX_WAIT=7200 -- two hours

With memory-deficit detection and the associated recovery activities disabled, it is neither necessary nor useful to monitor memory aggressively. Instead, set MEMORY_MAX_WAIT to a period approximating the longest likely interval between garbage collections on a lightly loaded system, about two hours. The poll interval is set so that memory is checked ten times per hour. The basic polling logic cannot be entirely disabled, but these long poll intervals minimize the already light impact of the polling thread.
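For example, if -Xdisableexplicitgc must remain in place, the three work-around properties can be appended to the server's generic JVM arguments. In the WebSphere administrative console this field is typically found under Application servers > server_name > Process definition > Java Virtual Machine, although the exact navigation path varies by WebSphere version:

-Xdisableexplicitgc -Dgemfire.SystemFailure.chronic_memory_threshold=0 -Dgemfire.SystemFailure.MEMORY_POLL_INTERVAL=360 -Dgemfire.SystemFailure.MEMORY_MAX_WAIT=7200

Restart the application server after changing the JVM arguments so that the new settings take effect.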

Additional information

 LiveCycle Cache Subsystem Configurable JVM Properties

For reference, the controlling parameters for the LiveCycle cache subsystem mentioned above are defined as follows:

gemfire.SystemFailure.chronic_memory_threshold = <bytes>

default is 1,048,576 bytes (1 MB)

This setting is the minimum amount of free memory that the cache subsystem tolerates before attempting recovery activities. The initial recovery activity is to attempt a System.gc(). If the apparent memory deficit persists for MEMORY_MAX_WAIT, the cache subsystem enters a system failure mode. If this value is 0 bytes, the cache subsystem never senses a memory deficit or attempts corrective action.

gemfire.SystemFailure.MEMORY_POLL_INTERVAL = <seconds>

default is 1 sec

This setting is the interval, in seconds, at which the cache subsystem thread wakes up and assesses system free memory.

gemfire.SystemFailure.MEMORY_MAX_WAIT = <seconds>

default is 15 sec

This setting is the maximum amount of time, in seconds, that the cache subsystem thread tolerates seeing free memory stay below the minimum threshold. After this interval is exceeded, it declares a system failure.
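To confirm that the overrides are well-formed and actually reach the JVM, a small check such as the hypothetical helper below can be run with the same -D arguments that are passed to the LiveCycle server. It only prints the system properties as the JVM sees them; it does not query the cache subsystem itself.

public class CacheTuningCheck {
    public static void main(String[] args) {
        String[] keys = {
            "gemfire.SystemFailure.chronic_memory_threshold",
            "gemfire.SystemFailure.MEMORY_POLL_INTERVAL",
            "gemfire.SystemFailure.MEMORY_MAX_WAIT"
        };
        for (String key : keys) {
            // Prints "null" for any property that was not supplied on the command line.
            System.out.println(key + " = " + System.getProperty(key));
        }
    }
}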

reference: (181544183/2607146)
