Thursday 4 June 2015

Understanding Host Memory Utilization Monitoring and Alerts for Java Based Middleware

Have you received an alert like this?

EM Event: Warning:Memory Utilization is 85.004%, crossed warning (85) or critical (90) threshold.



[ALERT] priority of [Medium] for alert [Memory Utilization above 80%] on resource


You’ve probably decided to set up Oracle Enterprise Manager (OEM) or JBoss Operations Network (JON) monitoring for the Linux host machine running your Oracle Middleware/Database software or JBoss middleware.

After some time, alerts about the host’s memory utilization begin to occur. You check the OEM or JON console and, even more disturbingly, see an upward trend that has been running since the software started.




No problems are occurring within the applications or database itself, so what should be done?


TL;DR
Linux’s memory management will, by design, always try to use otherwise-free RAM for disk caching. If applications need more memory, these buffers and caches are released immediately. No action should be taken, and the alert should be tuned to a much higher threshold (e.g. 98%) or disabled. See the final section of this post for the metrics you should monitor to manage the health of your platform’s memory.


The Long Explanation
If you still need reassurance that the alert is nothing to worry about, we can go through the process of checking for ourselves that everything is healthy.


How does Oracle EM Agent determine Memory Utilization on a Linux Host?
The answer to this question is documented in an Oracle Support note.


To summarise, OEM calculates memory utilization from the MemTotal and MemFree values in /proc/meminfo:


[oracle@prd-soa1 ~]$ grep Mem /proc/meminfo
MemTotal:       68042432 kB
MemFree:         4403660 kB


(MemFree / MemTotal) * 100 = free memory percentage. Here (4403660 / 68042432) * 100 ≈ 6.5% free, so utilization is reported as roughly 93.5%.
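As a rough sketch of the same calculation, assuming the agent simply compares these two fields (its exact formula may differ), the figure can be reproduced with awk:

```shell
# Sketch: reproduce the utilization figure from MemTotal/MemFree.
# Sample values are the ones shown above; on a live host, read
# /proc/meminfo directly instead of the printf.
printf 'MemTotal: 68042432 kB\nMemFree: 4403660 kB\n' |
  awk '/MemTotal/ {t=$2} /MemFree/ {f=$2}
       END {printf "Utilization: %.2f%%\n", (t-f)/t*100}'
```

This prints a utilization of 93.53%, i.e. 100 minus the ~6.5% free figure.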


JBoss ON likely uses a similar method to calculate memory utilization.


How does Linux Manage Memory Utilization?
Using the free command we can see the Linux memory state (pass the -m flag for megabytes, default is kilobytes).


[oracle@prd-soa1 prd-ofm-domain]$ free -m
            total       used       free     shared    buffers     cached
Mem:         66447      62159       4288          0        572      25056
-/+ buffers/cache:      36530      29917
Swap:         1023          0       1023


The second row of output, labelled “-/+ buffers/cache”, represents the true state of Linux memory once RAM-level caching is excluded. In this example the actual available memory is about 30GB (29917 MB), despite the first row showing only 4GB of the roughly 65GB total as free.
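The relationship between the two rows is simple arithmetic on the first-row values (in MB; the results differ from the printed row by 1 MB because free rounds each value independently):

```shell
# "-/+ buffers/cache" used = used - buffers - cached
echo $((62159 - 572 - 25056))   # real used, ~36530 MB
# "-/+ buffers/cache" free = free + buffers + cached
echo $((4288 + 572 + 25056))    # real free, ~29917 MB
```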


We can also see that swap usage is 0. This is a good sign of a healthy system: Linux only resorts to swap space when it cannot satisfy memory demands from RAM.


Check Memory Health Using vmstat
The inbuilt vmstat command can provide us with some more helpful diagnostics of the state of the Linux memory system. Here I pass two arguments: the first is the delay in seconds between reports, and the second is the number of reports. This invocation produces ten reports, one second apart (note the first line is an average since boot). You can adjust these as needed. Finally, the output is piped to column to help with formatting and readability.


[oracle@prd-soa prd-ofm-domain]$ vmstat 1 10 | column -t
procs -----------memory---------- -swap- ---io--- ---system--- ------cpu------
 r  b  swpd  free     buff    cache     si  so  bi  bo    in    cs    us  sy  id  wa  st
 5  0  0     4403052  585904  25646240  0   0   1   11    2     0     6   1   93  0   0
 3  0  0     4403160  585904  25646240  0   0   0   32    7441  4705  44  3   53  0   0
 2  0  0     4403032  585904  25646280  0   0   0   448   7226  4555  41  2   57  0   0
 1  0  0     4403032  585904  25646352  0   0   0   144   1993  644   23  0   77  0   0
 2  0  0     4403024  585904  25646352  0   0   0   12    8188  4682  47  4   50  0   0
 2  0  0     4402972  585908  25646408  0   0   0   792   7094  5075  37  2   61  0   0
 3  0  0     4402540  585908  25646916  0   0   0   0     7954  4909  45  3   52  0   0
 2  0  0     4402416  585908  25647008  0   0   0   128   6852  5058  40  2   58  0   0
 2  0  0     4402208  585908  25647436  0   0   0   2     7934  5010  41  3   56  0   0
 1  0  0     4402208  585908  25647564  0   0   0   0     1527  491   20  0   80  0   0

There are many excellent articles online about how to interpret this data. In our case we are mainly interested in the Swap In (si) and Swap Out (so) columns. Entries in the so column indicate that the operating system is writing memory pages from RAM to disk because it does not have the capacity to hold everything in RAM.


In our example above we see no activity in the swap columns, so we can feel confident that memory utilisation is healthy.
Of course, running vmstat over a longer period while the system is under representative load will give the best indication of how swap is being used. However, why should we do that manually?
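If you do want a quick scripted check rather than eyeballing the columns, a small awk filter over the vmstat data rows can flag swap activity. This is a sketch using a captured sample row in place of live output; the column positions assume the standard vmstat layout shown above:

```shell
# Sketch: flag any swap-in (si, col 7) or swap-out (so, col 8) activity.
# On a live host, replace the printf with: vmstat 1 10 | tail -n +3
printf '3 0 0 4403160 585904 25646240 0 0 0 32 7441 4705 44 3 53 0 0\n' |
  awk '$7 > 0 || $8 > 0 {hit=1}
       END {print (hit ? "swap activity detected" : "no swap activity")}'
```

For this sample row, where si and so are both 0, it prints "no swap activity".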


What Monitoring You Should Have For Linux Hosts Running Java Based Middleware
Because Oracle WebLogic and JBoss Fuse components all run in Java Virtual Machines, we actually have total control over how much RAM our platform’s components will consume: it is set by the Java heap arguments. So long as the combined total of the Xmx (maximum heap) values for all JVMs on the box does not exceed the physical RAM, leaving some headroom for non-Java processes and per-JVM overhead, our JVMs should never exhaust the host’s memory.


Note: Of course it's possible to set Xms (initial heap) to a value less than Xmx, which would let the combined Xmx total exceed the system's memory without the problem being noticed until each JVM grows toward its Xmx value. It is best practice, however, to set Xms and Xmx to the same value. This improves performance and garbage collection behaviour, and has the handy side effect of preventing us from starting more JVMs than the host can accommodate.


With Xms set equal to Xmx, if we try to start a Java process and the available operating system memory (minus buffers and cache) is not sufficient for the heap, the JVM will fail to start immediately rather than failing later under load.
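A quick way to sanity-check this is to sum the -Xmx values across JVM processes. This sketch works over a sample process list (on a live host, pipe in `ps -eo args` instead) and assumes heap sizes are given with m or g suffixes:

```shell
# Sum -Xmx values (in MB) from a list of java command lines.
printf 'java -Xms4g -Xmx4g Server1\njava -Xms512m -Xmx512m Server2\n' |
  grep -o '\-Xmx[0-9]*[mMgG]' |
  awk '{v=substr($0,5); u=substr(v,length(v)); n=substr(v,1,length(v)-1);
        total += (u=="g" || u=="G") ? n*1024 : n}
       END {printf "Combined Xmx: %d MB\n", total}'
```

For this sample (4g + 512m) it prints "Combined Xmx: 4608 MB", which you can then compare against the host's physical RAM.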


Monitoring Correctly For Memory Usage
Use the following monitoring configurations to more accurately keep an eye on your host’s memory health:

Availability of JVM Processes - because if there’s not enough memory to start or run a process, the availability alert will let us know.


OS Swap Space Usage - because if the operating system is starting to consume swap space then we want to know, as that is a sign that there is no actual free RAM to use.

Middleware Process JVM Heap Usage - because we can keep track of how high JVM heap usage is. If heap utilisation is consistently very low and we see alerts from the other two checks above, it may be rational to decrease the heap sizes to correct the problem.