JVM profiling

Server Statistics

The maximum amount of memory allocated for the JVM (Xmx value 8000m): 8GB
The minimum amount of memory allocated for the JVM (Xms value 8000m): 8GB
Default max memory allocated for the JVM (Xmx value 4000m, 1/4 of the total) unless overridden: 4GB
Default min memory allocated for the JVM (Xms) unless overridden: 2GB
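To confirm which of these values a running JVM actually picked up, a small Java program can read them from java.lang.Runtime. The following is a sketch with a hypothetical class name (HeapReport). Run it with the same -Xms/-Xmx flags as the service. Note that maxMemory() may report slightly less than -Xmx, because some collectors exclude a survivor space from that figure.

```java
// HeapReport.java: prints the heap limits the running JVM is actually using.
// Sketch only: run it with the same -Xms/-Xmx flags as the service to compare.
class HeapReport {
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Converts a byte count to whole mebibytes, the unit of flags such as -Xmx8000m.
    static long toMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will attempt to use (close to -Xmx).
        System.out.println("max heap (MB):       " + toMb(rt.maxMemory()));
        // totalMemory(): heap committed right now; starts near -Xms and grows toward -Xmx.
        System.out.println("committed heap (MB): " + toMb(rt.totalMemory()));
        // freeMemory(): unused portion of the committed heap.
        System.out.println("free heap (MB):      " + toMb(rt.freeMemory()));
    }
}
```

For example, java -Xms8000m -Xmx8000m HeapReport should report a committed heap close to the maximum from the start, while a run with default settings starts well below it.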

Tools used

JProfiler was used to profile the JVM on the remote instance.

JMeter was used to send load to the ELB to which the monitored instance was attached.

Prerequisites for the test (JMeter settings)

We wanted to send the same load to version 4.1 of the CP nextgen web service and observe its behavior under different garbage collectors. The settings below were used to generate the load and feed it to the ELB to which the monitored instance was attached.

The following is a screenshot from the JMeter console.

  • 100 threads (target concurrency)
  • 1000 seconds ramp-up time
  • 10 ramp-up steps
  • 500 seconds holding the target rate

This means:

Every 100 seconds, 10 users are added until we reach 100 users (1000 seconds divided by 10 steps is 100 seconds per step, and 100 users divided by 10 steps is 10 users per step, so 10 users are added every 100 seconds).

After reaching 100 threads, all of them continue running and hitting the server together for 500 seconds.

So altogether this load took 1000 seconds of ramp-up time + 500 seconds of holding = 1500 seconds (25 minutes).
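The arithmetic above can be checked with a short Java sketch. It assumes a simplified stepped model in which each batch of 10 users starts at the end of its 100-second step, so the target of 100 threads is reached exactly at t = 1000 s. JMeter's own scheduling may place the steps slightly differently. The class and method names are hypothetical.

```java
// RampUpSchedule.java: simplified model of the stepped load described above.
// Assumption: each step adds its batch of users at the END of the step, so the
// target is reached exactly when the ramp-up period ends.
class RampUpSchedule {
    static final int TARGET_THREADS = 100;
    static final int RAMP_UP_SECONDS = 1000;
    static final int RAMP_UP_STEPS = 10;
    static final int HOLD_SECONDS = 500;

    static int secondsPerStep() {
        return RAMP_UP_SECONDS / RAMP_UP_STEPS; // 1000 / 10 = 100 s
    }

    static int usersPerStep() {
        return TARGET_THREADS / RAMP_UP_STEPS; // 100 / 10 = 10 users
    }

    static int totalSeconds() {
        return RAMP_UP_SECONDS + HOLD_SECONDS; // 1500 s = 25 min
    }

    // Number of active threads t seconds into the test under this model.
    static int activeThreads(int t) {
        if (t < 0 || t >= totalSeconds()) {
            return 0; // before the start or after the hold period
        }
        int completedSteps = Math.min(t / secondsPerStep(), RAMP_UP_STEPS);
        return completedSteps * usersPerStep();
    }

    public static void main(String[] args) {
        for (int t = 0; t <= totalSeconds(); t += secondsPerStep()) {
            System.out.println("t=" + t + "s active=" + activeThreads(t));
        }
    }
}
```

Under this model, thread counts grow by 10 every 100 seconds, stay at 100 from t = 1000 s to t = 1500 s, and then drop to zero.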

The above parameters were used throughout the PoC to keep the test conditions consistent.

JVM behavior and memory utilization of the web service under default JVM values (Xms = 2GB, Xmx = 4GB)

Our app under a load of 20 threads

Our app under a load of 30 threads

Under the 30-thread load the JVM crashed, because we were directing the load at a single instance and CPU utilization reached 100%. A single instance therefore appears unable to handle a load of more than roughly 25 threads.

JVM behavior and memory utilization of the web service under overridden JVM values (max and min heap sizes both set to 8GB)

Web service at idle

JVM behavior and memory utilization of the web service under overridden JVM values and different garbage collectors, with auto scaling enabled (traffic sent to the ELB instead of directly to the instance)

Web service under a load of 100 threads with the G1 garbage collector (G1GC)

Same image as above, zoomed


There is a gap of roughly 12 seconds between GC activities, which is better than most results observed before.

GC graph below

Our app with -XX:+UseParNewGC (ParNew, the parallel young-generation collector used with the CMS garbage collector)

Zoomed image

CMS GC activity

Our app with the Serial garbage collector

Our app with the Parallel garbage collector


The above is a summary of this PoC:

Behavior of version 4.1 of the cloud pricing nextgen web service with the four garbage collectors available for Java 8 and below.

These images reflect minor GC activities and collections only.

Hence we can summarize: the CMS garbage collector has the most frequent GC activity, with barely any gap between activities. Serial GC performs a little better in this respect, with a 5–6 second interval between minor garbage collections. The parallel garbage collector, on the other hand, has a 7–8 second interval between its GC activities, and the Garbage First garbage collector (G1GC) has a 12–13 second interval, which makes it the preferable GC to use with CP nextgen WS 4.1.

So, in summary, ranked by the interval between minor GC activities (longest first):

G1GC > Parallel GC > Serial GC > CMS GC
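The intervals above were read off JProfiler's GC graphs. A comparable figure can also be computed inside the JVM with the standard java.lang.management API: sample each collector's cumulative collection count at the start and end of a window, then divide the window length by the difference. This is a sketch, not how JProfiler measures it. The class name and the 60-second default window are assumptions, and in practice the sampling would run in a background thread of the service during the load test.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// GcIntervalSampler.java: estimates the average gap between collections for each
// collector in the running JVM, using the standard GarbageCollectorMXBean counters.
// Sketch only: meant to be started inside the service while the load test runs.
class GcIntervalSampler {

    // Average seconds between collections in a sampling window, or -1 when the
    // count is unavailable (the MXBean returns -1) or no collection happened.
    static double averageIntervalSeconds(long countBefore, long countAfter, double windowSeconds) {
        if (countBefore < 0 || countAfter < 0) {
            return -1;
        }
        long collections = countAfter - countBefore;
        return collections > 0 ? windowSeconds / collections : -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical default window of 60 s; pass another value as the first argument.
        double windowSeconds = args.length > 0 ? Double.parseDouble(args[0]) : 60;
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();

        long[] before = new long[beans.size()];
        for (int i = 0; i < beans.size(); i++) {
            before[i] = beans.get(i).getCollectionCount(); // cumulative since JVM start
        }

        Thread.sleep((long) (windowSeconds * 1000));

        for (int i = 0; i < beans.size(); i++) {
            GarbageCollectorMXBean gc = beans.get(i);
            double interval = averageIntervalSeconds(before[i], gc.getCollectionCount(), windowSeconds);
            // Bean names depend on the collector, e.g. "G1 Young Generation" or "PS Scavenge".
            System.out.printf("%-25s avg interval: %s%n", gc.getName(),
                    interval < 0 ? "no collections" : String.format("%.1f s", interval));
        }
    }
}
```

The young-generation bean (for example "G1 Young Generation" under G1GC or "PS Scavenge" under the parallel collector) is the one that corresponds to the minor GC intervals compared above.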

When memory utilization is also taken into consideration, CMS GC shows very low used memory, because garbage is collected at a high frequency, which keeps more space available. Serial GC and parallel GC showed almost identical memory-utilization patterns, since their garbage collection frequencies are very close to each other. G1GC, however, shows fairly high memory utilization at any given time compared with the other collectors: its garbage collection activities are less frequent, so the used-memory percentage spikes between collections.
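A simple model makes this trade-off explicit. It assumes that the service allocates at a roughly constant rate r and that each minor GC reclaims almost all of that short-lived garbage. Under those assumptions, used heap follows a sawtooth: it climbs from the live-data baseline U_live for one collection interval T and then drops back. The peak just before a collection and the average over an interval are then approximately:

```latex
U_{\text{peak}} \approx U_{\text{live}} + r\,T,
\qquad
\bar{U} \approx U_{\text{live}} + \frac{r\,T}{2}
```

A collector that runs less often (larger T, as with G1GC here) therefore shows higher used memory between collections. A collector that runs almost continuously (CMS GC here) stays close to U_live.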

Our app's performance with G1GC and the heap sizes used for nextgen 4.0 (Xmx = 8GB, Xms = 4GB)

Here CP nextgen 4.1 was tested with the heap sizes used for version 4.0. According to the profiling documentation for version 4.0, the max heap size is set to 8GB while the min heap size is set to 4GB.

So the behavior of 4.1 was investigated under two different Xms values, 4GB and 8GB; the comparison of the results is shown above.

The results above show that garbage collection is invoked less frequently when the higher Xms value is used.

Effects of string deduplication when added (TBA)

String deduplication can be enabled by adding the flags below to your java -jar command. In Java 8, string deduplication is only supported by the G1 collector, hence -XX:+UseG1GC:

-XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics
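To confirm from inside the service that these flags took effect, the values the JVM actually holds can be queried through HotSpot's diagnostic MXBean. This is a complement to the jcmd approach. The following is a sketch with a hypothetical class name. It relies on the HotSpot-specific com.sun.management API. Also note that PrintStringDeduplicationStatistics exists only up to Java 8, so newer JDKs report it as unavailable.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// StringDedupFlagCheck.java: reports the values the JVM actually holds for the
// string deduplication flags. HotSpot-specific (com.sun.management API).
class StringDedupFlagCheck {

    // Returns the option's current value ("true"/"false" for boolean flags), or
    // "unavailable" when this JVM does not know the option.
    static String optionValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return "unavailable"; // not a HotSpot-compatible JVM
        }
        try {
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable"; // no VM option with this name
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseG1GC", "UseStringDeduplication", "PrintStringDeduplicationStatistics"};
        for (String flag : flags) {
            System.out.println(flag + " = " + optionValue(flag));
        }
    }
}
```

When started with the three flags above on Java 8, all three should print true.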

What is string deduplication?

Imagine you have a phone book containing people, each of whom has a String firstName and a String lastName. And it happens that in your phone book, 100,000 people have the same firstName = "Jon".

Because you get the data from a database or a file, those strings are not interned, so your JVM memory contains the char array {'J', 'o', 'n'} 100 thousand times, one per Jon string. Each of these arrays takes, say, 20 bytes of memory, so those 100k Jon strings take up 2 MB of memory.

With deduplication, the JVM will realise that “Jon” is duplicated many times and make all those Jon strings point to the same underlying char array, decreasing the memory used by those arrays from 2MB to 20 bytes. (This is yet to be explored.)
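The phone-book situation can be reproduced in a few lines of Java. This sketch (hypothetical class and method names) builds 100,000 "Jon" strings from raw characters, as a database or file reader would. The resulting strings are equal in content but are separate objects. For contrast it uses String.intern(), which is a different, manual technique: it collapses the String objects themselves into one shared instance. -XX:+UseStringDeduplication instead works automatically inside G1. It keeps the separate String objects but points them at one shared array, with no code changes.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// DuplicateStringsDemo.java: shows equal-content strings occupying separate objects.
class DuplicateStringsDemo {

    // Builds n strings with equal content but separate objects, as happens when
    // values are read from a database or file rather than written as literals.
    static String[] loadFirstNames(int n, boolean intern) {
        char[] raw = {'J', 'o', 'n'};
        String[] names = new String[n];
        for (int i = 0; i < n; i++) {
            String s = new String(raw); // fresh String with its own copy of the chars
            names[i] = intern ? s.intern() : s;
        }
        return names;
    }

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctObjects(String[] names) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : names) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("without intern: " + distinctObjects(loadFirstNames(n, false)));
        System.out.println("with intern:    " + distinctObjects(loadFirstNames(n, true)));
    }
}
```

Without interning, every one of the 100,000 strings is its own object with its own character array. With interning, they all collapse to a single instance.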

String deduplication results when run locally

This is the application run locally during initialization for about 10 minutes. String deduplication was executed 54 times (highlighted), inspected 1,941,986 strings and deduplicated 38.1MB of strings. When run in the production environment under heavy load, I believe these figures will grow and string deduplication will execute more often, making the drop in heap usage significant.

Commands and flags used

  • -XX:+UseParallelGC: Parallel Garbage Collector
  • -XX:+UseConcMarkSweepGC: CMS Garbage Collector
  • -XX:+UseG1GC: G1 Garbage Collector
  • -XX:ParallelCMSThreads=<N>: CMS Collector, number of threads to use (TBT)

Example usage:

java -XX:+UseParallelGC -Xms8000m -Xmx8000m -Denv=$SERVER_ENVIRONMENT_VARIABLE -Dlogback.configurationFile=./logback.xml -jar cpwssm04.jar 8080 >/dev/null 2>&1 &

Other necessary commands

You can use the ‘jps’ command to find the process ID of your jar if it is running.

Once the PID is known, you can run jcmd <pid> VM.flags (for example, jcmd 10980 VM.flags), which returns a response like the one below. You can use it to check whether the values you set have been accepted by the JVM.

Response:

-XX:CICompilerCount=4 -XX:InitialHeapSize=6442450944 -XX:MaxHeapSize=7340032000 -XX:MaxNewSize=2446327808 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=2147483648 -XX:OldSize=4294967296 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseParallelGC
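The heap values in this response are in bytes, which makes them hard to compare with flags such as -Xmx8000m. The following is a small Java sketch (hypothetical class name) that turns the flag line into a name-to-value map and converts sizes to MB. It only understands the -XX:Name=value and -XX:+Name/-XX:-Name forms shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// VmFlagsParser.java: turns the flag line printed by `jcmd <pid> VM.flags`
// into a name -> value map so heap sizes can be read in MB.
class VmFlagsParser {

    static Map<String, String> parse(String flagsLine) {
        Map<String, String> flags = new LinkedHashMap<>();
        for (String token : flagsLine.trim().split("\\s+")) {
            if (!token.startsWith("-XX:")) {
                continue; // skip anything that is not an -XX option (e.g. the "<pid>:" header)
            }
            String body = token.substring("-XX:".length());
            if (body.startsWith("+") || body.startsWith("-")) {
                // Boolean flag: -XX:+Name means enabled, -XX:-Name means disabled.
                flags.put(body.substring(1), String.valueOf(body.charAt(0) == '+'));
            } else {
                int eq = body.indexOf('=');
                if (eq > 0) {
                    flags.put(body.substring(0, eq), body.substring(eq + 1));
                }
            }
        }
        return flags;
    }

    static long bytesToMb(String bytes) {
        return Long.parseLong(bytes) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // A shortened copy of the response shown above.
        String response = "-XX:InitialHeapSize=6442450944 -XX:MaxHeapSize=7340032000 "
                + "-XX:+UseCompressedOops -XX:+UseParallelGC";
        Map<String, String> flags = parse(response);
        System.out.println("InitialHeapSize: " + bytesToMb(flags.get("InitialHeapSize")) + " MB");
        System.out.println("MaxHeapSize:     " + bytesToMb(flags.get("MaxHeapSize")) + " MB");
        System.out.println("UseParallelGC:   " + flags.get("UseParallelGC"));
    }
}
```

Applied to the sample response above, InitialHeapSize works out to 6144 MB (6 GB) and MaxHeapSize to 7000 MB.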

Possible Future work

If we use parallel GC or G1GC, we can specify the number of garbage collection threads, as well as goals for pause time, throughput and footprint (heap size).

  • The number of garbage collector threads can be controlled with the command-line option -XX:ParallelGCThreads=<N>.
  • The maximum pause time goal (the longest GC pause, in milliseconds, that the collector should try not to exceed) is specified with the command-line option -XX:MaxGCPauseMillis=<N>.
  • The maximum throughput target (measured as the time spent doing garbage collection versus the time spent outside of garbage collection) is specified by the command-line option -XX:GCTimeRatio=<N>.
  • The maximum heap footprint (the amount of heap memory that a program requires while running) is specified using the option -Xmx<N>.

We can tweak these values further to obtain the desired performance and behavior if required.
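The GCTimeRatio target is easiest to read as a percentage. According to the HotSpot GC tuning guide, a value of N sets a goal of spending at most 1/(1+N) of total time in garbage collection, so N = 19 means 5%. The following is a small Java sketch of that conversion (hypothetical class name). Like the pause-time goal, this is a target the collector works toward, not a guarantee.

```java
// GcTimeRatio.java: converts -XX:GCTimeRatio=<N> into the share of time in GC it targets.
// The collector aims to spend at most 1 / (1 + N) of total time in garbage collection.
class GcTimeRatio {

    static double maxGcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int n : new int[] {9, 19, 99}) {
            System.out.printf("-XX:GCTimeRatio=%d -> at most %.1f%% of time in GC%n",
                    n, 100 * maxGcTimeFraction(n));
        }
    }
}
```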

Java 8u20 introduced one more JVM parameter for reducing the memory wasted by many instances of the same String. It optimizes heap memory by making String objects with duplicate values share a single char[] array.

This parameter can be enabled by adding -XX:+UseStringDeduplication as a JVM parameter, together with -XX:+UseG1GC. (This can be used directly in our implementation.)


Why are the min and max heap sizes set on the process instead of on the JVM itself?

If they are set on the JVM itself, every other Java program running on the instance will also receive those values, which is undesirable: different programs have different memory requirements, so we might end up under-utilizing or over-allocating memory. In our case only one Java program runs on each instance, so setting Xmx and Xms (the max and min heap sizes) on the JVM itself would do no harm. However, setting them on the process gives us more control, and we can change them whenever necessary from the code deploy scripts without hassle.

Why are the Xmx and Xms values the same (min and max heap size set to the same value)?

If Xms (the min heap size) is set to a lower value, the application will suffer more frequent GC. Every request for more memory from the OS consumes time and is an unnecessary overhead.

Above all, if your application is performance critical, you will certainly want to avoid memory pages being swapped out to and from disk, as this makes GC take more time. To avoid this, memory can be locked. But if Xms and Xmx are not the same, memory allocated after the initial allocation will not be locked.

On a production machine, setting -Xms to the same value as -Xmx is ideal. Why? Because a production machine or instance is typically a single-use machine (other than the OS, only the app server is running; in our case only the web service runs). After running for a while, the instance will have grabbed all of the heap memory it needs and won't let it go, so you might as well give it all the heap memory to start with.

(There is also a small performance penalty for making the JVM continually ask the OS for more heap space, and for making the JVM determine whether it needs more space or can give space away. Setting -Xms and -Xmx to the same value avoids those calculations, but the performance difference on modern processors and RAM configurations is negligible.)

The only downside of setting Xms (the min heap size) to such a high value is that the larger it is, the longer app startup takes. In our case we can tolerate a little extra startup time.

Setting Xmx and Xms to the same value is not desirable on a development machine, since we run other Java programs there too.
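Because this policy differs between production and development machines, it can help for the service to log its own heap settings at startup. The following Java sketch (hypothetical class and method names) reads the launch options through RuntimeMXBean.getInputArguments() and checks whether -Xms and -Xmx denote the same size. It only understands the -Xms/-Xmx spellings with k, m or g suffixes, not -XX:InitialHeapSize or -XX:MaxHeapSize. It also assumes that a later occurrence of an option overrides an earlier one.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// HeapSizeCheck.java: reads the -Xms/-Xmx options the JVM was started with and
// reports whether the minimum and maximum heap sizes are the same.
class HeapSizeCheck {

    // Parses a size such as "8000m", "8g", "512k" or "1048576" into bytes.
    static long parseSize(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default:
                if (!Character.isDigit(unit)) {
                    throw new IllegalArgumentException("unsupported size: " + size);
                }
                multiplier = 1L; // plain byte count
        }
        String digits = Character.isDigit(unit) ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // Value of the last argument starting with prefix (assumes later options win), or null.
    static String lastValue(List<String> args, String prefix) {
        String value = null;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    // True when both -Xms and -Xmx were given and denote the same number of bytes.
    static boolean minEqualsMax(List<String> args) {
        String xms = lastValue(args, "-Xms");
        String xmx = lastValue(args, "-Xmx");
        return xms != null && xmx != null && parseSize(xms) == parseSize(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xms equals -Xmx: " + minEqualsMax(jvmArgs));
    }
}
```

With the production command used in this PoC (-Xms8000m -Xmx8000m), the check reports true. A development run that omits -Xms reports false.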



Sithija Thewahettige

Software Engineering Intern @ mubasher technologies