G1GC tuning

What is G1GC?

G1 (Garbage First) is a generational collector. Like the other HotSpot collectors, it divides the heap into two generations: young and old.

G1 implements 2 GC algorithms

  1. young generation GC
  2. old generation GC
  • G1 full GCs are single-threaded (in Java 8) and very slow, so they should be avoided. Use -XX:+PrintAdaptiveSizePolicy to find out the reason for a full GC.
  • Avoid the “to-space exhausted” situation; if it occurs, your heap needs to be larger.
  • Avoid humongous allocations. -XX:+PrintAdaptiveSizePolicy tells you why they happen and -XX:G1HeapRegionSize lets you make the regions larger.
  • G1 is intended to be tuned with just two parameters: -Xmx for the maximum heap size and -XX:MaxGCPauseMillis for a target maximum pause time (default is 200 ms).

G1 divides the heap into equally sized regions, aiming for roughly 2048 of them (the region size is configurable with the -XX:G1HeapRegionSize flag), and tags each region as Eden, survivor or old, unlike other GCs (refer to the image below). G1 collects the regions that contain the most garbage first, which gives it its name (Garbage First). This approach reduces the chance of the heap being depleted before the background threads have finished scanning for unused objects, in which case the collector would have to stop the application, resulting in an STW (stop-the-world) collection.
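The relationship between heap size and region size can be sketched with a little arithmetic. This is a simplified illustration, not HotSpot's actual sizing code: it assumes the region size is the heap divided by the 2048-region goal, rounded down to a power of two and clamped to the 1 MB to 32 MB range given in the flag list later in this article. The function name is hypothetical.

```python
def g1_region_size_mb(heap_mb: int, target_regions: int = 2048) -> int:
    """Approximate G1 region size in MB for a given heap size (simplified)."""
    raw = max(1, heap_mb // target_regions)
    # Round down to the nearest power of two.
    size = 1 << (raw.bit_length() - 1)
    # Clamp to the documented 1 MB .. 32 MB range.
    return min(32, max(1, size))


heap_mb = 10 * 1024  # a 10 GB heap, as used in the later tests
region_mb = g1_region_size_mb(heap_mb)
print(f"region size: {region_mb} MB, regions: {heap_mb // region_mb}")
```

Under these assumptions a 10 GB heap gets 4 MB regions, so the region count lands near, but not exactly on, 2048.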

G1 has another advantage: it compacts the heap on the go, something the CMS collector only does during full STW collections (compaction happens during the collection itself rather than in a separate cycle). G1 has introduced other optimizations as well, such as string deduplication. (PermGen was also removed in Java 8, but that change applies to the JVM as a whole rather than to G1 specifically.)

For large heaps G1 is better because of the way it can divide work between different threads and heap regions.

Traditional GC layout

G1GC memory allocation layout

Advantages of G1GC

  • default GC from java 9 onwards
  • ability to compact free memory space without lengthy pause times
  • works well with large heap sizes
  • works well with vertical scaling
  • many tuning opportunities (unlike other GC s)

Tools Used

JProfiler was used for profiling the JVM.

GCViewer was used to analyse the GC logs. It gave a graphical view of GC activity and heap usage, and provided stats about throughput, promotion rate, GC count and related times.

Available Flags in G1GC

The G1 GC is an adaptive garbage collector with defaults that enable it to work efficiently without modification. Here is a list of important options and their default values. This list applies to the latest Java HotSpot VM, build 24. You can adapt and tune the G1 GC to your application performance needs by entering the following options with changed settings on the JVM command line.

  • -XX:G1HeapRegionSize=n
    Sets the size of a G1 region. The value will be a power of two and can range from 1MB to 32MB. The goal is to have around 2048 regions based on the minimum Java heap size.
  • -XX:MaxGCPauseMillis=200
    Sets a target value for desired maximum pause time. The default value is 200 milliseconds. The specified value does not adapt to your heap size.
  • -XX:G1NewSizePercent=5
    Sets the percentage of the heap to use as the minimum for the young generation size. The default value is 5 percent of your Java heap. This is an experimental flag. This setting replaces the -XX:DefaultMinNewGenPercent setting. This setting is not available in Java HotSpot VM, build 23.
  • -XX:G1MaxNewSizePercent=60
    Sets the percentage of the heap size to use as the maximum for young generation size. The default value is 60 percent of your Java heap. This is an experimental flag. This setting replaces the -XX:DefaultMaxNewGenPercent setting. This setting is not available in Java HotSpot VM, build 23.
  • -XX:ParallelGCThreads=n
    Sets the number of STW worker threads. The value of n is the same as the number of logical processors up to a value of 8. If there are more than eight logical processors, n is set to approximately 5/8 of the logical processors. This works in most cases except for larger SPARC systems, where n can be approximately 5/16 of the logical processors.
  • -XX:ConcGCThreads=n
    Sets the number of parallel marking threads. Sets n to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
  • -XX:InitiatingHeapOccupancyPercent=45
    Sets the Java heap occupancy threshold that triggers a marking cycle. The default occupancy is 45 percent of the entire Java heap.
  • -XX:G1MixedGCLiveThresholdPercent=65
    Sets the occupancy threshold for an old region to be included in a mixed garbage collection cycle. The default occupancy is 65 percent. This is an experimental flag. This setting replaces the -XX:G1OldCSetRegionLiveThresholdPercent setting. This setting is not available in Java HotSpot VM, build 23.
  • -XX:G1HeapWastePercent=10
    Sets the percentage of heap that you are willing to waste. The Java HotSpot VM does not initiate the mixed garbage collection cycle when the reclaimable percentage is less than the heap waste percentage. The default is 10 percent. This setting is not available in Java HotSpot VM, build 23.
  • -XX:G1MixedGCCountTarget=8
    Sets the target number of mixed garbage collections after a marking cycle to collect old regions with at most G1MixedGCLiveThresholdPercent live data. The default is 8 mixed garbage collections. The goal for mixed collections is to be within this target number. This setting is not available in Java HotSpot VM, build 23.
  • -XX:G1OldCSetRegionThresholdPercent=10
    Sets an upper limit on the number of old regions to be collected during a mixed garbage collection cycle. The default is 10 percent of the Java heap. This setting is not available in Java HotSpot VM, build 23.
  • -XX:G1ReservePercent=10
    Sets the percentage of reserve memory to keep free so as to reduce the risk of to-space overflows. The default is 10 percent. When you increase or decrease the percentage, make sure to adjust the total Java heap by the same amount. This setting is not available in Java HotSpot VM, build 23.

G1GC parameters to tune and their effects on latency

G1GC requires a lot of tuning since its behavior changes according to your application. We tried G1GC with default values and with overridden flags against the default parallel GC. Even though G1GC showed fewer full GCs than parallel GC, each full GC took a long time, causing the average and max response times to increase. Each application is unique, so you may need to tune G1GC in an iterative process.

The following can be done to avoid full GCs.

Make old Gen area large

This is the simplest way to avoid full GCs in G1. If the old area is larger, full GCs become less frequent. But there is a trade-off: the young area becomes smaller, so there may be more minor garbage collections, each with a short stop-the-world pause. Conversely, a smaller old area gives shorter collections but can cause more frequent full GCs. The split between the young and old areas can be changed with -XX:NewRatio or with -XX:G1NewSizePercent and -XX:G1MaxNewSizePercent. Note that fixing the young generation size with -XX:NewRatio (or -Xmn) overrides G1's pause time goal, so the two percentage flags are usually the better choice.

-XX:G1NewSizePercent=5 (default)

Sets the percentage of the heap to use as the minimum for the young generation size. The default value is 5 percent of your Java heap. This is an experimental flag.

-XX:G1MaxNewSizePercent=60 (default)

We can adjust these two values so that the young area takes a smaller portion of the allocated heap and the old gen takes a bigger portion. Since full GCs happen when the old gen fills up completely, a bigger old gen makes a full GC much less likely.

Increase the number of GC threads

If you have sufficient CPU cores, increasing the number of GC threads can improve performance.

You can change the number of GC worker threads by setting -XX:ParallelGCThreads (for example, 4 if your server or instance has 4 cores). By default the value equals the number of logical processors, up to 8. The background (concurrent marking) threads are set separately with -XX:ConcGCThreads.

Increasing this setting can reduce the time of each garbage collection, since more threads collect in parallel.

Run background threads frequently

Starting the background threads as early as possible can improve performance: if they do not run often enough and cannot finish marking before the old area reaches its limit, a full GC results.

The point at which the background threads start is set with -XX:InitiatingHeapOccupancyPercent. This is the ratio of heap usage to total heap size.

Decreasing the value makes the background threads start earlier. The default value is 45. One thing to note is that if the value is too small, marking cycles run too frequently. This costs CPU cycles and can affect application performance itself, so check the CPU usage of your application.

  • It is the percentage of (entire) heap occupancy at which a concurrent GC cycle starts. It is used by GCs that trigger a concurrent GC cycle based on the occupancy of the entire heap, not just one of the generations.
  • The option above can be used to change the marking threshold.
  • If the threshold is exceeded, a concurrent marking cycle is initiated next.
  • The higher the threshold, the fewer concurrent marking cycles there will be, which also means fewer mixed GC evacuations.

So in our case it is better to set this value low.

Low initiating occupancy percent → marking happens sooner → more concurrent marking cycles → the old area is less likely to reach its limit → possible full GCs are prevented
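As a rough illustration of what the threshold means in absolute terms, the sketch below converts the percentage into megabytes of heap occupancy. It is only arithmetic on the definition given above (occupancy of the entire heap); the function names are hypothetical and the 9 GB heap is taken from Test 5 later in this article.

```python
def marking_threshold_mb(heap_mb: float, ihop_percent: float) -> float:
    """Heap occupancy (MB) at which a concurrent marking cycle is triggered."""
    return heap_mb * ihop_percent / 100.0


def should_start_marking(used_mb: float, heap_mb: float, ihop_percent: float = 45) -> bool:
    """True once the whole-heap occupancy reaches the threshold."""
    return used_mb >= marking_threshold_mb(heap_mb, ihop_percent)


heap_mb = 9 * 1024
for percent in (45, 25):
    print(percent, marking_threshold_mb(heap_mb, percent))
```

With 3000 MB in use on a 9 GB heap, marking would already have started at a threshold of 25 but not at the default of 45, which is the effect the lower setting is meant to achieve.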

Let GC process more data

Once a concurrent cycle finishes, the next concurrent cycle will not start until the marked regions in the old area have been emptied.

So increasing the amount of data processed by one garbage collection cycle of the old area helps the next marking phase start sooner.

There are two settings available for this: -XX:G1MixedGCCountTarget and -XX:MaxGCPauseMillis. The old generation collection in G1 is called a mixed GC because it does a minor GC on the young area and collects a subset of the old regions in the same pause. (A mixed GC is not a full GC; a full GC is the fallback when these collections cannot keep up.)

Mixed GCs repeat until almost all marked regions are free. -XX:G1MixedGCCountTarget specifies the target number of mixed GCs over which the marked regions are freed. Decreasing this value makes each mixed GC process more regions, which can reduce the total time of the mixed GCs in one cycle.

-XX:MaxGCPauseMillis is the target for how long a mixed GC stops the world (a goal, not a hard limit). A mixed GC tries to free marked regions within the count specified by -XX:G1MixedGCCountTarget. If a pause has not yet reached the time specified by -XX:MaxGCPauseMillis, the mixed GC tries to free more memory, which can reduce the total time of the whole cycle.

name | default value
-XX:G1MixedGCCountTarget | 8
-XX:MaxGCPauseMillis | 200

Taming Mixed Garbage Collections

  • -XX:InitiatingHeapOccupancyPercent
    For changing the marking threshold. (discussed above )
  • -XX:G1MixedGCLiveThresholdPercent and -XX:G1HeapWastePercent
    When you want to change the mixed garbage collections decisions.
  • -XX:G1MixedGCCountTarget and -XX:G1OldCSetRegionThresholdPercent
    When you want to adjust the CSet for old regions.

-XX:G1MixedGCLiveThresholdPercent (default 65% )

  • This option changes the threshold which determines whether a region should be added to the CSet (collection set: the set of regions that should be collected in the next cycle) or not.
  • Only regions whose live data percentage is less than the threshold will be added to the CSet.
  • The higher the threshold (default: 65), the more likely a region will be added to the CSet, which also means more mixed GC evacuation and longer evacuation times.
  • Old regions with the most garbage are chosen first.

-XX:G1HeapWastePercent

  • The amount of reclaimable space, expressed as a percentage of the heap size, below which G1 will stop doing mixed GCs. If the amount of space that can be reclaimed from old generation regions, compared to the total heap, is less than this, G1 will stop mixed GCs.
  • The current default is 10%. A lower value, say 5%, will potentially cause G1 to add more expensive regions to evacuate for space reclamation.
  • G1 will continue triggering mixed GCs while the reclaimable space is higher than the waste threshold. So a lower waste threshold means more mixed GCs, and possibly more expensive ones.
  • If a particular old region holds 8% garbage and 92% live objects, copying that 92% is costly. In this case the 8% garbage is ignored and the region is left alone, since very little is gained from copying it: we would reclaim a few kB, but the operation is not worth it.
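The interplay of the two thresholds above can be sketched as follows. This is a toy model under stated assumptions, not G1's real selection code: the region names, sizes and the 40 MB and 100 MB heap figures are made up for illustration, and the defaults of 65 and 10 are the ones quoted in this article.

```python
from dataclasses import dataclass


@dataclass
class OldRegion:
    name: str
    size_mb: float
    live_percent: float  # live data as a percentage of the region

    @property
    def garbage_mb(self) -> float:
        return self.size_mb * (100 - self.live_percent) / 100


def mixed_gc_candidates(regions, live_threshold_percent=65):
    """Regions below the live threshold, most garbage first."""
    candidates = [r for r in regions if r.live_percent < live_threshold_percent]
    return sorted(candidates, key=lambda r: r.garbage_mb, reverse=True)


def keep_doing_mixed_gc(candidates, heap_mb, heap_waste_percent=10):
    """Mixed GCs continue while reclaimable space exceeds the waste percentage."""
    reclaimable_mb = sum(r.garbage_mb for r in candidates)
    return reclaimable_mb / heap_mb * 100 > heap_waste_percent


regions = [OldRegion("A", 4, 10), OldRegion("B", 4, 92),
           OldRegion("C", 4, 50), OldRegion("D", 4, 60)]
cands = mixed_gc_candidates(regions)
print([r.name for r in cands])
```

Region B, with 92% live data, is skipped, which matches the 8% garbage example in the last bullet above.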

Recent performance runs indicate that G1HeapWastePercent often benefits from a lower value and that G1MixedGCLiveThresholdPercent benefits from a higher value.

Terminology

S0C — Current survivor space 0 capacity (KB).

S1C — Current survivor space 1 capacity (KB).

S0U — Survivor space 0 utilization (KB).

S1U — Survivor space 1 utilization (KB).

EC — Current eden space capacity (KB).

EU — Eden space utilization (KB).

OC — Current old space capacity (KB).

OU — Old space utilization (KB).

PC — Current permanent space capacity (KB).

PU — Permanent space utilization (KB).

YGC — Number of young generation GC Events.

YGCT — Young generation garbage collection time.

FGC — Number of full GC events.

FGCT — Full garbage collection time.

GCT — Total garbage collection time.

Test results and deductions

These results were obtained by running the same load test for a period of 1 hour while changing the parameters of the G1GC garbage collector.

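Columns: S0, S1 and E describe the young generation and O the old generation. YGC = number of young GC events, YGCT = time taken for young GC events, FGC = number of full GC events, FGCT = time taken for full GC events, GCT = full time for GC (YGCT + FGCT), YGCT/YGC = time per young GC cycle (ms), FGCT/FGC = time per full GC cycle (sec). A "-" marks a value that is missing in the recorded results.

parameters / flags added | S0 | S1 | E | O | M | CCS | YGC | YGCT | FGC | FGCT | GCT | YGCT/YGC | FGCT/FGC
Test 4 → G1GC with 8gb and 8gb | 0 | 100 | 71.31 | 64.9 | 92.36 | 87.35 | 700 | 86.224 | 3 | 12.628 | 98.852 | 123.1771429 | 4.209333333
Test 2 → G1GC | 0 | 100 | 55.61 | 64.31 | 91.72 | 86.65 | 1238 | 121.023 | 9 | 28.987 | 150.01 | 97 | 3.220777778
Test 1 → Parallel GC | 79.92 | 0 | 78.93 | 58.81 | 92.56 | 88.05 | 1719 | 102.958 | 31 | 36.209 | 139.167 | 59 | 1.168032258
Test 3 → g1gc with 4gb 4gb and max pause millis 150 | 0 | 100 | 39.31 | 79.5 | 92.9 | 87.62 | 1525 | 139.42 | 11 | 34.931 | 174.35 | - | 3.4
g1gc with heap occupancy 30% | 0 | 100 | 89.53 | 40.81 | 92.19 | 86.41 | 418 | 44 | 0 | 0 | 44 | - | 0
Test 5 → g1gc with 9GB 9GB with occupancy 25% | 0 | 100 | 69 | 56 | 92 | 87 | 552 | 71 | 2 | 8.623 | 80.57 | - | 4.3
Test 6 → g1gc 10GB 10GB reserve 15% | 0 | 100 | 90 | 38 | - | - | 610 | 87.938 | 2 | 9.888 | 97 | - | 4.9
Test 7 → XX:G1MaxNewSizePercent=35% | 0 | 100 | 17.71 | 37 | 94 | 91 | 965 | 121 | 3 | 14 | 135 | - | 4.6
Test 8 → -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:G1NewSizePercent=20 | 0 | 100 | 68 | 34 | 92 | 88 | 694 | 71 | 0 | 0 | 71 | - | 0
Test 9 → G1GC tuned with t3.2xlarge | 0 | 100 | 68.06 | 34.43 | 92 | 88 | 694 | 71 | 0 | 0 | 71.2 | - | 0
Test 10 → G1GC tuned with c5.2xlarge for 24 hrs | 0 | 100 | 83 | 34 | 93 | 89 | 20449 | 842 | 0 | 0 | 842.14 | - | -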

Test 1

Parallel GC (default values).

We saw 31 full GCs in a one hour period, each taking approximately 1.1 sec (FGCT / FGC).

That many full GCs in such a short period is undesirable.
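The per-event figures quoted in these test notes are simple ratios of the jstat columns from the Terminology section. A minimal sketch, using the Test 1 values from the results table (YGC 1719, YGCT 102.958 s, FGC 31, FGCT 36.209 s); the helper name is hypothetical:

```python
def per_event_seconds(total_seconds: float, events: int) -> float:
    """Average time per GC event; 0.0 when no events occurred."""
    return total_seconds / events if events else 0.0


ygc, ygct = 1719, 102.958   # young GC count and total time (s), Test 1
fgc, fgct = 31, 36.209      # full GC count and total time (s), Test 1

full_gc_sec = per_event_seconds(fgct, fgc)
young_gc_ms = per_event_seconds(ygct, ygc) * 1000
print(f"per full GC: {full_gc_sec:.3f} s, per young GC: {young_gc_ms:.1f} ms")
```

The guard for zero events matters for the later tests, where FGC drops to 0.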

Test 2

G1GC without any flags (no tuning).

This reduced the number of full GCs to 9, but the time taken per full GC was higher.

Full GCs still exist; the objective is to minimize or get rid of them.

Test 3

G1GC with Xms = 4GB, Xmx = 4GB and max pause time = 150 ms.

This increased the number of full GCs to 11, and the time per full GC was also higher than in the 1st scenario, about 3 sec each.

Test 4

G1GC with Xmx = 8GB and Xms = 8GB.

This one performed really well, reducing the number of full GCs to 3.

But once again each full GC took about 4 seconds.

Therefore no significant improvement in response times was observed.

Test 5

G1GC with Xms = 9GB, Xmx = 9GB and occupancy = 25.

-XX:InitiatingHeapOccupancyPercent=25

The heap was increased in order to avoid full GCs, and the occupancy threshold was reduced to 25%, meaning that marking starts when 25% of the entire heap is filled. Marking therefore finishes sooner and collection happens before a full GC takes place.

In the previous tests we believe the heap filled up before marking finished, which caused the full GCs.

This reduced the number of full GCs to 2.

Test 6

We wanted to avoid the 2 remaining full GCs as well.

The heap was further increased to 10 GB and the reserve was increased from 10% to 15%. This reserves 15% of the total heap: if the old gen fills up and can no longer accept promotions from the young gen, those promotions go to the reserve, thereby preventing a possible full GC.

-XX:G1ReservePercent=15

Test 7

Max new size percent was reduced from the default of 60% (see the flag list above) to 35%. With the default, the young gen can grow to 60% of the total heap, leaving as little as 40% for the old gen, which makes the old gen more likely to run out of space. With a 35% cap on the young gen, the old gen keeps at least 65% of the heap and is much less likely to overflow.

-XX:G1MaxNewSizePercent=35

But this caused 3 full GCs.
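To make the young/old split concrete, the sketch below computes the bounds implied by the two percentage flags. It assumes the 10 GB heap from Test 6 still applies here and uses the defaults of 5 and 60 from the flag list earlier in this article; the function name is hypothetical.

```python
def generation_bounds_gb(heap_gb: float, new_min_percent: float = 5,
                         new_max_percent: float = 60) -> dict:
    """Young generation bounds and the minimum left for the old generation."""
    young_min = heap_gb * new_min_percent / 100
    young_max = heap_gb * new_max_percent / 100
    return {"young_min": young_min,
            "young_max": young_max,
            "old_min": heap_gb - young_max}


print(generation_bounds_gb(10))                       # defaults
print(generation_bounds_gb(10, new_max_percent=35))   # Test 7 setting
```

On a 10 GB heap the cap of 35 guarantees the old generation at least 6.5 GB instead of 4 GB.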

Test 8

We added 3 more flags:

-XX:+ParallelRefProcEnabled (makes reference processing parallel and therefore faster; this is recommended when using G1GC)

-XX:+UseStringDeduplication (recommended for G1GC since it avoids identical strings being duplicated and thus saves space)

-XX:G1NewSizePercent=20 (increases the minimum size of the young gen from 5% to 20%, which lowers the promotion rate from young gen to old gen)

This brought the number of full GCs down to zero.

After multiple tests we arrived at the above conclusion.

The above graph shows the max response times recorded at each instance (the results were taken after a 1 hour load test under similar conditions).

As you can observe, parallel GC shows max response times of up to 15000 ms, and G1GC without tuning shows worse results, since it frequently reaches max response times of around 10000 ms.

But G1GC with tuning displayed slightly better results than our base (parallel GC, topmost image), with lower spikes and a very low max response time during the latter half of the test.

Test 9

G1 works well with vertical scaling, so we decided to scale the instances vertically, moving to bigger ones, and observe G1's behavior. This gave us more cores, which let us use more parallel GC threads.

This vertical scaling improved the max and avg response times significantly.

We used a bigger instance, t3.2xlarge ($0.3328 per hour), which is twice the cost of the previous instance, t3.xlarge ($0.1664 per hour).

But t3.2xlarge auto scaled up to 3 instances within 1 hour for the above load.

-XX:ParallelGCThreads=8 -XX:ConcGCThreads=2

The above flags were used because the new instance has more cores and more memory, which allows us to use more GC threads.

The recommendation is that if 8 or fewer cores are available, the number of parallel threads should equal the number of cores. Since 8 cores were available in this case, 8 threads were used.

The recommendation for concurrent GC threads is 1/4 of the number of parallel threads, so 1/4 * 8 = 2. 2 concurrent GC threads were used.
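The two recommendations above can be written down as a small calculation. This is a hedged sketch, not HotSpot source: for more than 8 cores the flag list earlier only says "approximately 5/8", and the formula used here (8 plus 5/8 of the cores beyond the first eight) is an assumption based on how HotSpot's default is commonly described. The function names are hypothetical.

```python
def parallel_gc_threads(cpus: int) -> int:
    """Suggested -XX:ParallelGCThreads for a given number of logical processors."""
    if cpus <= 8:
        return cpus
    # Assumption: only the processors beyond the first 8 are scaled by 5/8.
    return 8 + (cpus - 8) * 5 // 8


def conc_gc_threads(parallel_threads: int) -> int:
    """Suggested -XX:ConcGCThreads: about 1/4 of the parallel threads, at least 1."""
    return max(1, parallel_threads // 4)


for cpus in (4, 8, 16):
    p = parallel_gc_threads(cpus)
    print(f"{cpus} cores: -XX:ParallelGCThreads={p} -XX:ConcGCThreads={conc_gc_threads(p)}")
```

For the 8 core instance used in this test, the sketch gives the same 8 and 2 that were set by hand.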

We then conducted 3 more load tests simulating the production load to verify.

As you can see in the above image, parallel GC on 8 t3.xlarge instances showed an avg response time of around 50 ms. The three setups compare as follows:

setup | avg response time | max response time
parallel GC on 8 t3.xlarge instances | varying around 50 ms | 1–4 sec
G1GC tuned on 8 t3.xlarge instances | varying around 50 ms | 1–4 sec with minor exceptions
G1GC tuned on 4 t3.2xlarge instances | varying around 40 ms | 1–2 sec with minor exceptions

So tuned G1GC with vertical scaling (bigger instances) improved both max and avg response times. But there was no cost benefit as such, since 4 t3.2xlarge instances cost the same as 8 t3.xlarge instances.

t3 vs c5

As you can see, the web service on c5.2xlarge instances showed a significant improvement in max response times over t3.2xlarge instances. With c5 instances the max response time was well below 0.5 sec most of the time.

The above was a 2 hour load test. Several tests were conducted and the average was taken in order to reduce human error and further verify the accuracy of the stats.

Then a 24 hour load test was conducted for both setups to see whether a full GC could occur that would badly affect avg and max response times.

24 hr load test on C5.2Xlarge

c5 was chosen since it offers 8 cores while keeping 16GB of memory (c5.2xlarge is a bit more expensive than t3.2xlarge, 0.34$ vs 0.33$, and double the cost of the t3.xlarge instances we currently use). Having more cores gives us the opportunity to run more GC threads.

We have basically avoided full GCs by spreading the work a full GC would do across multiple young GCs and mixed GCs. So instead of a full GC, mixed GCs and young GCs happen frequently, which uses more CPU cycles.

In this case a compute optimized instance such as c5 is better.

These 2 reasons justify the selection of c5 instances.

Below are the results of the 24 hour load test that we ran with 3 c5 instances.

These stats were obtained by analysing the GC logs with GCViewer. No full GCs were experienced in 24 hours.

GC logs here → https://sysco-my.sharepoint.com/:f:/p/sthe3935/EkKtcASNzodOoQlOaKcR5DEBgx-kfD-4m2STDIjypRLjag?e=D48mgd

full GC activities / full GC time | 0 / 0
young GC activities | 20449
young GC time | 842.141 sec
total GC time | 842.141

Below is the behavior of avg and max response times under the load test within the first 5 hours.

Next 5 hours

As you can see, tuned G1GC on 3 c5.2xlarge instances performs very well in terms of max and avg response times.

The avg response time is below 50 ms for the most part and the max response time was below 4 sec.

No autoscaling happened on the RDS side or the EC2 side.

(Scale up policy: simple scaling, CPU utilization > 50 for 3 periods of 60 sec.) Each instance's CPU utilization stays around 40.

RDS also

Approximately 25% CPU utilization in each RDS instance.

Scaling policy: scale up if CPU utilization > 50% and connection count > 8000.

Connection count stays around 6000 in most cases.

Summary of Changes Added

Introducing the Garbage First garbage collector (G1GC) changes.
The following flags were added:

flag | default | new value | purpose
-XX:+UseG1GC | disabled | enabled | use the G1GC collector instead of parallel; the collector we want has to be specified explicitly
-XX:+ParallelRefProcEnabled | disabled | enabled | make reference processing parallel; this makes marking faster
-XX:+UseStringDeduplication | disabled | enabled | avoid writing the same string to memory over and over again, to optimize memory
-XX:+AlwaysPreTouch | disabled | enabled | tells the OS to map the memory pages to the java process at process initialization rather than incrementally at runtime
-XX:InitiatingHeapOccupancyPercent | 45% | 25% | start marking when the heap is 25% filled, so marking and cleaning start sooner and a full GC becomes unlikely
-XX:G1ReservePercent=15 | 10% | 15% | reserve this amount of heap for object promotion if the old gen fills; in our case 15% of 10GB is reserved for promotions when the old gen is completely filled and can no longer accept objects
-XX:ParallelGCThreads=8 | number of available cores | 8 | number of STW worker threads
-XX:ConcGCThreads=2 | 1/4 of ParallelGCThreads | 2 | number of parallel marking threads
-XX:+UnlockExperimentalVMOptions | | | the 2 flags below are experimental and need to be unlocked before use
-XX:G1NewSizePercent=20 | 5% | 20% | increase the min size of the young gen to decrease the object promotion rate to the old gen; if the young gen is small it fills fast and objects are promoted sooner, filling the old gen
-XX:G1MaxNewSizePercent=35 | 60% | 35% | decrease the max size of the young gen so the old gen gets more of the allocated heap and is unlikely to fill up
-Xmx | 2GB for t3 / 4 GB for c5 | 10GB | max heap size; G1GC works better with large heaps, since copying objects is easier (for the explanation refer to the FAQ)
-Xms | 250MB / 500MB | 10GB | min heap size; the default value is too small so it has to be set explicitly; Xms is set equal to Xmx
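For reference, the flags listed above can be assembled into a single launch command. The sketch below only concatenates the flags named in this summary; the jar name app.jar is a placeholder, not something from our setup, and the heap is written as 10g, the JVM's syntax for 10 GB. The experimental flags are placed after -XX:+UnlockExperimentalVMOptions, which is the ordering usually recommended.

```python
STABLE_FLAGS = [
    "-XX:+UseG1GC",
    "-XX:+ParallelRefProcEnabled",
    "-XX:+UseStringDeduplication",
    "-XX:+AlwaysPreTouch",
    "-XX:InitiatingHeapOccupancyPercent=25",
    "-XX:G1ReservePercent=15",
    "-XX:ParallelGCThreads=8",
    "-XX:ConcGCThreads=2",
]

# These two need -XX:+UnlockExperimentalVMOptions in front of them.
EXPERIMENTAL_FLAGS = [
    "-XX:G1NewSizePercent=20",
    "-XX:G1MaxNewSizePercent=35",
]


def java_command(jar_path: str, heap: str = "10g") -> list:
    """Build the java argument list; jar_path is a hypothetical placeholder."""
    return (["java", f"-Xms{heap}", f"-Xmx{heap}"]
            + STABLE_FLAGS
            + ["-XX:+UnlockExperimentalVMOptions"]
            + EXPERIMENTAL_FLAGS
            + ["-jar", jar_path])


cmd = java_command("app.jar")
print(" ".join(cmd))
```

Keeping the flags in a list rather than one long string makes it easier to diff the configuration between test runs.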

Impact from above GC implementation

Response Times

Got better: 80 ms → 40 ms/50 ms

Cost Benefit

c5 is better since it keeps 16GB of memory while providing 8 cores. Additionally we get better performance since we don’t need CPU credits to burst the CPU.

instance type | instances | cost per hour | cost per month
t3.xlarge | 8 | $0.1664 * 8 = $1.3312 | $958.464
c5.2xlarge | 3 | $0.34 * 3 = $1.02 | $734.40

Saving = $224.064 per month
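The saving can be reproduced with a few lines of arithmetic. The sketch assumes a 30 day month (720 hours), which is the value that matches the monthly figures above; the hourly prices are the ones quoted in this article and may differ from current AWS pricing.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30 day month


def monthly_cost(hourly_price: float, instances: int) -> float:
    """Monthly cost in dollars for a fleet of identical instances."""
    return hourly_price * instances * HOURS_PER_MONTH


t3_cost = monthly_cost(0.1664, 8)   # 8 x t3.xlarge
c5_cost = monthly_cost(0.34, 3)     # 3 x c5.2xlarge
saving = t3_cost - c5_cost
print(f"t3: {t3_cost:.3f}  c5: {c5_cost:.2f}  saving: {saving:.3f}")
```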

Logging in G1GC

G1 has very detailed logging that incurs only very little overhead, so keeping it enabled is recommended.

-XX:+PrintGCDateStamps: prints date and uptime

-XX:+PrintGCDetails: prints G1 phases

-XX:+PrintAdaptiveSizePolicy: prints ergonomic decisions

-XX:+PrintTenuringDistribution: prints aging information of survivor regions

Refer to this article to get a better understanding of the log format → https://www.redhat.com/en/blog/collecting-and-reading-g1-garbage-collector-logs-part-2

FAQ

Why does G1GC work better with a larger heap?

At the beginning the JVM starts → the heap is allocated → regions are tagged as survivor, eden or old → the application starts → objects are created → eden space fills up → when all the eden regions are full, a young GC takes place → the young GC copies the live objects in the eden regions to survivor regions or to the old gen, according to their age and eligibility → for this copying to happen there must be enough free regions (old and survivor) to accommodate the objects coming from the eden regions → if there is too little space, the objects coming from eden have nowhere to go and cannot be compacted. For this reason G1GC is preferred with large heap sizes.

How does young GC work?

G1 stops the world.

G1 builds the collection set (eden and survivor). The collection set (CSet) is the set of regions that G1 wants to look at during a particular collection; since this is a young GC, it only looks at eden and survivor regions.

First phase: “Root Scanning”
Static and local objects are scanned.

Second phase: “Update RS” (remembered set: a data structure that records pointers into a region from outside it, for example an object in the old area pointing to an object in an Eden region)
Drains the dirty card queue to update the RS.

Third phase: “Process RS”
Detects the Eden objects pointed to by old objects.

Fourth phase: “Object Copy”
The object graph is traversed.
Live objects are copied to survivor/old regions.

Fifth phase: “Reference Processing”
Soft, Weak, Phantom, Final, JNI Weak references.

Always enable -XX:+ParallelRefProcEnabled

How does old GC work ?

G1 schedules an old GC when the occupancy of the whole heap reaches -XX:InitiatingHeapOccupancyPercent (default 45).

Step 1: G1 stops the world briefly (STW).

Step 2: G1 performs a young GC and, while doing so, marks the roots (initial mark: identifying which objects in the old area are pointed to by objects in the young area; these old area objects are live and are used as roots to find the other live objects in the old regions in step 4).

Step 3: G1 resumes the application threads.

Step 4: Concurrent marking starts (references are tracked and liveness is calculated per region).

Step 5: G1 stops the world again.

Step 6: Remark phase (SATB and reference processing).

Step 7: Cleanup phase. Empty old regions are immediately recycled (if marking never reached a particular old region, nothing references it and it is full of garbage, so it is recycled immediately). What happens to non-empty old regions, and how is fragmentation resolved? Refer to mixed GC.

Step 8: Application threads are resumed.

Missing: reclaiming regions that are partially full of live objects and partially filled with garbage (this is handled by mixed GC, refer below).

What is mixed GC and how does it work?

The cleanup phase (step 7) of the old GC recycles empty old regions. But what about non-empty old regions? Mixed GC takes care of these.

After an old gen GC is complete, G1 sets a mixed flag, so that after the next young GC, the -XX:G1MixedGCCountTarget fraction (default 8, meaning 1/8th) of the old regions that are largely empty are copied to another old region to free up those regions.

This is the compacting phase. The regions targeted are those whose live data is below -XX:G1MixedGCLiveThresholdPercent (default 85 here; the flag list above, taken from an older HotSpot build, gives 65). Mixed GCs stop once the space that is still reclaimable drops below -XX:G1HeapWastePercent of the heap (default 5 here; 10 in the flag list above), so regions that are almost entirely filled with live objects are left alone. The mixed flag is turned off when enough space has been reclaimed.

It happens during the next young GC cycle:

Choose the old regions that are partially filled.

Divide their number by 8 and choose that many regions.

Put them into the collection set while doing the next young GC.

Now the collection set includes → Eden regions, survivor regions, ⅛ of the remaining old regions to collect.

While doing the old GC → we counted the live objects in each region of the old area.

First copy the regions that are mostly empty to other old regions → reclaim a lot of space → less expensive (e.g. a region with 85% garbage and 15% live data: if I move 2 objects and that region is done, that is very cheap).

(If a region is 92% full of live objects and 8% garbage, copying these live objects to other old regions is expensive and gains very little.)

Within this ⅛, the regions that are mostly empty are targeted.

Marking is finished → app runs → allocates more → G1 starts a YGC → here the mixed flag is turned on → mixed GC is performed.

(G1 compacts by copying to other regions and thereby avoids memory fragmentation.)

Stop the world →

Old region → collect all the live objects → leave the garbage behind → that particular old region is recycled → 5 or 6 old regions copy out their live objects and get recycled, and we can reclaim that space.

In the next young GC we go to the next 1/8th of the remaining old regions. As the cycle goes on, at some point G1 has reclaimed enough space and may stop. Therefore not all mixed GCs will take place; it might stop at the 4th or 5th.
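The "1/8th per young GC, stop when enough is reclaimed" behaviour described above can be sketched as a small simulation. This is a toy model, not G1's real algorithm: the garbage figures and the 400 MB heap are made up, regions are reduced to their reclaimable megabytes, and the function name is hypothetical.

```python
import math


def mixed_gc_passes(garbage_mb, heap_mb, count_target=8, heap_waste_percent=5):
    """Count the mixed GCs run before reclaimable space drops below the waste limit."""
    remaining = sorted(garbage_mb, reverse=True)   # most garbage first
    per_pass = max(1, math.ceil(len(remaining) / count_target))
    passes = 0
    while remaining and sum(remaining) / heap_mb * 100 > heap_waste_percent:
        remaining = remaining[per_pass:]           # these regions are evacuated
        passes += 1
    return passes


# 16 candidate old regions: 8 with 3.5 MB of garbage, 8 with 1.0 MB.
candidates = [3.5] * 8 + [1.0] * 8
print(mixed_gc_passes(candidates, heap_mb=400))
```

In this example the collector stops after 3 of the 8 possible mixed GCs, because what is left to reclaim has fallen below 5% of the heap.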
