VM Large Pages, TPS and ASLR

In an earlier vCOps post I discussed how Large Pages affect consumed memory and how analyzing VM memory usage by demand (or Active Memory) can give potentially misleading right-sizing recommendations.
In this post I will go into detail on the common question of how Large Pages affect Transparent Page Sharing (TPS) and consumed memory, including the important topic of how Address Space Layout Randomisation (ASLR) affects host memory usage.

The Setup

vSphere 5.5 U1

Linux Server (SLES 11) 6GB RAM

Windows Server 1 (W2K8R2) 4GB RAM

Windows Server 2 (W2K8R2) 10GB RAM

The Question?

Why, for most of our virtual machines, does Consumed Host Memory basically equal the VM’s Configured Memory? When I check inside the guest OS there seems to be a large amount of free memory.

The Answer

The reason is essentially a combination of factors: TPS not being used on VMs backed by Large Pages, the guest OS itself, and Address Space Layout Randomisation (ASLR).
First things first: Large Pages and TPS.

Large Pages and TPS

It is a pretty well known fact that, since the introduction of Large Pages with Intel EPT and AMD RVI, TPS is not used unless the host comes under memory contention. This is well explained in the Yellow Bricks posts as well as KB 1021095, and it is a well known and accepted trade-off of performance over consolidation. But what effect does this actually have on memory usage? TPS only saves identical blocks, right? It’s not like there would be gigabytes of these per ESXi host…

Well, that’s where you’re wrong. One of the most common blocks that TPS can deduplicate is an empty (unallocated) 4KB memory page. This, in conjunction with ASLR on certain guest OS’s (more on that later), can introduce a scenario where a guest’s consumed host memory matches its configured memory, whether the guest needs that amount of memory or not.
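To make the zero-page point concrete, here is a minimal Python sketch of content-based page sharing. It is purely illustrative (not VMkernel code): it hashes 4KB pages and collapses identical contents onto a single backing copy, which is essentially what TPS does when it is allowed to work with small pages.

```python
import hashlib

PAGE_SIZE = 4 * 1024  # 4KB small pages

def share_pages(pages):
    """Toy model of content-based page sharing (TPS) - illustration only.

    'pages' is a list of 4KB byte strings representing guest physical pages.
    Identical pages are collapsed onto a single backing copy, so host memory
    is only needed for the unique contents.
    """
    backing = {}  # content hash -> single backing copy
    for page in pages:
        digest = hashlib.sha1(page).hexdigest()
        backing.setdefault(digest, page)
    return len(backing) * PAGE_SIZE, len(pages) * PAGE_SIZE

# A guest that has touched 1GB of address space but only holds 64MB of real
# data: the remaining pages are zeroed (unused) and all dedupe down to a
# single shared zero page.
zero_page = bytes(PAGE_SIZE)
data_pages = [i.to_bytes(4, "little") * (PAGE_SIZE // 4) for i in range(16384)]  # 64MB unique
empty_pages = [zero_page] * (262144 - 16384)                                     # rest zeroed

shared_backing, guest_view = share_pages(data_pages + empty_pages)
print(f"Guest sees {guest_view / 2**30:.1f}GB, host backs {shared_backing / 2**20:.0f}MB after sharing")
```

Run against a mostly idle guest, almost everything collapses onto the single zero page, and that is exactly the memory TPS hands back when it can operate on 4KB pages.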

Before we get into controversial topics such as disabling Large Pages and ASLR, let’s look at some basic graphs.

The Testing Scenarios

Test 1 – Windows Server 1 – 4GB RAM with Large Pages Disabled

vCenter Stats - LP Disabled

From the graph above we can deduce a few key points:

  • Although 4GB of RAM is allocated, only 3.2GB is being consumed on the host
  • The Active Memory is generally less than 1GB
  • The shared memory is equal to roughly the configured memory minus the consumed memory

What is also interesting about this VM is that the majority of the shared memory is made up of unused memory. But more on that in a second. First let’s look at the exact same VM after a vMotion to a host with Large Pages enabled (the default).

Test 2 – Windows Server 1 – 4GB RAM with Large Pages Enabled

vCenter Stats - LP Enabled

What stands out straight away is that the VM is now consuming all 4GB on the ESX host. The shared memory, which was around 1GB, is now almost 0. This graph also shows what happens under host memory contention: at 7:25 my ESXi host hit 94% memory usage and began breaking large pages down into small pages, which then allowed TPS to begin finding duplicate pages again.

Just before, I mentioned that the majority of the shared memory was unused memory. This is shown in the graph below.

Test 3 – Windows Server 1 – 4GB RAM with Large Pages Disabled After vMotion

vCenter Stats - LP Disabled Single VM on Host

This is the same VM as in the last test, and as you can see, straight after the vMotion a portion of memory is shared and the consumed memory is no longer at 4GB. This is an important point and is well explained in KB 1021896. Essentially, the guest has allocated pages spread all over its address space. When these are backed with 2MB Large Pages, a single touched 4KB page may take up an entire 2MB backed page in the worst case, which is why there were almost no free 2MB memory pages in Test 2. In this test, however, free 4KB pages are far more common, and ESX does not need to back them with physical memory unless they are requested.
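To put rough numbers on that worst case (this is my own back-of-the-envelope arithmetic, not anything measured from ESXi), the sketch below compares how much host memory a scattering of touched 4KB pages can pin when the guest is backed with 2MB pages versus 4KB pages:

```python
# Back-of-the-envelope illustration only - not ESXi internals.
SMALL = 4 * 1024          # 4KB small page
LARGE = 2 * 1024 * 1024   # 2MB large page

touched_small_pages = 2048                    # the guest has written to 8MB of data...
touched_bytes = touched_small_pages * SMALL

# Worst case: every touched 4KB page lands in a different 2MB region,
# so each one forces a whole 2MB large page to be backed on the host.
worst_case_large_backing = touched_small_pages * LARGE

print(f"Guest data touched : {touched_bytes / 2**20:.0f} MB")
print(f"4KB backing needed : {touched_bytes / 2**20:.0f} MB")
print(f"2MB worst case     : {worst_case_large_backing / 2**30:.0f} GB")
# -> 8MB of scattered writes can pin 4GB of host memory with 2MB backing,
#    which is why a 4GB VM's consumed memory so easily hits its configured size.
```

Real guests are rarely this unlucky, but it only takes one touched small page per 2MB region to stop that region from staying free.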

This test also highlights TPS in action with identical guest OS blocks. At around 8:25 I vMotioned three other Windows 2008 R2 VMs onto the same host. As a result you can see the amount of shared memory increase over time (and, correspondingly, the amount of consumed memory decrease). The magic of TPS 🙂

So you’re probably wondering where ASLR fits into all this? Well, if you haven’t guessed already, let’s discuss that.

Address Space Layout Randomisation (ASLR)

ASLR is a security feature of modern operating systems that helps prevent buffer overflow attacks. It is well explained in this Wikipedia article. For Windows, ASLR was introduced with Windows Vista (Server 2008) and has been around ever since. Other operating systems, such as Linux-based OS’s, also use ASLR, with slightly different implementations.

So how does ASLR affect shared memory?

Because ASLR distributes memory pages all over the address space, the chance of finding an unused 2MB block is greatly reduced. This is more obvious, and a bigger issue, in VMs with larger amounts of memory.
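As a rough illustration (again a toy model, not how Windows actually lays out its allocations), the simulation below scatters a modest working set of 4KB pages across a 10GB VM and counts how many 2MB regions are left completely untouched, and can therefore stay unbacked on the host:

```python
# Toy simulation of scattered vs contiguous page placement - illustration only.
import random

SMALL = 4 * 1024
LARGE = 2 * 1024 * 1024
SMALL_PER_LARGE = LARGE // SMALL        # 512 x 4KB pages per 2MB page

VM_SIZE = 10 * 2**30                    # a 10GB VM
LARGE_REGIONS = VM_SIZE // LARGE        # 5,120 candidate 2MB pages
TOUCHED = 150_000                       # ~600MB of 4KB pages actually written to

random.seed(1)

def untouched_regions(scattered):
    """Count 2MB regions that contain no touched 4KB page at all."""
    total_small = LARGE_REGIONS * SMALL_PER_LARGE
    if scattered:
        hits = random.sample(range(total_small), TOUCHED)   # ASLR-like random spread
    else:
        hits = range(TOUCHED)                                # dense, contiguous layout
    dirty = {page // SMALL_PER_LARGE for page in hits}
    return LARGE_REGIONS - len(dirty)

for scattered in (False, True):
    free = untouched_regions(scattered)
    print(f"{'scattered' if scattered else 'contiguous'} layout: "
          f"{free}/{LARGE_REGIONS} 2MB regions untouched "
          f"(~{free * LARGE / 2**30:.1f}GB can stay unbacked)")
```

With a dense, contiguous layout most 2MB regions stay untouched; with a randomised spread almost every region contains at least one dirty 4KB page and ends up being backed in full.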

Test 4 – Windows VM 2 – 10GB RAM with Large Pages Enabled – VM Boot

Windows VM 2 LP Enabled

From the graph above we can deduce a few key points:

  • Straight after boot the VM was already consuming 8GB of RAM, even though this is a vanilla W2K8R2 OS with no applications installed. According to Perfmon, Windows was only using 587MB of RAM just after boot.
  • At 9:35 I performed a 6GB memory allocation using MemAlloc. Although the guest still had around 3.5GB of RAM free, the consumed memory was now 10GB.
  • At 9:41 I deallocated the memory. However, the consumed memory never decreased back to the original 8GB.

An important point from the above is that after memory was deallocated inside the guest OS, the amount of host consumed memory did not decrease. This is well explained on page 5 of Understanding Memory Resource Management in VMware® ESX Server. It essentially boils down to the fact that ESX cannot tell which pages are free once the guest OS has accessed them for the first time. The balloon driver solves this issue, however it is only invoked during times of resource contention due to its overhead and interference with the guest OS.
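The allocation test itself is easy to reproduce. The sketch below is a rough Python stand-in for a tool like MemAlloc (my own approximation, not the actual utility): it allocates a buffer, touches every 4KB page so the host has to back it, and then frees it. The free only tells the guest OS the pages are available again; without ballooning the host has no way of knowing, so consumed memory stays where it is.

```python
# Rough stand-in for a MemAlloc-style allocation test - not the actual utility.
import ctypes

PAGE = 4096
ALLOC_MB = 6 * 1024   # mimic the 6GB allocation from the test (shrink for a smaller lab VM)

def allocate_and_touch(megabytes):
    """Allocate a buffer and write to every 4KB page so it is really backed."""
    buf = ctypes.create_string_buffer(megabytes * 2**20)
    for offset in range(0, len(buf), PAGE):
        buf[offset] = b"\x01"   # one non-zero write per page forces it to be backed
    return buf

buf = allocate_and_touch(ALLOC_MB)
input("Memory allocated and touched - check consumed memory in vCenter, then press Enter...")

del buf   # the guest marks the pages free, but the host still sees them as consumed
input("Memory freed inside the guest - consumed memory will NOT drop. Press Enter to exit.")
```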

Test 5 – Linux VM – 6GB RAM with Large Pages Enabled

Linux LP Enabled

This test shows a Linux VM on the same host. As you can see, at boot, although the VM has 6GB of RAM configured it is only consuming 1.5GB, even with Large Pages enabled. This highlights the different ASLR implementations among different operating systems.

Now onto the final test. In this test we have disabled ASLR inside Windows via the following registry value.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
“MoveImages”=dword:00000000

Test 6 – Windows VM 2 – 10GB RAM with Large Pages Enabled and then Disabled – ASLR Disabled

Windows ASLR Disabled

In this final test, the graph above shows the following trends during the first boot and test with Large Pages enabled:

  • After boot the VM’s consumed memory is far less than in Test 4.
  • During the memory allocation, consumed memory increased; however, it did not reach the granted amount.
  • After the memory was deallocated, the consumed memory still remained around 9GB.

During the second boot and test with Large Pages disabled:

  • The VM has very little memory consumed after boot (530MB), which is common for Windows when Large Pages are disabled.
  • During the memory allocation the consumed memory increased; however, it did not reach the granted amount, nor was it as high as in the previous test.
  • After the memory was deallocated, TPS started reducing the amount of consumed memory as identical pages were found.

How do I disable Large Pages?

Large Pages can be disabled at the ESXi host level under Advanced Settings -> Mem -> Mem.AllocGuestLargePage by setting the value to 0. As this is an ESXi host level setting, I would not recommend making the change unless you are happy to trade performance for consolidation. Also note that VMs will need to be vMotioned off and back onto the host (or power cycled) for the setting to take effect.

Disable Large Pages

Final Word

As you can see from the tests above, the use of Large Pages and ASLR has a significant effect on the amount of physical memory a guest consumes, as well as on its ability to return unused memory to the ESX host. With these two factors combined (which is the standard scenario), a guest’s consumed memory will often far exceed the amount of memory it actually requires. This leads to my next post on the importance of VM right-sizing with vCOps and active memory.

Now you’re probably thinking ‘thanks for the info, I will just go ahead and disable Large Pages and ASLR to save memory!’.
In fact my general recommendation is the opposite: ‘Do not disable Large Pages and/or ASLR unless you are happy with the trade-offs’.
There are obvious performance and security trade-offs to disabling these features, which is why I would generally not recommend it in a production environment. However, there are certain use cases where it may be preferred. These include:

  • VDI Environments where TCO is important
  • Development Environments
  • Home Lab Environments
  • Environments where VMs are grossly oversized and you plan to right-size them shortly.

In my next post I will discuss how to tune right-sizing with vCOps, as well as active vs consumed memory.

Till then, Chris Slater out.

 References:

Large Pages – Yellow Bricks

TPS in Hardware MMU Systems KB 1021095

Use of Large Pages can cause Memory to be fully Allocated KB 1021896

ASLR Wikipedia

Understanding Memory Resource Management in ESX

 
