Tuning vCOps for your environment – Part 4 – Capacity Management Models

In the next part of the DefinedBySoftware vCenter Operations Manager series we will be going through the complicated but important topic of capacity management of vSphere environments. This part of the series will focus on capacity management theory for vCOps, with the next post containing my recommended policy settings for accurate capacity management reports.

First the problem….
One of the main features of vCOps is its ability to assist with capacity management of your virtual infrastructure. This is of great benefit to the virtual infrastructure admin, as in my opinion capacity management is something that a lot of organizations do poorly. However, one of the main issues I see customers face with vCOps is that the capacity management policies are not configured appropriately for their environment, so the capacity reports are misleading and the feature ends up being ignored.

So Chris can you sum up this problem in picture format?
Yes, this can be easily depicted in the mighty slatchlab. As you can see from the picture below, vCOps reports capacity remaining for 44 more Virtual Machines in this cluster. So what's wrong with that?
CapReminingIssue

There are many traditional ways of managing capacity in a vSphere environment, such as resource tracking spreadsheets, external capacity management tools and my favorite, the cluster summary check.
The cluster summary check is a capacity check that some vSphere admins use when their manager asks how much capacity is left in a cluster. It goes something like this.

Manager: “Hey Chris, how much capacity is left in the slatchprod cluster?”
Admin: “Let me check”.
Inside Admin’s head: “Ok hosts are around 55% memory usage and 3% CPU usage. There are 11 powered on VMs, so about another 8 or so should be the max”
Admin: “We are at around 55% so another 8 VMs and we will be out of resources.”
Manager: “Well vCOps reports I have 44 VMs left so what is going on?”
Admin: “Let me get back to you”
ClusterCheck
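
The arithmetic going on inside the admin's head can be written down as a short sketch. The function name is made up for illustration, and the logic is only the naive extrapolation described above (every future VM is assumed to cost the same as today's average VM), not anything vCOps does. With the figures from the conversation, memory gives roughly 9 more VMs, which is in the same ballpark as the admin's "another 8 or so", while CPU suggests hundreds.

```python
def naive_vms_remaining(powered_on_vms: int, usage_fraction: float) -> float:
    """Extrapolate how many more VMs fit, assuming every new VM
    consumes the same share of the resource as the current average VM."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    total_vms_at_full = powered_on_vms / usage_fraction
    return total_vms_at_full - powered_on_vms


# Figures from the conversation above: 11 powered-on VMs,
# around 55% memory usage and 3% CPU usage.
by_memory = naive_vms_remaining(11, 0.55)
by_cpu = naive_vms_remaining(11, 0.03)

# The most constrained resource sets the limit.
estimate = min(by_memory, by_cpu)
print(f"memory: {by_memory:.1f}, cpu: {by_cpu:.1f}, limit: {estimate:.1f}")
```

The gap between the two per-resource answers is one of the reasons the check is unreliable: a single point-in-time usage figure says nothing about peaks, buffers or how big the next VM will be.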

It is obvious that the cluster summary check is flawed (for many reasons), however you can also see that the vCOps remaining capacity estimate seems very optimistic as well. Now in this case I adjusted the capacity policy to give more of a worst case, however it shows the importance of tuning vCOps to give the right data.

The solution:
The solution is easier said than done: change the vCOps policy to reflect your environment, taking into account some of my basic recommendations. After that you get a report like the one below, showing that only 2.3 VMs' worth of capacity remain, which is far more realistic in my small environment.

CapRemainingResolved

So how did I do it?
In my next post I will detail my recommended capacity management policy settings for production server environments. However, before I just blurt out the answer I need to discuss how I came up with the policy. This will help you come up with your own policy for your environment.

Demand vs. Allocation Models

Before I get into all the vCOps capacity management settings, the Demand vs. Allocation models need to be discussed, as the choice has a massive impact in determining the size of the “Average VM” which is used for capacity planning. Below is a screenshot from a vCOps policy where all this discussion is relevant (3a Capacity and Time remaining). The demand vs. allocation options give a sea of choices, so let's go through which boxes are the right ones to check.
vCOpsSettingDemandvsAllocation

Before we go into the individual infrastructure items (CPU, Memory, Disk I/O, etc.), let's discuss Demand vs. Allocation overall. Thanks to Ben Todd for this great slide below.

AllocationvsDemand

The image above gives a great list of pros and cons for allocation vs. demand, and although using both is appropriate in some cases, in others it may not be. Let's use Container CPU as an example (the container column is the most relevant, as it affects an ESXi cluster, which is the object most commonly selected for capacity management).

Demand:
CPU Demand is a derived metric that is made up of multiple sub-metrics (in this case CPU Usage, CPU Ready, etc.). It is used to estimate the amount of CPU an object actually wants to consume. Although demand and usage are often identical, it is possible for demand to exceed usage, which would indicate resource contention. Demand is a useful way to manage capacity, as a Virtual Machine will rarely use all the CPU it has been configured with, which is the basic principle of overcommitment. You will also find that demand usually matches the Usage % metric that is observed inside vCenter.
ClusterCPUDemand
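
As a rough illustration of the two points above (demand exceeding usage signals contention, and VMs rarely demand everything they are configured with), here is a minimal sketch. The class, field names and sample figures are all hypothetical, and the demand value is taken as given rather than derived, since how vCOps builds it from its sub-metrics is not covered here.

```python
from dataclasses import dataclass


@dataclass
class VmCpu:
    name: str
    configured_mhz: float  # CPU the VM has been configured with
    usage_mhz: float       # CPU the VM actually consumed
    demand_mhz: float      # CPU the VM wanted to consume

    @property
    def contended(self) -> bool:
        # Demand above usage means the VM wanted more than it received.
        return self.demand_mhz > self.usage_mhz

    @property
    def demand_fraction(self) -> float:
        # Share of the configured CPU that the VM actually wants.
        return self.demand_mhz / self.configured_mhz


# Hypothetical sample data, for illustration only.
vms = [
    VmCpu("app01", configured_mhz=4000, usage_mhz=600, demand_mhz=600),
    VmCpu("db01", configured_mhz=8000, usage_mhz=3000, demand_mhz=3600),
]

for vm in vms:
    state = "contended" if vm.contended else "ok"
    print(f"{vm.name}: wants {vm.demand_fraction:.0%} of configured CPU ({state})")
```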

Allocation:
Q: So if Demand is so great for CPU why use allocation at all?
A: There may be situations where you want to control the vCPU to pCPU ratio on your Clusters.
So if you haven’t guessed already, the Allocation model for CPU affects the number of vCPUs that can be allocated per pCPU. It is important to note that the vCPU to pCPU ratio is set in section 3c Usage Calculations. Failure to set this ratio correctly can lead to overly optimistic or overly conservative capacity estimates. However, there may be situations where you would want to manage CPU capacity by allocation and this model would be preferred, for example a Business Critical Applications cluster where you want to ensure a 1:1 ratio of vCPU to pCPU for performance.
AllocationOvercommitRatios
Q: So what should my CPU Allocation Overcommitment Ratio be set to?
A: Well, that depends on your organizational policy, CPU type and speed, types of applications, etc.
In short, it is often hard to set this value for production environments. So if you are in doubt about what the ratio should be, ensure that the CPU Container Allocation model is unchecked and simply rely on CPU Demand. Now you may be thinking “But what about workload spikes and a safety buffer?” That will be discussed in the next post, so relax.
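
To see why the overcommitment ratio matters so much, here is a minimal sketch of allocation-based CPU capacity. The function names, the 32-core cluster and the VM sizes are assumptions for illustration, and the calculation is a simplification rather than the exact vCOps algorithm: the allocatable vCPU pool is the physical core count multiplied by the ratio, and the “Average VM” is the mean vCPU count of the existing VMs.

```python
from typing import Sequence


def vcpus_remaining(physical_cores: int, overcommit_ratio: float,
                    allocated_vcpus: int) -> float:
    """vCPUs still available under a given vCPU:pCPU ratio."""
    return physical_cores * overcommit_ratio - allocated_vcpus


def vms_remaining_by_allocation(physical_cores: int, overcommit_ratio: float,
                                vm_vcpus: Sequence[int]) -> float:
    """How many more 'average VMs' fit, sized by mean vCPU allocation."""
    allocated = sum(vm_vcpus)
    average_vm = allocated / len(vm_vcpus)
    headroom = vcpus_remaining(physical_cores, overcommit_ratio, allocated)
    return max(headroom, 0.0) / average_vm


# Hypothetical cluster: 32 physical cores, four VMs with 2 or 4 vCPUs each.
vm_vcpus = [2, 4, 2, 4]
strict = vms_remaining_by_allocation(32, 1.0, vm_vcpus)   # 1:1 ratio
relaxed = vms_remaining_by_allocation(32, 4.0, vm_vcpus)  # 4:1 ratio
print(f"1:1 leaves {strict:.1f} VMs, 4:1 leaves {relaxed:.1f} VMs")
```

The same cluster and the same VMs give very different answers depending only on the ratio, which is why an arbitrary value here skews the whole report.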

What about Memory?
So I have discussed CPU models, but what about Memory? Should that be using Demand as well?
The short answer is: not usually.
Memory Demand is based on a variety of metrics, however the main metric is Active Memory. Active Memory is often far lower than Consumed Memory. This is due to a variety of factors, and a great explanation of Active Memory can be found here. The gap can mostly be closed by ‘right-sizing’ VMs, however this is easier said than done. Therefore, when capacity planning by Memory Demand the result might be over-optimistic and not suitable for production environments, particularly in a world of large memory pages where Transparent Page Sharing only takes effect at 94% host memory utilization. I will release another blog post on how we use Memory Demand for VM right-sizing after applying some additional tuning.
So my recommendation for Memory is to use the Allocation model and set the overallocation ratio appropriately, as would be done for CPU.
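
A small side-by-side sketch shows how far apart the two models can land for Memory. Everything here is assumed for illustration: the function name, the 256 GB of cluster memory and the per-VM figures. Active Memory stands in for Memory Demand, which is a simplification since demand is built from more than one metric.

```python
from typing import Sequence


def vms_remaining(capacity_gb: float, per_vm_gb: Sequence[float],
                  overallocation: float = 1.0) -> float:
    """How many more 'average VMs' fit, where the average VM is the
    mean of the per-VM figures supplied (configured or active memory)."""
    used = sum(per_vm_gb)
    average_vm = used / len(per_vm_gb)
    return max(capacity_gb * overallocation - used, 0.0) / average_vm


# Hypothetical cluster with 256 GB of memory and four VMs.
configured_gb = [8, 16, 8, 16]  # what the VMs are allocated
active_gb = [1, 2, 1, 2]        # what the VMs are actively touching

by_allocation = vms_remaining(256, configured_gb)
by_demand = vms_remaining(256, active_gb)
print(f"allocation model: {by_allocation:.1f} VMs, demand model: {by_demand:.1f} VMs")
```

Because the active figures are so much smaller than the configured ones, the demand-based “Average VM” shrinks and the remaining count balloons, which is the over-optimism described above.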

Ok that’s great what about Disk Space, Disk I/O and Network I/O?
Simply put, I would say disable these altogether, however use your judgement.
Disk space usually does not work as a capacity management metric because datastores and LUNs are created on demand by your SAN administrator (unless you pre-present all your storage in advance). As such, vCOps doesn’t know how much capacity the actual SAN has left, and this resource will often be the most constraining if left enabled.

Disk I/O and Network I/O can be left enabled, however I rarely find these to be constraining factors when determining how many VMs to place on a cluster. Once again, these are resources for which performance and capacity are externally managed, and they are usually not the main focus of vSphere cluster capacity management.

That’s all for now, folks. In my next post I will go through all my capacity management policy recommendations (with the exclusion of the Demand vs. Allocation models, as these were just covered).
