Tuning vCOps for your environment – Part 5 – Capacity Management Tuning

In this next post I will finish off discussing Capacity Management and give all my recommended Capacity Management Policy settings for Production Server and VDI environments, as well as the rationale behind each major policy decision. This post continues on from my last post, which covers the problems with capacity management reporting if policies are not set appropriately, as well as the differences between the allocation and demand models. A link to my last post on Capacity Management Models can be found here.

For this post we will focus on Policy Section 3 of a vCOps Policy. These capacity management recommendations can be applied to a specific policy that you have created; however, they should be generic enough to use in your Default Policy to cover your entire production environment. I will call out when an option might need a different value in a dev/test or VDI environment, where a specific policy might be created for that use case.

3a – Capacity and Time Remaining

Spikes and Peaks
One of the first things to check in your policy is that the Spikes and Peaks checkbox is checked. With this option selected, the VM Effective Demand is adjusted to take stress into account, sizing the average VM more towards its peaks rather than a simple overall average. This setting is important to verify because, depending on how vCOps was upgraded from previous versions, the box may be unchecked in your Default Policy.

[Figure: Spikes and Peaks checkbox]

As stated above, the Spikes and Peaks checkbox adjusts the Effective Demand of the “Average VM”. The Limited Demand reflects the average demand for resources, so you can compare the two values to see the difference Spikes and Peaks is making to the average calculation. If the Spikes and Peaks checkbox is unchecked, these values will be the same.
[Figure: Limited Demand vs. Effective Demand]
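To make the difference between an average-based and a peak-weighted Average VM concrete, here is a small Python sketch. It is an illustration only, not vCOps's actual Effective Demand algorithm: it assumes a peak-weighted size can be approximated by the 95th percentile of hourly demand samples, and the function names and sample figures are hypothetical.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def average_vm_demand(samples, spikes_and_peaks):
    """Hypothetical sizing rule: plain mean when Spikes and Peaks is off,
    95th percentile (a stand-in for peak weighting) when it is on."""
    if spikes_and_peaks:
        return percentile(samples, 95)
    return statistics.mean(samples)


# Hypothetical hourly CPU demand (MHz) for one VM: mostly quiet, a few busy hours.
hourly_mhz = [200] * 20 + [1800, 2000, 2200, 2400]

average_based = average_vm_demand(hourly_mhz, spikes_and_peaks=False)
peak_weighted = average_vm_demand(hourly_mhz, spikes_and_peaks=True)
print(f"Average-based: {average_based:.0f} MHz, peak-weighted: {peak_weighted:.0f} MHz")
# prints "Average-based: 517 MHz, peak-weighted: 2200 MHz"
```

The gap between the two numbers is the kind of difference you would expect to see between Limited Demand and Effective Demand once the checkbox is ticked.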

Q: So you have recommended checking the Spikes and Peaks checkbox to create a more conservative Average VM size. Are there situations where I shouldn’t check this box?
A: Yes, there are several situations where you would want to leave this checkbox unchecked, including Development and VDI environments. VDI is a good example: you would not expect all desktops to be busy at the same time, so checking this box in a VDI environment can produce an Average VM size that is too conservative (too large), and as a result you do not reach your target ROI in your VDI environment.

Physical vs. Usable Capacity:
When setting “Capacity Remaining based on”, you have two options: Physical Capacity or Usable Capacity. This option determines how much “Capacity” an ESXi host provides.
In almost all circumstances I would recommend Usable Capacity.
Q: Why?
A: When working out the capacity of a host, you would rarely want to use its raw physical capacity. You need to take buffers into account, such as HA, CPU and Memory buffers. Not many customers would want to run an ESXi host at 100% CPU and Memory usage, would they?
This section will also be discussed in more detail in 3b – Usable Capacity.
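The effect of the “Capacity Remaining based on” choice can be sketched as a count of how many more “Average VMs” fit in a cluster, with the scarcer resource setting the limit. The Capacity type, the vms_remaining function and all figures below are hypothetical, and vCOps's own calculation may differ in detail.

```python
import math
from dataclasses import dataclass


@dataclass
class Capacity:
    cpu_mhz: float
    mem_gb: float


def vms_remaining(capacity, used, avg_vm):
    """Whole 'Average VMs' that still fit; the scarcer resource sets the limit."""
    cpu_fit = (capacity.cpu_mhz - used.cpu_mhz) / avg_vm.cpu_mhz
    mem_fit = (capacity.mem_gb - used.mem_gb) / avg_vm.mem_gb
    return max(0, math.floor(min(cpu_fit, mem_fit)))


# Hypothetical 4-host cluster: 80 cores at 2600 MHz, 256 GB RAM per host.
physical = Capacity(cpu_mhz=208_000, mem_gb=1024)
# Hypothetical usable figures after the HA reservation and buffers are taken off.
usable = Capacity(cpu_mhz=140_000, mem_gb=690)
used = Capacity(cpu_mhz=60_000, mem_gb=400)
avg_vm = Capacity(cpu_mhz=1_000, mem_gb=8)

print("Based on Physical Capacity:", vms_remaining(physical, used, avg_vm))  # 78
print("Based on Usable Capacity:", vms_remaining(usable, used, avg_vm))      # 36
```

In this example Physical Capacity reports roughly twice the headroom, headroom that the HA reservation and buffers say you should not actually consume.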

Demand or Allocation by Compute Resource:
This was discussed in detail in my last post; see below for my summary recommendations on Demand vs. Allocation in Production Server environments.
[Figure: Demand vs. Allocation summary recommendations]

3b – Usable Capacity

Now that I have recommended using Usable Capacity, we need to define what Usable Capacity actually is. As you would imagine, usable capacity is simply physical capacity with buffers and overheads applied for a variety of reasons.

Reserving resources for HA:
The first box you want to check is Use High Availability configuration, and reduce capacity. This one is important and should nearly always be checked. It is often overlooked when using the cluster summary check method, because people forget that they have set up vSphere HA for N-1 or N-2. You cannot fill all your hosts to 100%, as you need to plan for host failure; this checkbox does that planning for you.
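For a rough feel of how much the HA checkbox takes away, the share of a cluster held back to tolerate k host failures can be sketched as k/n for n hosts. This is a simplification that assumes identical hosts and a percentage-style reservation; vSphere HA admission control policies (slot-based ones, for example) can reserve more, and the function name is hypothetical.

```python
def ha_reserved_fraction(hosts, failures_tolerated):
    """Share of cluster capacity reserved so the surviving hosts can absorb
    `failures_tolerated` host failures, assuming all hosts are identical."""
    if not 0 <= failures_tolerated < hosts:
        raise ValueError("failures_tolerated must be between 0 and hosts - 1")
    return failures_tolerated / hosts


for hosts in (4, 8, 16):
    for k in (1, 2):
        share = ha_reserved_fraction(hosts, k)
        print(f"{hosts} hosts, N-{k}: reserve {share:.2%} of capacity")
```

Small clusters pay the biggest penalty: an N-1 design on 4 hosts holds back a quarter of the cluster, which is exactly the capacity the cluster summary check method tends to forget.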

Applying Resource Buffers (CPU, Memory, etc.):
With an HA buffer now applied, you will also want to add some CPU and Memory buffers. These buffers are important because, as I stated earlier, you don’t want to run your hosts at 100% Memory utilization, for example, or swapping will start to occur. Here are some reasons to add buffers in certain scenarios:

  • Keeping resources below 90% utilisation (host CPU and Memory)
  • Adding a capacity buffer for unexpected projects (this always happens)
  • Adding a CPU buffer for interactive or peaky (sub-hour spikes) server/VDI workloads (this is particularly important when using the Demand only model)

[Figure: Usable Capacity settings]
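Applying a CPU or Memory buffer on top of the HA reservation can be sketched as below. Whether vCOps takes the buffer percentage from physical capacity or from what is left after the HA reservation is not stated here, so the sketch shows both readings as assumptions; the function and its stacked flag are hypothetical. It uses a 20% HA reservation and a 10% buffer on 1024 GB of memory.

```python
def usable_capacity(physical, ha_reserved_pct, buffer_pct, stacked=True):
    """Hypothetical usable-capacity calculation.

    stacked=True:  the buffer is taken from what remains after the HA reservation.
    stacked=False: both percentages are taken from physical capacity.
    """
    if stacked:
        return physical * (1 - ha_reserved_pct / 100) * (1 - buffer_pct / 100)
    return physical * (1 - (ha_reserved_pct + buffer_pct) / 100)


cluster_mem_gb = 1024  # hypothetical: 4 hosts x 256 GB
for stacked in (True, False):
    usable = usable_capacity(cluster_mem_gb, ha_reserved_pct=20, buffer_pct=10, stacked=stacked)
    share = usable / cluster_mem_gb
    print(f"stacked={stacked}: {usable:.1f} GB usable ({share:.0%} of physical)")
```

The two readings differ only slightly here (72% vs. 70% of physical), but it is worth confirming which one your version uses before relying on the resulting thresholds.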

 

3c – Usage Calculation

Last but not least we have the Usage Calculation screen. The first part of this section requires us to set the working week.

The working Week:
By default the working week is set to “All hours”. This needs to be changed to reflect the business periods of your environment. In most cases you would uncheck Saturday and Sunday and set a 9-5 working day. This step is important because it sizes the “Average VM” more accurately by not letting quiet periods skew the results. Some organizations are busier at night than during the day; if this is the case, simply set the observation window accordingly.
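To see why quiet periods skew the Average VM, the following sketch averages a hypothetical week of hourly demand twice: once over all hours and once over a Monday to Friday, 9-5 working week. The in_working_week helper and the demand figures are assumptions for illustration, not vCOps internals.

```python
import statistics
from datetime import datetime, timedelta


def in_working_week(ts, start_hour=9, end_hour=17, workdays=(0, 1, 2, 3, 4)):
    """True if ts falls inside the working week (Monday=0 ... Friday=4, 9-5)."""
    return ts.weekday() in workdays and start_hour <= ts.hour < end_hour


# Hypothetical week of hourly CPU demand (MHz) starting Monday 19 May 2014:
# busy during business hours, nearly idle otherwise.
week_start = datetime(2014, 5, 19)
samples = []
for hour in range(7 * 24):
    ts = week_start + timedelta(hours=hour)
    samples.append((ts, 1500 if in_working_week(ts) else 100))

all_hours = statistics.mean(mhz for _, mhz in samples)
working_week = statistics.mean(mhz for ts, mhz in samples if in_working_week(ts))
print(f"All hours: {all_hours:.0f} MHz, working week only: {working_week:.0f} MHz")
```

Averaged over all hours, the VM appears to need less than a third of what it actually uses while people are working (about 433 vs. 1500 MHz in this example).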

Allocation Overcommit Ratios:
As discussed in 3a – Capacity and Time Remaining, Allocation Overcommit Ratios are vital when using the Allocation based model. For CPU, for example, these levels affect the target vCPU to physical CPU (pCPU) ratio; for Memory, they affect the target level of Memory Overcommit.
Q: What should they be set to?
A: As stated earlier, the level of CPU overcommit depends on your organizational policy and your hardware type. It can be anywhere from 1:1 to 10:1.
Memory overcommit is far more straightforward. In a production server environment it should generally be set to 0%.
That’s right, 0%. This is because in most production environments Large Pages prevent Transparent Page Sharing from providing memory de-duplication benefits, as explained well in KB 1021095. As a result we should generally err on the side of performance rather than consolidation, and memory overcommitment is becoming more a thing of the past.

[Figure: Usage Calculation settings]
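The way overcommit ratios turn into allocation targets can be sketched as simple arithmetic: the vCPU target is physical cores times the vCPU:pCPU ratio, and the memory target is physical memory plus the overcommit percentage. The function names, the 4:1 ratio and the cluster figures are hypothetical examples within the 1:1 to 10:1 range mentioned above, and this is not necessarily how vCOps computes it internally.

```python
def allocation_targets(physical_cores, mem_gb, vcpu_per_core, mem_overcommit_pct):
    """Hypothetical allocation targets derived from overcommit ratios."""
    vcpu_target = physical_cores * vcpu_per_core
    mem_target_gb = mem_gb * (1 + mem_overcommit_pct / 100)
    return vcpu_target, mem_target_gb


def vms_by_allocation(physical_cores, mem_gb, vcpu_per_core, mem_overcommit_pct,
                      vm_vcpus, vm_mem_gb):
    """How many 'Average VMs' the allocation targets allow; the scarcer resource wins."""
    vcpu_target, mem_target_gb = allocation_targets(
        physical_cores, mem_gb, vcpu_per_core, mem_overcommit_pct)
    return min(vcpu_target // vm_vcpus, int(mem_target_gb // vm_mem_gb))


# Hypothetical cluster: 80 cores, 1024 GB RAM; Average VM of 2 vCPU and 8 GB.
for mem_pct in (0, 25):
    vms = vms_by_allocation(80, 1024, vcpu_per_core=4, mem_overcommit_pct=mem_pct,
                            vm_vcpus=2, vm_mem_gb=8)
    print(f"4:1 vCPU:pCPU, {mem_pct}% memory overcommit -> {vms} VMs")
```

With 0% memory overcommit, memory rather than CPU becomes the limit (128 VMs instead of 160), which is the performance-first trade-off described above.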

Final Word:

For my final word on Capacity Management here is a summary of my recommendations:

  • For Production environments make sure the Spikes and Peaks checkbox is checked
  • Use “Usable Capacity” not “Physical Capacity”
  • Use a Demand and Allocation model that works for your environment. After making changes, check the Average VM sizing to see how the changes have affected your environment.
  • Ensure the Use High Availability configuration, and reduce capacity checkbox is checked
  • Usable Capacity buffers are important, don’t be afraid to increase the default percentages!
  • Ensure a Work Week is set
  • Set your Allocation Overcommit Ratios appropriately. In most server environments the level of Memory overcommit should be 0%.

5 Comments

  1. Wonderful post, colleague!

  2. Daniel

     /  May 23, 2014

    Thank you so much for this series of articles! It has helped me understand vCops much better and how to start tuning it to my environment.

  3. Jan Erik Holo

     /  June 16, 2014

    Thanks for a useful post.
    Could you give more info and examples of the relationship between the Usable Capacity Rules settings – Use High Availability (HA) config… and the % of CPU capacity / % of Memory capacity?
    Let us say your HA setting “Percentage of cluster resources reserved…” is 20%,
    and in the vCOps Usable Capacity Rule Policy “Use High Availability (HA)…” is ticked, with “% of CPU capacity…” = 10% and “% of Memory capacity…” = 10%.
    Will alarms go off when CPU/Memory load passes 70%?

  4. Ali

     /  October 22, 2014

    Great write-up!!! It really helped, and I would like to know the answer to the question posted by Jan.

  5. Mrbsmallz

     /  January 9, 2015

    Excellent post!
    Exactly what I’ve been searching for. Other vCOps materials fall short after discussing installation of the vApp.

    Can’t wait to begin tuning my environment tomorrow.

    A million thanks for this deep dive!!

