Tuning vCOps for your environment – Part 1 – Alert Sprawl

As my first post on DefinedBySoftware.com, I thought I would start a multi-part series on tuning and using vCenter Operations Manager (vCOps).

vCOps is, in my personal opinion, a fantastic health and capacity management tool for monitoring virtualised environments. vCOps collects and analyses information from multiple data sources and uses advanced analytics algorithms to learn and recognise the normal behaviour of every resource it monitors. It also provides capacity planning and reporting, as well as right-sizing recommendations for undersized and oversized VMs.

Although this tool may sound too good to be true, in my experience it is often let down by limited understanding and by not being tuned for a customer's own environment. The goal of these next few posts is therefore to provide some quick tips and advanced vCOps tuning knowledge that can help with common problems such as alert spam and overoptimistic capacity reports.

Alert Management

One piece of feedback I often receive about vCOps is that after deployment too many alerts are active in both the vSphere UI and the Custom UI. Such a high number of alerts can be daunting, and as a result they are often all ignored. Badge alerts, for example Workload, Capacity Remaining and Time Remaining, usually make up the majority. One of the main reasons for this is that badge alerts cannot simply be cleared; the badge level state needs to be adjusted in the appropriate policy. We will discuss this in a later post, as policy tuning, and how a policy is then applied to the appropriate groups and objects, is the heart of vCOps tuning.

For now, let's discuss some quick wins that can be made to reduce the number of active alerts.
Alerts are generated when a badge changes from a healthy state (green) to a lower state (yellow, amber or red), or when a fault is generated. One thing many people notice is that, out of the box, many alerts are generated for Time Remaining and Capacity Remaining. These Risk alerts (Capacity Management) often fill the Alerts window with hundreds or thousands of warnings about a particular resource running out (we will cover capacity management and tuning in a later post).
One of my main quick wins is to disable the Time Remaining and Capacity Remaining alerts altogether in the Default Policy. The rationale behind this is simple: capacity management is a task I perform daily or weekly as a scheduled activity, and I do not need to be alerted on it. Leave the alerts for things I need to focus on now, not in one month's time.

Below are my recommendations for the Configure Alerts section of a vCOps policy, whether default or custom made. As you can see, Time Remaining and Capacity Remaining have been unchecked. It is important to note that unchecking an alert does not prevent the badge from degrading to lower states, and therefore does not stop the Capacity Remaining or Time Remaining score from dropping on objects. It simply stops that object from generating an alert related to that badge (we will discuss badge tuning in a later post).

Default Policy Alert Recommendations

                     Infrastructure Objects   VMs         Groups
Workload             Checked                  Checked     Checked
Anomalies            Checked                  Checked     Checked
Time Remaining       Unchecked                Unchecked   Unchecked
Capacity Remaining   Unchecked                Unchecked   Unchecked
Stress               Checked                  Checked     Checked
Waste                Unchecked                Unchecked   Unchecked
Density              Unchecked                Unchecked   Unchecked
Faults               Checked                  Checked     Checked

You will notice above that Workload, Anomalies and Faults have been left checked. These minor badges directly affect the operational health of an object and should be alerted on, because a change means the object needs attention now. I have also left Stress enabled; however, I see this as optional, depending on how well tuned the stress policies are for your environment.
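
To make the behaviour concrete, here is a minimal Python sketch of the recommendations above. It is not the vCOps API and not how vCOps is implemented; the names ALERT_POLICY and evaluate are hypothetical and only model the rule described in this post: a badge can still degrade when its alert is unchecked, but no alert is raised for it. Faults are treated like any other badge here, which is a simplification.

```python
# Illustrative model only; this is NOT the vCOps API.
# Badge names come from the recommendations table; the same setting is
# applied to infrastructure objects, VMs and groups.
ALERT_POLICY = {
    "Workload": True,
    "Anomalies": True,
    "Time Remaining": False,      # reviewed on a schedule, not alerted on
    "Capacity Remaining": False,  # reviewed on a schedule, not alerted on
    "Stress": True,               # optional, depends on stress policy tuning
    "Waste": False,
    "Density": False,
    "Faults": True,
}

HEALTHY = "green"


def evaluate(badge, old_state, new_state, policy=ALERT_POLICY):
    """Return (resulting_state, alert_raised) for a badge state change.

    The badge always moves to new_state; the policy only decides whether
    leaving the healthy (green) state raises an alert.
    """
    degraded = old_state == HEALTHY and new_state != HEALTHY
    alert_raised = degraded and policy.get(badge, False)
    return new_state, alert_raised


if __name__ == "__main__":
    # The badge still degrades to red, but no alert is generated.
    print(evaluate("Time Remaining", "green", "red"))
    # A Workload degradation raises an alert.
    print(evaluate("Workload", "green", "amber"))
```

The point of keeping the state change and the alert decision separate is the same as in the policy itself: unchecking an alert reduces noise without hiding the badge score.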

Stay tuned for future posts on Policy configuration, Capacity Management configuration, Intelligent Group creation and much more.
