I’m a huge fan of the Veeam Management Pack for System Center Operations Manager. Veeam has created a fantastic product, and it is the best management pack out there for monitoring VMware infrastructure.
I recently deployed it for a customer and came across a monitor alert that kept tripping. The management pack was configured correctly, all vCenter servers were communicating correctly and I could see all my discovered objects. However, every now and then I would get an alert that looked like this:
Veeam VMWare Collector: VMWare Connection is unavailable
VP050 Error encountered while retrieving performance metrics for a cluster on server01.domain.local. The VMWare API error returned was: ‘Failed to retrieve performance data. Verify the connection settings and account privileges used to connect to the VMWare Server.’
When I went into the Virtualization Extensions for System Center web page -> VMWare Servers tab -> Test All Connections, every connection tested successfully. I knew the Run As account I was using was working because I was getting alerts, discovering items and could drill down into my VMware hierarchy. So what could be wrong? On my VMware Collector (in my case, my SCOM management server), I could see in the Veeam Collector log that Event ID 89 was being logged, which matched the alert description I was seeing in SCOM.
Here’s the solution:
I contacted Veeam Support, and it turned out that because of the size of the VMware infrastructure I was monitoring, my management servers were polling the vCenter server too often and overloading it. The fix was to increase the collection interval, so that the Veeam Collectors (my management servers) query vCenter for performance data less frequently.
To do that…
1) Open Veeam Virtualization Extensions for System Center web interface
2) Go to the Veeam Collectors tab and select Veeam Collectors in the side pane
3) Then on the right action pane click Collection interval (should be at the top in the “Veeam Collectors” pane)
4) Change CollectionInterval from 5 to xx minutes and NetTimeout from 210 to 840 and click Save
5) You will then be asked to restart all collectors to apply the changes.
The “xx” minutes can be 10, 15 or 20. Increase the interval until the alert disappears. In my case I went straight to 20 minutes, but you can start at 10 minutes and, if you still get the alert in SCOM, move up to 15 minutes and so on. Note that the longer the interval, the coarser your performance monitoring granularity becomes, so it is a trade-off.
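To put a number on that trade-off, here is a minimal Python sketch that works out how many performance collections per hour each interval gives you, and which choice to try next if the alert keeps firing. It is plain arithmetic on the values mentioned in this post (the 5 minute default and the 10, 15 and 20 minute choices). The function names are my own, and nothing here talks to Veeam or SCOM.

```python
from typing import Optional

# Values taken from the steps above, in minutes.
DEFAULT_INTERVAL_MIN = 5
INTERVAL_CHOICES_MIN = [10, 15, 20]


def samples_per_hour(interval_min: int) -> float:
    """Number of performance collections per hour at a given interval."""
    return 60 / interval_min


def next_interval(current_min: int) -> Optional[int]:
    """Return the next larger interval choice, or None if already at the maximum.

    Hypothetical helper: it only mirrors the "try 10, then 15, then 20" advice.
    """
    for choice in INTERVAL_CHOICES_MIN:
        if choice > current_min:
            return choice
    return None


if __name__ == "__main__":
    for interval in [DEFAULT_INTERVAL_MIN, *INTERVAL_CHOICES_MIN]:
        print(f"{interval:>2} min -> {samples_per_hour(interval):.0f} collections/hour")
```

Going from 5 to 20 minutes takes you from 12 collections per hour down to 3, so each step up buys vCenter some breathing room at the cost of coarser performance graphs.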
Once you find the balance, the alert will no longer appear.