Issue: VMware is ubiquitous in the enterprise, and while it opens up huge potential for managing the Connect infrastructure, a virtualized deployment must be planned and executed with an eye toward robustness.
This advice is gleaned from conversations with senior members of our operations team and from support cases raised by customers with on-premises VMware deployments of Connect.
One of the most important and most often overlooked aspects of virtualization is making certain that VMware is compatible with all the underlying components of the server and network architecture. The infrastructure supporting VMware must be verified by VMware under its Hardware Certification Program or its Partner Verified and Supported Products (PVSP) program; be sure to use certified hardware.
Here is the link to the compatibility reference: http://www.vmware.com/resources/compatibility
With Connect you must consider both Tomcat and FMS. The former can run on almost anything, while the latter is more demanding: RTMP can be acutely affected by latency and packet transmission problems. If you notice unexpected latency or a surprise crash of FMS with Connect 9.1, a good place to start is the network components: sniff for packet transmission issues and confirm that the vNIC of each guest VM is configured to use VMXNET3.
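As a quick way to audit the vNIC type, the sketch below scans the text of a guest's .vmx file for `ethernetN.virtualDev` entries and lists any adapter that is not declared as vmxnet3. The function names are illustrative, not part of any VMware tooling, and the sketch assumes you have a copy of the .vmx file to read. An adapter with no `virtualDev` line at all falls back to a default type and will not appear in the result, so treat an empty list as a prompt to look closer rather than as proof.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "vmxnet3"
_VNIC_LINE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"', re.IGNORECASE
)


def vnic_types(vmx_text):
    """Return {adapter_name: device_type} for each virtualDev entry found."""
    found = {}
    for line in vmx_text.splitlines():
        match = _VNIC_LINE.match(line)
        if match:
            found[match.group(1).lower()] = match.group(2).lower()
    return found


def non_vmxnet3(vmx_text):
    """List adapters whose declared type is something other than vmxnet3."""
    types = vnic_types(vmx_text)
    return sorted(name for name, dev in types.items() if dev != "vmxnet3")


if __name__ == "__main__":
    # Illustrative sample; in practice, read the guest's .vmx file instead.
    sample = (
        'ethernet0.virtualDev = "e1000"\n'
        'ethernet1.virtualDev = "vmxnet3"\n'
    )
    print(non_vmxnet3(sample))
```

Any adapter the script lists is a candidate for switching to VMXNET3, which requires VMware Tools in the guest.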
Recommendations and best practices depend on the VMware infrastructure adopted. The following references serve as a guide to tuning the environment:
Enterprise Java Applications on VMware – Best Practices Guide: http://www.vmware.com/resources/techresources/1087
Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs: https://www.vmware.com/resources/techresources/10220
Performance Best Practices for VMware vSphere 5.1: https://www.vmware.com/resources/techresources/10329
The key with network storage is speed. If you lose connectivity to the shared storage, only what is cached on the origins will be available.
Shared storage requirements
- Disk specs: 10,000–15,000 RPM — Fibre Channel preferred
- Network link: TCP/IP — 1GB I/O throughput or better
- Controller: Dual controllers with Active/Active multipath capability
- Protocol: CIFS or equivalent
Avoid virtualizing the Connect database if possible.
In some overtaxed customer VMware environments, I have seen latency among the servers on ports 8507 and 8506 cause problems. Intra-cluster latency (server-to-server communication) should never exceed 2-3 ms; when it does, we see intermittent crashes.
I had one customer with a particularly weak infrastructure whose crashes I could predict. He was running backups and other tasks at a set time each week that taxed and hampered network connectivity for about an hour; these tasks were so all-consuming on the network that they turned every cluster resource into an island of its own. The log traces bore this out, and we knew with precision what was going on. He knew he needed to upgrade his infrastructure, and in the meantime we worked out a reaction plan to deal with the issue. It included:
- Place a higher-than-normal percentage of cache on each server to limit calls to shared storage
- Set the JDBC driver reconnection string for database connectivity
- Plan Connect usage around these maintenance activities and, when possible, do Connect maintenance at the same time as well. This was not very difficult as the tasks ran after hours, but for a global operation it was still not a given.
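The 2-3 ms intra-cluster latency guideline above can be spot-checked from one Connect server to another. The sketch below times TCP handshakes to a peer and summarizes the samples. It rests on several assumptions: the function names and default values are illustrative, the 3 ms limit is simply the upper end of the range given above, and handshake time only approximates a network round trip (it includes some OS overhead). It is not a substitute for the log traces or a proper packet capture.

```python
import socket
import statistics
import time

LATENCY_LIMIT_MS = 3.0  # assumption: upper end of the 2-3 ms guideline


def tcp_connect_ms(host, port, timeout=1.0):
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed as soon as the handshake completes
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms, limit_ms=LATENCY_LIMIT_MS):
    """Return the median, the worst case, and the count of samples over the limit."""
    return {
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "over_limit": sum(1 for s in samples_ms if s > limit_ms),
    }


def probe(host, port, count=20, pause_s=0.1):
    """Take `count` handshake timings to host:port and summarize them."""
    samples = []
    for _ in range(count):
        samples.append(tcp_connect_ms(host, port))
        time.sleep(pause_s)
    return summarize(samples)
```

Calling `probe("connect-node2.example.internal", 8507)` from another cluster member (the host name here is hypothetical) during the weekly backup window and again during a quiet period would show whether the maintenance tasks push latency past the limit.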