Posts in Category "Database"

XML API Tips: Internal-error When Executing Reporting Calls

Periodically when executing a reporting API call, you may get an unexpected return as shown below:

<results>
<status code="internal-error">
<exception>java.sql.SQLNonTransientConnectionException: [Macromedia][SQLServer JDBC Driver][SQLServer]Cannot open database "XXXXXXXXX" requested by the login. The login failed.</exception>
</status>
</results>

Here, ‘XXXXXXXXX’ stands for the name of the database your request was trying to reach.

This is expected if you make one of the ‘reporting database API calls’ at the exact moment the reporting database is locked for a small restore. As previously discussed, the reporting database is not real-time: it is synced periodically and can lag production by as much as 24 hours. The reporting database is log shipped every 15 minutes, so every 15 minutes there is a small restore. While the restore runs, the database is locked down and any login attempt fails, which surfaces as the internal-error shown above (you will see it both in the logs and in your response). All your application needs to do is retry when it gets that message. The downtime can be as long as a minute, but it is usually shorter.
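The retry advice above could look something like the following in a client. This is a minimal sketch in Python, not Adobe's code: the function names, the delay values and the `fetch` callable are illustrative assumptions, and only the `internal-error` status code and the "Cannot open database" text come from the response shown earlier.

```python
import time
import xml.etree.ElementTree as ET


def is_restore_error(xml_text):
    """Return True if the response looks like the restore-window login failure."""
    root = ET.fromstring(xml_text)
    status = root.find("status")
    if status is None or status.get("code") != "internal-error":
        return False
    exception = status.findtext("exception", default="")
    return "Cannot open database" in exception


def call_with_retry(fetch, attempts=5, delay_seconds=15, sleep=time.sleep):
    """Call fetch() until it returns something other than the restore error.

    fetch is any zero-argument callable returning the response body as text
    (hypothetical; plug in your own HTTP call). The defaults are assumptions
    sized to cover roughly a minute of downtime.
    """
    response = fetch()
    for _ in range(attempts - 1):
        if not is_restore_error(response):
            break
        sleep(delay_seconds)
        response = fetch()
    return response
```

If every attempt fails, the last error response is returned unchanged, so the caller can still log it or surface it.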

 

XML API Tips: Reporting API Calls and the Reporting Database

One common question from API developers revolves around the existence of our reporting database vs our production database on Adobe’s Hosted platform.  There are a few API calls that will hit the reporting database rather than production, to retrieve information.  This is by design and is to prevent some of the more expensive APIs from being run on a multi-tenant environment’s production database.  The current calls that are redirected to our reporting database and not to our production (real-time) database are:

/api/xml?action=report-bulk-consolidated-transactions
/api/xml?action=report-bulk-objects
/api/xml?action=report-bulk-questions
/api/xml?action=report-bulk-users
/api/xml?action=report-bulk-slide-views

As you can see, these are all the ‘bulk’ API calls.  There is one additional call that is currently (as of Adobe Connect 9.2.2) being directed to the reporting database rather than production, and that is:

/api/xml?action=report-quiz-results

This action will be shifted to the production database in the next major release of Adobe Connect.

The reason this is important is that the reporting database is not real-time like production.  It is delayed, sometimes up to 24 hours.  So it is recommended that if you need to have real-time information in your application, you avoid making the calls above and use other APIs to retrieve the desired data.
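An application that mixes real-time and delayed calls may want to flag the delayed ones. The sketch below is one way to do that in Python; the set contains exactly the actions listed above (as of Adobe Connect 9.2.2), while the function name `uses_reporting_db` is an illustrative assumption.

```python
from urllib.parse import urlparse, parse_qs

# Actions listed in this post as served from the reporting database
# (as of Adobe Connect 9.2.2).
REPORTING_DB_ACTIONS = {
    "report-bulk-consolidated-transactions",
    "report-bulk-objects",
    "report-bulk-questions",
    "report-bulk-users",
    "report-bulk-slide-views",
    "report-quiz-results",
}


def uses_reporting_db(url):
    """Return True if the URL's action is one of the reporting-database calls."""
    query = parse_qs(urlparse(url).query)
    action = query.get("action", [""])[0]
    return action in REPORTING_DB_ACTIONS
```

A caller could use this to warn that the data may be up to 24 hours old, or to route the request through retry logic.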

DB_PING_TIMEOUT Value Change

Recently we have discovered that a newer setting for on-premise (licensed) Adobe Connect servers may lead to a memory leak on the system in certain rare circumstances.  Here is some history and recommendations in case you believe you may be running into a memory leak problem in your Adobe Connect licensed environment and you are running a version newer than 9.0.3.

The DB_PING_TIMEOUT value was introduced back in Connect 7 (2008 timeframe). It enables invalidated DB connections to be recognized quickly. In the absence of a reasonable value for this timeout, we have had instances in the past where critical CPS threads (e.g. the scheduler sweeper thread) waited on a stale DB connection for too long, causing fast-fails. Until then the value had always been set to ‘0’, which means there is no timeout. Since the default host health check timeout is 40 seconds, the DB_PING_TIMEOUT default was set to 30 seconds, so that it stays under the limit that causes potential server fast-fails. This was a fairly minor change in the config.ini, where the DB_PING_TIMEOUT value was changed from 0 to 30, and it was made in Connect 9.0.3. So every version above 9.0.3 will have the default set to 30. [Important note: this value is in seconds, not milliseconds.]

Recent longevity tests in version 9.2 suggested that this might be triggering a memory leak in the driver. The going theory for why that behavior wasn’t seen in previous longevity tests (between 9.0.3 and 9.2) is that we only upgraded to JRE 7 in 9.2. So the setting we were running with previously suddenly seemed to be a problem once we also upgraded to 1.7.

That value of 30 was introduced for a reason, so we don’t suggest turning it off without knowing that it causes a problem. On the Adobe hosted clusters, we have made the decision to do so since there were signs of memory issues even previously and we didn’t want to compound that.

That said, there are known issues with our driver and JRE 1.7, but only under some circumstances. If Adobe Connect system administrators observe continuous increases in heap memory usage, this parameter should be set back to 0.

This can be done either by changing the value in the config.ini from 30 to 0 (DB_PING_TIMEOUT=0) or by adding the value to the custom.ini (it won’t be there by default, but if you add it, it takes precedence over what is in the config.ini).
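To confirm which value will actually be in effect, the precedence rule above can be sketched in a few lines of Python. This is an illustration only: it assumes both files are plain KEY=VALUE lines and that `#` starts a comment, and the function names are hypothetical. It does not reflect how Connect itself reads the files.

```python
def parse_ini(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (assumed syntax)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_setting(config_text, custom_text, key):
    """Return the value for key, letting custom.ini override config.ini."""
    merged = parse_ini(config_text)
    merged.update(parse_ini(custom_text))
    return merged.get(key)
```

For example, `effective_setting("DB_PING_TIMEOUT=30", "DB_PING_TIMEOUT=0", "DB_PING_TIMEOUT")` returns `"0"`, while an empty custom.ini leaves the config.ini value of `"30"` in place.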

 

Adobe Connect Database Disaster Recovery Options

Having a good recovery strategy allows for recovery of data in case of unforeseen events such as user error, hardware failure, drone strikes and fecal tsunamis. There are three recovery models:

  • Simple Recovery
  • Bulk Logged Recovery
  • Full Recovery

Simple Recovery is the most rudimentary. When the DB recovery mode is set to simple, the transaction log does not get backed up. It is auto-truncated and you can only ever recover to a full db backup; this builds in the potential for data loss, as a point-in-time recovery is not possible. Generally, the Simple Recovery option is recommended for development or test environments where data recovery is not critical. It is also a good strategy for a novice DBA, as you don’t have to worry about a detailed backup and restore plan or jobs. Mission critical databases should never be in simple mode, but for non-mission critical deployments it is a low-overhead alternative.

The Bulk Logged Mode is not very commonly used. When the DB recovery mode is set to Bulk Logged, bulk operations (Select Into, Create Index, etc.) are only minimally logged. This results in reduced log space consumption. The shortfall is that if the last transaction log has bulk operations in it, then point in time recovery is not possible; if it does not have bulk operations in it, then point in time recovery is possible. While it may be prudent to switch full recovery databases temporarily into Bulk Logged Mode for the purpose of re-indexing a very large database, be sure to always switch them back, as critical databases probably shouldn’t be in Bulk Logged recovery mode.

Full Recovery Mode is the default recovery model and is the most granular. When the database recovery mode is set to full, everything gets logged to the Transaction Log, resulting in greater log space consumption. Point in time recovery is possible in full recovery mode. This is the recovery model most users should choose for production data. Combining this recovery model with regularly scheduled full backups, differential backups and transaction log backups allows for quicker point in time recovery.
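To make point in time recovery concrete, here is a simplified Python sketch of which backups you would restore, and in what order, to reach a given moment under full recovery. It is a teaching model only: it reasons about backup completion times, whereas SQL Server tracks log sequence numbers, and the function name and tuple layout are assumptions.

```python
from datetime import datetime


def restore_chain(backups, target):
    """Pick the backups needed to restore to `target` under full recovery.

    backups: list of (kind, finished_at) tuples, kind in {"full", "diff", "log"}.
    Returns the chain in restore order, or None if no full backup precedes target.
    """
    backups = sorted(backups, key=lambda b: b[1])
    fulls = [b for b in backups if b[0] == "full" and b[1] <= target]
    if not fulls:
        return None
    base = fulls[-1]  # most recent full backup before the target
    diffs = [b for b in backups if b[0] == "diff" and base[1] < b[1] <= target]
    chain = [base] + diffs[-1:]  # only the latest differential is needed
    start = chain[-1][1]
    for b in backups:
        if b[0] == "log" and b[1] > start:
            chain.append(b)
            if b[1] >= target:  # this log backup contains the target time
                break
    return chain
```

The sketch shows why differentials speed up recovery: the latest differential replaces every log backup taken between it and the full backup. If no log backup covers the target time, the chain simply ends at the last log available.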

Choosing a backup and recovery plan is relevant to the following criteria:

  • How important is the Data? The more important the data, the more likely you will choose full recovery and schedule regular full backups, differential backups and log backups.
  • How often does the data change? How busy is the Connect server?
    If the data only changes frequently during normal business hours, scheduling log backups closer together during these times and further apart during non business hours might work out.
  • How much space do you have available for backups? This could determine how many backups you will store and how often you will back up.
  • How quickly do you need to recover data? If recovery speed is not important, but point in time is, you might choose not to do any differential backups and just do Full nightly backups and regular transaction log backups.

Based on the answers to the previous questions, you should be able to determine a backup plan that fits your needs. Remember to test the recovery of your backups regularly.  Backing up is useless if backups are corrupt or not working correctly.

Another important consideration is with the timing of backups. Keep in mind that performing backups is resource intensive.  To help determine an appropriate schedule of your backups, consider the ongoing activities on the Connect servers.

If you want to focus on recovering data in case of fire or natural disaster, then you should consider storing the backups offsite. Many savvy DBAs keep a predetermined number of current backups on site and also ship the backups offsite (tape or network). They might choose to keep five current backups onsite and as many as 30 offsite.

SQL 2008 has backup compression allowing you to save on disk space, but it comes with a cost of speed. Choose the compression level that suits your speed of backup. Third-party products offer backup compression as well.

Consider also the various high availability options:

  • SQL clustering relies on Windows clustering. It clusters the entire server not just the database. The fail-over is slower than mirroring and doesn’t provide a fail-over against disk failure.
  • Mirroring (http://msdn.microsoft.com/en-us/library/ms189852.aspx) is a faster fail-over solution. The Connect SQL driver has the ability to choose a fail-over server. This can be done at the DB level.
  • Log Shipping ships completed transactions to the log shipped database; this can be done on the database level and requires manual intervention to fail-over, as the log-shipped db is considered a warm DB.

Note: Replication is not a recommended option.

Adobe’s Hosted infrastructure uses a hybrid high-availability strategy. We use database mirroring as the primary fail-over solution. It provides faster fail-over and does not have a single point of failure, unlike clustering, which relies on a single disk. We also use log shipping as a secondary fail-over solution. In the extreme case that all mirrored databases go down, the log shipped database can be used with some user intervention: break the log shipping, take the database out of standby mode and point the Connect server to it.

Adobe Connect Database Performance and Monitoring

Following SQL database performance best practices and monitoring the health of your Connect database will help to ensure a responsive Connect server that provides an excellent end user experience.

It is best to always place the operating system, data and log directories on separate disk drives; this will result in improved performance. If you must put Connect on the same server as the DB (never a best practice but sometimes a practical necessity), you should ensure that the Connect installation and content directories are on a different disk drive than the database data. The Temp DB should also be on a separate disk drive. Putting the SQL data on striped disks provides a tuning benefit as well.

Be sure to aggressively re-index and update statistics. De-fragment the operating system data and log files on a regular schedule. Ensure that there is minimal latency between the Connect server and the SQL Server. Be wary of  network maintenance and backups that can produce latency between the Connect server and the SQL server and be sure to avoid heavy Connect use during any such maintenance.

Make sure that the SQL server has plenty of RAM; the more RAM the better. Everything works much faster in memory, so the more of the database you can keep in memory the better off you will be. Only virtualize the DB server if absolutely required. While Connect runs fine on supported VMware servers, the SQL database server is best run on a dedicated platform.

With reference to the use of separate disks, here is a prioritized list of what should have its own disk:

  1. Operating System
  2. Adobe Connect (Separate Server if possible)
  3. SQL database
  4. Data
  5. Log
  6. TempDB
  7. Transaction logs

For best performance, set the initial size of the transaction log file based on estimated use.  This avoids unnecessary fragmentation. The transaction log should be on a different drive than is your data file, temp database and operating system. Manually shrink the transaction log files based on monitoring.  If you try to do this as a nightly or weekly job, you will end up with unnecessary fragmentation. De-fragment the transaction log file as necessary and consider putting transaction logs on striped disks. Ensure regular backups as transaction log backups empty the space inside the log file and prevent it from continuing to grow.

Manage the memory by setting the minimum server memory for SQL server.  Remember to leave enough for the operating system and any other applications running on the server:

db3.fw

SQL Server uses the tempdb database as a working area for temporary tables, sorting, sub-queries etc.; the tempdb should be stored on its own drive away from other DBs whenever possible. The default location is on the SQL install disk. Increase the size of the tempdb database based on expected usage and available space. SQL Server automatically adjusts the size over time, but each change causes a performance hit and causes fragmentation. By increasing the size up front, you avoid constant growth. SQL 2008 uses the tempdb more than prior versions of SQL. Never try to back up the tempdb.

Monitor the disk space of the data files and log files. Disk space is inexpensive when compared with the benefits it provides when available in abundance.  You should aim to keep at least 30% free disk space in case you need to expand the data/log files, or if they are set to autogrow.  Sudden increases in size should  be cause for investigation.

Monitor the fragmentation levels. If the database and log were set to autogrow at small intervals, there is a high likelihood that they are fragmented. If you regularly shrink the DB data files or log files, that could also lead to fragmentation.

Monitor for slow queries; you can see slow queries in the Connect debug log.  Just search for Slow Query. Query times are returned in msec. Also look for lock timeouts in the debug log.  Generally this is a sign of database problems. A lock timeout is a query that attempts to get a lock on a database resource.  It times out because something else is already holding a lock. A lock is usually held until the transaction has completed, so if there is a long running query it could cause lock timeouts. You can also run traces against the database to gather information on long running queries.  In SQL 2008 you can query dynamic management views to get this information.
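A quick way to get a feel for how often these entries occur is to count them. The Python sketch below does a plain case-insensitive substring count; "Slow Query" is the search term given above, while the exact wording of lock timeout entries ("lock timeout" here) and the function name are assumptions to adjust against your own debug log.

```python
def count_markers(lines, markers=("Slow Query", "lock timeout")):
    """Count lines containing each marker, case-insensitively."""
    counts = {marker: 0 for marker in markers}
    for line in lines:
        lowered = line.lower()
        for marker in markers:
            if marker.lower() in lowered:
                counts[marker] += 1
    return counts
```

Because an open file is an iterable of lines, you can pass one in directly, for example `with open("debug.log", errors="replace") as fh: print(count_markers(fh))`. A sudden jump in either count is a cue to look at index maintenance or long running queries.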

Monitor indexes liberally keeping in mind that re-indexing regularly should decrease the need to monitor indexes. Sometimes re-indexing may start taking too long to complete and you will want to be more selective about what to target. Knowing which tables or indices are most fragmented allows you to only re-index them. You can query dynamic management views in SQL Server to get this information (see SQL Server books online). Many 3rd party products offer monitoring of SQL server and you might consider these products if you want a more automated GUI interface to monitoring indexes. Some of the products offer monitoring for other areas of SQL Server as well.

Windows Performance Monitor (perfmon) is useful for monitoring SQL counters. Here are three common counters that warrant further scrutiny if they show a problem:

  • Pages/sec  -  How much your SQL server is paging in and out of memory
  • Disk Queues -  If the write or read disk queues are too high, you will need more RAM
  • CPU Queue length -  If the CPU queue is consistently over 2 per CPU for an extended period of time, you might have a CPU bottleneck.
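The CPU queue rule of thumb above ("consistently over 2 per CPU for an extended period") can be expressed as a small check over sampled values. This Python sketch is illustrative: the function name and the `min_run` parameter (how many consecutive samples count as "extended") are assumptions, and the samples would come from whatever perfmon export you use.

```python
def sustained_cpu_queue(samples, cpu_count, per_cpu_limit=2, min_run=5):
    """Return True if the queue length exceeds per_cpu_limit * cpu_count
    for at least min_run consecutive samples."""
    limit = per_cpu_limit * cpu_count
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0
        if run >= min_run:
            return True
    return False
```

Requiring a run of consecutive samples, rather than a single high reading, avoids flagging the short spikes that backups and other maintenance jobs produce.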

Be aware of load and activity when monitoring with perfmon, as database backups and other maintenance activities can cause spikes in these numbers. It is best to connect to the server from a different PC if you intend to monitor it with perfmon.

A good maintenance plan will include scheduled re-indexing during off hours. Fragmented indices can cause Connect to become sluggish and might even cause fast-fails in a Connect cluster. If you start to see a lot of slow queries in the debug log, you should ensure that the Connect DB is being re-indexed regularly. Index maintenance is one of the easiest ways to keep your DB healthy, and SQL Server provides wizards that help make index maintenance easy.

Open SQL Server Management Studio and open the management folder.

  • Right Click on the Maintenance Folder
  • Choose Maintenance Wizard

Give the Maintenance plan a name:

db4.fw

Choose the desired maintenance tasks: Rebuild Index & Update Statistics

db5.fw

Choose the Database you want to re-index:

db6.fw

Reorganize with the default amount of free space; the default amount is what it was initially created with.

db7.fw

Choose the same database to update statistics after you re-index.

db8.fw

Schedule a job to run the maintenance plan; provide a name and choose a schedule that suits your infrastructure:

db9.fw

Database performance and monitoring best practices will ensure a responsive Connect server that provides an excellent end user experience.

FAQs on Adobe Connect SQL Database Installation, Startup, Connection and Pooling

The following is a summary of Adobe Connect 9 database installation tips:

1. What do I need to start?

Always check the updated system requirements page prior to installing: http://www.adobe.com/products/adobeconnect/tech-specs.html
As of the writing of this article it reads: Microsoft SQL Server 2008 SP3, 2008 R2

While it is best to have sa permissions, at a minimum you must use a username and password with dbcreator privileges. We highly recommend using an sa account. After the install you may use a dbo account for normal use, but during any upgrade or updater application, you must switch back to sa.

2. When does the installer create the database for Connect?

All current Connect versions (after 7.5SP1) create the database during installation. Typically the DB creation process takes about 50 seconds. First the schema is created and then the seed data is inserted. After the DB is created, Connect is still not fully functional until you download and apply the license.txt file. The license file will insert additional seed data into the Connect database, including templates and folders.

3. How should I troubleshoot database login failures during installation?

db1.fw

This error can mean several things:

  • The username could be incorrect
  • The password could be incorrect
  • SQL Server Authentication might not be on.

Entries in the debug.log will provide some answers:

db2.fw

  • java.sql.SQLException… Login failed for user ‘sa’ usually means the username or password was mistyped.
  • java.sql.SQLException… Login failed for user ‘sa’. The user is not associated with a trusted SQL Server connection usually indicates that SQL Authentication is disabled.
  • java.sql.SQLException… Cannot open database “dbname” requested by the login usually indicates that the login exists but does not have permission to open the DB.
  • java.sql.SQLException… CREATE TABLE permission denied in database ‘dbname’ usually indicates that the login has permission to log in to the DB but does not have permission to create schema objects.
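The mapping from log message to likely cause can be turned into a small lookup when you need to triage many debug.log entries. The sketch below is illustrative Python: the message fragments are taken from the list above, while the hint wording, the ordering and the `diagnose` function are assumptions rather than anything Connect provides.

```python
# Checked in order, most specific first: the "trusted SQL Server connection"
# message also contains "Login failed for user".
LOGIN_ERROR_HINTS = [
    ("not associated with a trusted SQL Server connection",
     "SQL Server Authentication is probably disabled"),
    ("Cannot open database",
     "the login exists but lacks permission to open the database"),
    ("CREATE TABLE permission denied",
     "the login can open the database but cannot create schema objects"),
    ("Login failed for user",
     "the username or password is probably mistyped"),
]


def diagnose(log_line):
    """Return a likely cause for a debug.log line, or None if unrecognized."""
    for fragment, hint in LOGIN_ERROR_HINTS:
        if fragment in log_line:
            return hint
    return None
```

These are the "usually" causes from the list, so treat the result as a starting point for checking the login in SQL Server, not as a definitive answer.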

Note: During install and upgrade, and during minor updates of point releases, the DB user must have permission to create, alter or drop schema objects.

Note also that log errors are discussed on page 83 of the Adobe Connect Installation Guide: http://help.adobe.com/en_US/connect/9.0/installconfigure/connect_9_install.pdf

If you encounter any of these errors, stop all of the Connect services, correct the user privileges in SQL and start the services again.

4. What happens during a successful startup?

During start-up, Connect tries to log in to the SQL database. If it can’t connect, the service stays running but enters a dormant state: you will be able to reach local port 8510 to configure the Connect server through its wizard, but not the application front end. If the connection is successful, Connect makes multiple connections to the SQL database (the connection pool). The initial and maximum connection pool sizes are configurable. Connect then checks the DB version and determines whether it needs to apply updates, and finally the Connect host updates a row in the DB (PPS_ENUM_DATA_HOSTS) and sets itself active.

5. How does Connect monitor the health of the SQL database? What is the HealthCheck function for?

Connect relies heavily on the SQL database; it is safe to call the SQL database the heart of any Adobe Connect installation. Connect constantly checks to see if there is a valid connection to the SQL database, because loss of connection can lead to data corruption. To avoid this, Connect runs a health check on the SQL database: it pings the SQL Server and checks whether more than 40 seconds have passed since the Connect server last updated the PPS_ENUM_DATA_HOSTS table. If so, the Connect host is marked inactive, and the services for that Connect server restart and then reattempt to connect to the SQL database.
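The timing rule in that health check is simple enough to state as code. The Python sketch below illustrates only the 40-second comparison; it does not show how Connect reads PPS_ENUM_DATA_HOSTS, and the function and constant names are assumptions.

```python
from datetime import datetime, timedelta

HEALTH_CHECK_TIMEOUT = timedelta(seconds=40)  # value taken from the text


def host_is_stale(last_update, now, timeout=HEALTH_CHECK_TIMEOUT):
    """Return True if the host's last heartbeat is older than the timeout."""
    return now - last_update > timeout
```

This also shows why DB_PING_TIMEOUT was set below 40 seconds: a ping that is allowed to hang longer than the health check window would let the heartbeat go stale before the dead connection is noticed.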

If you are running the Connect SQL database in a SQL cluster rather than in a mirrored environment, you will want to make sure that Connect makes multiple database connection attempts during SQL fail-over. If Connect loses its SQL database, the entire Connect cluster will go down and it will wait for an administrator to manually reconnect to the database through launching the Connect configuration console on port 8510. Add the following to the custom.ini file to support any delays in clustered SQL fail-over:

DB_URL_CONNECTION_RETRY_COUNT=15
DB_URL_CONNECTION_RETRY_DELAY=30

The actual JDBC string is in the config.ini file so you do not need to put it into the custom.ini; double check the config.ini if you are running into any problems with the JDBC reconnection string:

DB_URL=jdbc:macromedia:sqlserver://{DB_HOST}:{DB_PORT};databaseName={DB_NAME};user={DB_USER};password={DB_PASSWORD};ConnectionRetryCount={DB_URL_CONNECTION_RETRY_COUNT};ConnectionRetryDelay={DB_URL_CONNECTION_RETRY_DELAY}
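It helps to work out how long those retry settings actually keep Connect trying. Assuming the delay is expressed in seconds, 15 retries with a 30-second delay give roughly 15 × 30 = 450 seconds, or 7.5 minutes, of waiting. The Python sketch below does that arithmetic; the function names are illustrative, and the time each connection attempt itself takes is ignored.

```python
def retry_window_seconds(retry_count, retry_delay_seconds):
    """Total time spent waiting between attempts, ignoring each attempt's own duration."""
    return retry_count * retry_delay_seconds


def covers_failover(retry_count, retry_delay_seconds, failover_seconds):
    """Return True if the waiting time alone outlasts the expected fail-over."""
    return retry_window_seconds(retry_count, retry_delay_seconds) >= failover_seconds
```

If you know roughly how long your SQL cluster takes to fail over, `covers_failover(15, 30, your_failover_seconds)` tells you whether the values above leave enough margin.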

6. What is the purpose of the Connection Pool and why do it the way we do?

Adobe Connect makes use of a connection pool. Every time the application needs to communicate with the SQL database, it checks for the next available idle connection and uses it. If there isn’t one available, it will create a new connection unless it has reached the connection pool max. Once the application has finished its transaction, it releases the connection back into the pool. These settings are found in \appserv\conf\Catalina\localhost\root.xml

  • minPoolSize="20"
  • maxPoolSize="25"
  • initialPoolSize="20"

This prevents the overhead of creating new connections each time a call to the SQL database is required. The connections are made at start-up. Since Connect relies heavily on the DB, having available connections is essential.
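For readers unfamiliar with pooling, the behaviour described above can be modelled in a few lines. This is a toy Python sketch, not Connect's pool: the class, the `connect` factory and the choice to raise an error when the maximum is reached are all assumptions (the text does not say what happens at the maximum), and the counter is not thread-safe.

```python
import queue


class ConnectionPool:
    """Tiny illustrative pool: reuse idle connections, grow up to max_size."""

    def __init__(self, connect, initial_size=20, max_size=25):
        self._connect = connect          # zero-argument factory (hypothetical)
        self._max_size = max_size
        self._idle = queue.LifoQueue()
        self._created = 0
        for _ in range(initial_size):    # connections are opened at start-up
            self._idle.put(self._new())

    def _new(self):
        self._created += 1
        return self._connect()

    def acquire(self):
        """Return an idle connection, or open a new one if under the maximum."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            if self._created >= self._max_size:
                raise RuntimeError("connection pool exhausted")
            return self._new()

    def release(self, connection):
        """Hand a connection back to the pool for reuse."""
        self._idle.put(connection)
```

The defaults mirror the initialPoolSize and maxPoolSize values listed above: 20 connections are opened up front, and at most 5 more are created under load.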

Using a Named Instance of SQL Server with Adobe Connect

Issue: When using a named instance of SQL Server with Adobe Connect, if you enter the name of the instance during installation, the connection to the database will fail.

Workaround: Instead of using the name of the instance, enable TCP/IP on the instance, use the IP address of the database server, and enter the port number on which the named instance is configured on the separate port line. For example, if the named instance is listening on port 1833 (instead of 1433), you would enter the IP address (192.168.1.1) and the port number (1833) in the appropriate fields.

Using the instance name during installation after Connect version 8 will not work. The best approach is to use the port number of the instance and the IP Address of the database server.

To troubleshoot this, use SQL Server Configuration Manager:


  • Make sure the instance has TCP/IP enabled.
  • Check to see what port the instance is listening on for that IP.
  • Use the IP address as server name (no instance name).  Put the port number in the port number field.
  • Make sure the named instance is listening on the port entered in Connect
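Before rerunning the installer, you can verify from the Connect server that something is really listening on the IP address and port you intend to enter. The Python sketch below uses only the standard library; the function name is an assumption, and a successful connection only proves that the port is open, not that the login or database name is correct.

```python
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the example values from the workaround, the call would be `port_is_listening("192.168.1.1", 1833)`. A False result points at TCP/IP being disabled on the instance, the wrong port, or a firewall in between.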


Beginning with Connect version 8, the installer changed; in earlier versions, you would enter the server instance and port on the same line. The newer installer has the port on a separate line:

HOST: INSTANCE-IP-ADDRESS
PORT: INSTANCE PORT-NUMBER
DATABASE NAME: SAMPLE-NAME
USER: SAMPLE-USER
PASSWORD: ****************
CONFIRM PASSWORD: ****************

The changes in the installer (beginning with Connect version 8) caused some confusion with named instances.

A named instance will work on an initial installation when it is addressed by IP address and port as described above. Sometimes, in an effort to troubleshoot, you may initially point to a conventional instance of SQL in order to establish the installation and then point the established Connect installation to a named instance. The DB connection from an established Connect installation is more robust and forgiving than that of an initial installation. After the installation is complete, you can modify custom.ini to include the instance name.

Note: You could use the server name, but you would need to ensure that the named instance has NAMED PIPES enabled.

Configuring Adobe Connect to take Advantage of Database Mirroring

Full redundancy requires that the Connect database be either mirrored or clustered; Adobe uses mirroring as the preferred solution.

The following example settings in the custom.ini file are needed to configure Connect to take advantage of SQL Mirroring:

DB_NAME=ConnectDBName

DB_HOST=ConnectDBPrimaryHostName

DB_BACKUP_HOST=ConnectDBSecondaryHostName

DB_URL=jdbc:macromedia:sqlserver://{DB_HOST}:{DB_PORT};databaseName={DB_NAME};user={DB_USER};password={DB_PASSWORD};AlternateServers=({DB_BACKUP_HOST}:{DB_PORT};DatabaseName={DB_NAME});ConnectionRetryCount=12;ConnectionRetryDelay=10;FailoverMode=extended;FailoverPreconnect=false;FailoverGranularity=atomic

Note: Change the first three variables as appropriate, but do not make any changes to the DB_URL. It is all one line and it pulls the values from the other three entries in custom.ini.
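If you want to preview what the DB_URL will look like once the placeholders are filled in, the substitution can be imitated as follows. This Python sketch is only a preview aid: how Connect performs the substitution internally is not described here, and the `expand` function and the sample values in the usage note are hypothetical.

```python
import re


def expand(template, values):
    """Replace each {NAME} placeholder with values[NAME]; raise if one is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name}")
        return values[name]
    return re.sub(r"\{([A-Z_]+)\}", substitute, template)
```

For example, expanding `"sqlserver://{DB_HOST}:{DB_PORT};AlternateServers=({DB_BACKUP_HOST}:{DB_PORT})"` with `DB_HOST="primary"`, `DB_BACKUP_HOST="secondary"` and `DB_PORT="1433"` yields `"sqlserver://primary:1433;AlternateServers=(secondary:1433)"`. The error on a missing name is deliberate, since a forgotten DB_BACKUP_HOST entry is an easy mistake to make.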

The following setting is always prudent whether using mirroring or clustering, but it is particularly important if you are clustering SQL. If you are running the Connect SQL database in a SQL cluster rather than in a mirrored environment, you will want to make sure that Connect makes multiple database connection attempts during SQL fail-over. If Connect loses its SQL database, the entire Connect cluster will go down and it will wait for an administrator to manually reconnect to the database by launching the Connect configuration console on port 8510. Add the following to the custom.ini file to support any delays in clustered SQL fail-over:

DB_URL_CONNECTION_RETRY_COUNT=15
DB_URL_CONNECTION_RETRY_DELAY=30

The actual JDBC string that invokes these variables is in the config.ini file:

DB_URL=jdbc:macromedia:sqlserver://{DB_HOST}:{DB_PORT};databaseName={DB_NAME};user={DB_USER};password={DB_PASSWORD};ConnectionRetryCount={DB_URL_CONNECTION_RETRY_COUNT};ConnectionRetryDelay={DB_URL_CONNECTION_RETRY_DELAY}

Save the custom.ini and cycle the services.

 

 

Adobe Connect Server Licensing for Disaster Recovery

This question is commonly asked: Does my license for On-Premise Adobe Connect allow me to install Adobe Connect servers for disaster recovery purposes?

First let’s define the terms: Disaster Recovery Environment refers to your technical environment designed solely to allow you to respond to an interruption in service due to an event beyond your control that creates an inability on your part to provide critical business functions for a material period of time. That is to say, it refers to a secondary site that would not be utilized in production unless the primary site went offline due to a natural or human-inflicted disaster that is beyond your control. Use of Adobe Connect servers in Disaster Recovery Environments is within the scope of your license and no additional fees are due to Adobe Systems Incorporated. For example, for the architecture depicted here, you would need four Adobe Connect server licenses. 

 

Connect_DR_cluster

 

However, adding one or more Adobe Connect servers to a local cluster is outside the scope of your license, and you will need to purchase additional licenses from Adobe Systems Incorporated to accomplish this.  Additional licenses are needed when adding any Adobe Connect servers that increase scalability in the form of:

  • Availability — What percentage of time is Connect available to geographically distributed users?
  • Reliability — How often does Connect experience problems that affect availability?
  • Performance — How fast does Connect consistently and qualitatively respond to user requests?
  • Concurrency – How many users can a Connect deployment handle concurrently?

Information around cluster expansion is here: Adobe® Connect™ server pools/clusters and hardware-based load-balancing devices with SSL acceleration

If you were to geographically distribute an active Connect cluster by placing Adobe Connect servers into two separate data centers, that would also require additional licensing. Connect servers in a cluster cannot have more than 2-3ms of latency between them. Generally you would not geographically distribute Adobe Connect servers into different data centers; however, there is a chapter in the aforementioned clustering article on the topic. With that said, the architecture depicted below is an example of a distributed active Adobe Connect cluster that is spread between two local data centers with nominal latency between them (less than 3ms). All four servers are in production and all are actively hosting meetings and serving on-demand content. This example architecture requires a four-server Connect cluster license:

 

Cross-DC-CLUSTER

 

Connect on VMWare – some deployment tips

Issue: VMWare is ubiquitous in the enterprise and while it opens up huge potential for management of the Connect infrastructure, it must be planned and executed with an eye toward robustness.

This advice is gleaned from conversations with senior persons on our operations team as well as from support cases generated by various customers with on-premise VMWare deployments of Connect.

One of the most important and often overlooked aspects of virtualization is making certain that VMware is compatible with all the underlying components of the server and network architecture. The infrastructure supporting VMware must be verified by VMware under their Hardware Certification Program or Partner Verified and Supported Products (PVSP) program; be sure to use certified hardware.

Here is the link to the compatibility reference:  http://www.vmware.com/resources/compatibility

With Connect you must consider both Tomcat and FMS; the former can run on most anything, while the latter is a bit more demanding, since RTMP can be acutely affected by latency and packet transmission problems. If you notice unexpected latency or a surprise crash of FMS with Connect 9.1, a good first test is to check the network components: sniff for packet transmission issues and make sure the vNICs of the guest VMs are configured to use VMXNET3.

With reference to recommendations and best practices, it really depends on the VMware infrastructure adopted. The following references serve as a guide for an enhanced environment:

Enterprise Java Applications on VMware – Best Practices Guide: http://www.vmware.com/resources/techresources/1087

Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs: https://www.vmware.com/resources/techresources/10220

Performance Best Practices for VMware vSphere 5.1: https://www.vmware.com/resources/techresources/10329

The key with Network Storage is speed. If you lose connectivity to the shared storage then only what is cached on the origins will be available.

Shared storage requirements

  • Disk specs: 10,000–15,000 RPM — Fibre Channel preferred
  • Network link: TCP/IP — 1GB I/O throughput or better
  • Controller: Dual controllers with Active/Active multipath capability
  • Protocol: CIFS or equivalent

Avoid virtualizing the Connect database if possible.

I have seen in some overtaxed customer VMware environments that latency among the servers on ports 8507 (and 8506) can cause problems. Intra-cluster latency (server to server communication) should never exceed 2-3ms; when it does, we see intermittent crashes. I had one customer with a particularly weak infrastructure whose crashes I could predict: he was doing back-ups and running other tasks at a certain time each week that would tax and hamper network connectivity for about an hour. These tasks were so all-consuming on the network that they turned every cluster resource into an individual asset on its own island. The log traces bore this out and we knew with precision what was going on. He knew he needed to upgrade his infrastructure, and in the meantime we worked out a reaction plan to deal with the issue; it included:

  1. Place a higher than normal percentage of cache on each server to limit invoking shared storage
  2. Set the JDBC driver reconnection string for Database connectivity
  3. Plan Connect usage around these maintenance activities and, when possible, do Connect maintenance activities at the same time as well. This was not very difficult as these were after hours, but for a global operation it was still not a given.
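To see whether your own environment stays within the 2-3ms intra-cluster budget mentioned above, you can sample connection times between cluster nodes. The Python sketch below times a TCP connect, which is only a rough proxy for network latency (about one round trip plus local overhead); the function names and the 3.0 ms default are illustrative assumptions based on the figure in the text.

```python
import socket
import time


def connect_time_ms(host, port, timeout=1.0):
    """Time one TCP connect to host:port in milliseconds (a rough latency proxy)."""
    started = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - started) * 1000.0


def over_budget(samples_ms, budget_ms=3.0):
    """Return the samples that exceed the latency budget."""
    return [s for s in samples_ms if s > budget_ms]
```

Sampling a peer node on port 8507 at intervals, including during backup windows, and passing the results to `over_budget` would show whether maintenance jobs push latency past the limit, as they did for the customer described above.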