Posts in Category "Clustering"

Configuring Adobe Connect to take Advantage of Database Mirroring

Full redundancy requires that the Connect database be either mirrored or clustered; Adobe uses mirroring as the preferred solution.

The following example settings in the custom.ini file are needed to configure Connect to take advantage of SQL Mirroring:

DB_NAME=ConnectDBName

DB_HOST=ConnectDBPrimaryHostName

DB_BACKUP_HOST=ConnectDBSecondaryHostName

DB_URL=jdbc:macromedia:sqlserver://{DB_HOST}:{DB_PORT};databaseName={DB_NAME};user={DB_USER};password={DB_PASSWORD};AlternateServers=({DB_BACKUP_HOST}:{DB_PORT};DatabaseName={DB_NAME});ConnectionRetryCount=12;ConnectionRetryDelay=10;FailoverMode=extended;FailoverPreconnect=false;FailoverGranularity=atomic

Note: Change the first three variables as appropriate, but do not make any changes to the DB_URL.  It is all one line and it pulls the values from the other three entries in custom.ini:

The follwoing setting is always pudent whether using mirroring or clustering, but it is particularly important if you are clustering SQL. If you are running the Connect SQL database in a SQL cluster rather than in a mirrored environment, you will want to make sure that Connect makes multiple database connection attempts during SQL fail-over. If Connect loses its SQL database, the entire Connect cluster will go down and it will wait for an administrator to manually reconnect to the database through launching the Connect configuration console on port 8510. Add the following to the custom.ini file to support any delays in clustered SQL fail-over:

DB_URL_CONNECTION_RETRY_COUNT = 15
DB_URL_CONNECTION_RETRY_DELAY= 30

The actual JDBC string That invokes these variables is in the config.ini file:

DB_URL=jdbc:macromedia:sqlserver://{DB_HOST}:{DB_PORT};databaseName={DB_NAME};user={DB_USER};password={DB_PASSWORD};ConnectionRetryCount={DB_URL_CONNECTION_RETRY_COUNT};ConnectionRetryDelay={DB_URL_CONNECTION_RETRY_DELAY}

Save the custom.ini and cycle the services.

 

 

Improving the Performance of Connect Meeting Clustered Fail-over

To improve the performance of Connect Meeting fail-over in a cluster, make the following changes to the custom.ini file and cycle the FMS and Connect services or reboot the connect servers:

The installation documentation for clustering originally prescribed setting the MEETING_TIMEOUT variable to 0 as shown:

#For Connect 8&9 Cluster fail-over add the following
MEETING_TIMEOUT=0
MEETING_CONNECT_BACK_TIME=1

We have discovered that in versions of Connect 9 prior to 9.2, setting the MEETING_TIMEOUT variable to 1 insures that a meeting will only have one active session while setting the MEETING_TIMEOUT variable to 0 opens the possibility (although rare) of a meeting having more than one active session. The result is a bit odd. Everyone could be happily in an ongoing Connect meeting while one attendee is alone in what appears to be the same meeting. This is usually triggered by an some form interruption in the isolated users meeting session such as from a brief outage in the network connection of the isolated user.

The recommended custom.ini setting for Connect 9 servers prior to version 9.2 looks as follows:

#For Connect 9 Cluster fail-over add the following
MEETING_TIMEOUT=1
MEETING_CONNECT_BACK_TIME=1

Save the custom.ini after making the change and cycle the Flash Management Sever and Adobe Connect Server services.

In Connect version 9.2, the meeting timeout session may be set back to 0 as we have made the appropriate code changes to insure the functionality works as originally intended even if a user experiences network interruptions and must reconnect to the meeting.

Adobe Connect Server Licensing for Disaster Recovery

This question is commonly asked: Does my license for On-Premise Adobe Connect allow me install Adobe Connect servers for disaster recovery purposes?

First let’s define the terms: Disaster Recovery Environment refers to your technical environment designed solely to allow you to respond to an interruption in service due to an event beyond your control that creates an inability on your part to provide critical business functions for a material period of time. That is to say, it refers to a secondary site that would not be utilized in production unless the primary site went offline due to a natural or human-inflicted disaster that is beyond your control. Use of Adobe Connect servers in Disaster Recovery Environments is within the scope of your license and no additional fees are due to Adobe Systems Incorporated. For example, for the architecture depicted here, you would need four Adobe Connect server licenses. 

 

Connect_DR_cluster

 

However, adding one or more Adobe Connect servers to a local cluster is outside the scope of your license, and you will need to purchase additional licenses from Adobe Systems Incorporated to accomplish this.  Additional licenses are needed when adding any Adobe Connect servers that increase scalability in the form of:

  • Availability — What percentage of time is Connect available to geographically distributed users?
  • Reliability — How often does Connect experience problems that affect availability?
  • Performance — How fast does Connect consistently and qualitatively respond to user requests?
  • Concurrency – How many users can a Connect deployment handle concurrently?

Information around cluster expansion is here: Adobe® Connect™ server pools/clusters and hardware-based load-balancing devices with SSL acceleration

If you were to geographically distribute an active Connect cluster by placing Adobe Connect servers into two separate data centers, that would also require additional licensing. Connect servers in a cluster cannot have more than 2-3ms of latency between and among Connect servers.  Generally you would not geographically distribute Adobe Connect servers into different data centers, however, there is a chapter in the aforementioned clustering article on the topic. With that said, the architecture depicted below, is an example of a distributed active Adobe Connect cluster that is is spread between two local data-centers with nominal latency between those data-centers (less than 3ms of latency). All four servers are in production and all are actively hosting meetings and serving on-demand content.  This Connect architecture example depicted in the diagram below requires a four-server Connect cluster license:

 

Cross-DC-CLUSTER

 

Vantage Point is not just about Bandwidth

Vantage Point from Refined Data works with Connect and provides remote control of Cameras, Microphones, Telephones, Volumes, Tech Support, Motion Detection, Mouse Detection, Continuous attendance tracking and reporting and much more so it’s not just about bandwidth reduction.

On the bandwidth front, Vantage Point publishes streams to Refined Data servers at 100Kbps for each participant in the room; this is less than Connect in most cases, but the Host only consumes as many streams as they can view at one time. The host can see as few as 5 or 6 students at one time or as many as 50 or more depending on their screen resolution, window size, Vantage Point settings, connectivity and bandwidth availability.

This means that even with 100 Participants in a Connect room and one Host the bandwidth consumption looks like this:

  • Participant Load: 100kbps Up (to publish their own camera to Vantage Point), 100kbps down (to view the Host in Connect). This is a small signature on the network.
  • Host Load: 100kbps Up (to publish their own camera in Connect), 2.5mbps down (assuming they view 25 participant cameras in Vantage Point at one time). The Host is the only one who needs a really good connection.
  • Total Load on Adobe Connect: 1 publish stream + 99 subscribers

The host can always reduce their own load simply by viewing fewer simultaneous Video pods in Vantage Point. The bandwidth load for students or participants is not affected at all by class size. Bandwidth load for Hosts rises linearly with class size but can be limited by the host at any time based on the maximum number of cameras they view at one time.

In Connect, the bandwidth load rises with the number of cameras being shared:

  • 4 Cameras: 16 Connections on the Server, each user publishes 100kbps and consumes 300kbps
  • 10 Cameras: 100 Connections on the Server, each user publishes 100kbps and consumes 900kbps
  • 20 Cameras: 400 Connections on the Server, each user publishes 100kbps and consumes 1.9mbps

Even if Connect could technically support 50 or 100 simultaneous web cams in a single meeting (2,500 streams risks significant latency), consider the requirement that participants would need 5-10mbps of bandwidth to support the load, before accounting for VoIP, screen-sharing and basic overhead. Anything above 10 simultaneous web cameras may be difficult for a host to manage and apart from any other considerations, there may not be enough real-estate for content if you are showing 10 or more web cams in the meeting room.

Vantage Point only publishes at 100kbps, most of the time; DSL and Standard quality is already more than twice this load in Connect and can easily rise higher if the room is set to use the Highest video quality at 16:9. With Vantage Point, Adobe Connect saves the server load, participants are not affected by class size, Hosts can see all of their students, all of the time and enjoy unparalleled control of the classroom environment.

Check it out at Refined Data: http://vantagepoint.refineddata.com/

Connect on VMWare – some deployment tips

Issue: VMWare is ubiquitous in the enterprise and while it opens up huge potential for management of the Connect infrastructure, it must be planned and executed with an eye toward robustness.

This advice is gleaned from conversations with senior persons on our operations team as well as from support cases generated by various customers with on-premise VMWare deployments of Connect.

One of the most important and often overlooked variables about virtualization is to make certain that  VMware is compatible with all the underlying components of the server and network architecture. The infrastructure supporting VMWare must be verified by VMware under their Hardware Certification Program or Partner Verified and Supported Products (PSVP) program; be sure to use certified hardware.

Here is the link to the compatibility reference:  http://www.vmware.com/resources/compatibility

With Connect you must consider both Tomcat and  FMS; the former can run on most anything, while the latter is a bit more demanding; RTMP can be acutely;y affected by latency and packet transmissions. If you notice unpredicted latency or a surprise crash of FMS with Connect 9.1, a good test would be to check the network components; sniff for packet transmission issues – have the vNIC of the guest VMs configured to use VMXNET3; this is a good place to start.

With reference to recommendations and best practices, it really depends on the VMware infrastructure adopted. The following references serve as a guide for an enhanced environment:

Enterprise Java Applications on VMware – Best Practices Guide: http://www.vmware.com/resources/techresources/1087

Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs: https://www.vmware.com/resources/techresources/10220

Performance Best Practices for VMware vSphere 5.1: https://www.vmware.com/resources/techresources/10329

The key with Network Storage is speed. If you lose connectivity to the shared storage then only what is cached on the origins will be available.

Shared storage requirements

  • Disk specs: 10,000–15,000 RPM — Fibre Channel preferred
  • Network link: TCP/IP — 1GB I/O throughput or better
  • Controller: Dual controllers with Active/Active multipatch capability
  • Protocol: CIFS or equivalent

Avoid, virtualizing the Connect database if possible.

I have seen that in some customer-based VMWare environments that are overtaxed, that latency among the servers on 8507 (and 8506), can cause problems. Intra-cluster latency (server to server communication) should never exceed 2-3ms. When it does we see intermittent crashes. I had one customer who had a particularly weak infrastructure and for whom I could predict his crashes; he was doing back-ups and running other tasks at a certain time weekly that would tax and hamper network connectivity for about an hour; these tasks were so all-consuming on the network, they turned every cluster resource into an individual asset on its own island. The log traces bore this out and we knew with precision what was going on. He knew he needed to upgrade his infrastructure and in the meantime we worked out a reaction plan to deal with the issue; it included:

  1. Place a higher than normal percentage of cache on each server to limit invoking shared storage
  2. Set the JDBC driver reconnection string for Database connectivity
  3. Plan Connect usage around these maintenance activities and when possible, do Connect maintenance activities at the same time as well – not very difficult as these were after hours, but being a  global operation, still not a given.

Tunneling with RTMP encapsulated in HTTP (RTMPT) should be avoided as it causes latency

Tunneling with RTMP encapsulated in HTTP or RTMPT should be avoided as it causes latency that can have a negative impact on user experience in a Connect meeting. In rare circumstances,the latency commensurate with tunneling RTMP encapsulated in HTTP, can become so acute that it renders Connect unusable for affected clients. The performance hit commensurate with tunneling is one of the primary reasons we continue to deploy Connect Edge servers as they often can replace third-party proxy servers that are often the cause of tunneling latency..

While the amount of acceptable latency depends on what one is doing in the room; RTMPT tunneling affects most activities. Some activities, such as screen-sharing are more bandwidth intensive than other activities such as presenting an uploaded PowerPoint from within a meeting room; The high latency commensurate with RTMPT tunneling would affect the former more than the latter. VoIP is often the first thing to make the effects of high latency felt. Here is some feedback from a test I did while on site with a client dealing with tunneling because of their refusal to pipe RTMP around a third-party proxy:.

External user tunneling during test:
Spike at 3.10/3.02 sec
Latency 403/405 ms up to 3.53/3.52 sec up .064 down 118
When latency peaks to 2.6/2.4 sec I get a mild interruption to the audio V
Video pauses momentarily when the latency spikes

Internal user with direct connection:
2 msec / 1 msec Up 0.08 kbits down 127 kbits
No pauses, delays or spikes

Tunneling should only be considered as a fallback mechanism or safety net to allow connections when RTMP is blocked due to something unplanned or for a few remote clients who must negotiate specific network obstacles. When RTMPT is the default by network design, you will need to limit your activities within Connect to those feature that use the least bandwidth.

The picture below shows a direct connection over RTMPS on 443 is being blocked somewhere on the client’s network and the fallback mechanism built into Connect of tunneling RTMP encapsulated in HTTPS is the fallback path. This is usually caused by either proxy servers or firewalls or both – any application-aware appliance on one’s network that sees the RTMPS traffic on 443 and recognizes that it is not HTTPS is a potential obstacle; RTMPS traffic is on port 443 and while it is disguised as HTTPS, it still may be blocked. The result is tunneling, indicated by the “T” in the output:

.tunneljpgOctagon

Compare with a connection without tunneling:

no-tunnel

The recommended steps for anyone experiencing persistent tunneling, is for their network engineers to trust the source IP addresses of the Adobe hosted, ACMS, managed ISP or on-premise Connect/FMS server VIPs in order to prevent the blocking of RTMPS traffic. RTMPS is not supported by any third-party proxy server. Static bypass works well to solve this issue. The problem stems from network policies that require all traffic to go through a proxy. The result is tunneling with commensurate high latency and drops. RTMPS must be allowed to stream around a proxy to avoid the overhead and latency of tunneling encapsulated within HTTPS. Attempts to cache the stream add no value.

Other options will depend on the capabilities of the third-party proxy servers in the affected client infrastructure. Blue Coat ProxySG is one of the popular proxy server options in our niche. In cases of latency invoked by tunneling RTMP encapsulated in HTTP on a network that employs Blue Coat ProxySG servers, sniff tests done by support representatives have indicated that when an affected client attempts to connect to an Adobe Connect meeting, those clients would establish both explicit HTTP connections based on PAC file settings in the system registry to the Blue Coat ProxySG pool through a hardware-based load balancing device (HLD) and transparent HTTP and SSL connections through Blue Coat ProxySG via WCCP GRE redirect to several Adobe Connect servers. The problem manifests with RTMPS when the clients attempt to establish an SSL connection directly to the destination host without going through PAC file proxy settings. Since a Blue Coat ProxySG is commonly configured to perform an SSL intercept on both explicit and transparent HTTPS traffic, upon examining the content after decrypting the SSL payload from the clients, the Blue Coat ProxySG will return an exception and close the connection because the request doesn’t contain an HTTP component and cannot be parsed for policy evaluation. As a workaround, other than using static bypass, it is possible to create a proxy service with the destination set to the Adobe Connect server IP range on port 443 and to set the proxy setting to TCP-Tunnel with Early Intercept enabled. This will allows Blue Coat ProxySG to intercept and tunnel the traffic without considering whether it is RTMPS or HTTPS.

Watch for a more comprehensive article on this topic forthcoming.

Stunnel does not Startup with Connect

Problem: stunnel does not start up with Connect

Although stunnel can be installed as a service, it doesn’t load the stunnel.conf file(!) one workaround is to not setup the services to run automatically but to auto-run these batch files at startup:

Note: This tech-note assumes stunnel is installed in c:\Connect\9.0.0.1\; be sure to adapt the scripts accordingly.

Origin server startup.bat:

@ECHO ON
net start FMS
net start FMSAdmin
net start ConnectPro
net start CPTelephonyService
c:\Connect\9.0.0.1\stunnel\stunnel.exe stunnel.conf
@ECHO OFF Origins stop.bat:

@ECHO ON
net stop ConnectPro
net stop CPTelephonyService
net stop FMSAdmin
net stop FMS /y
@ECHO OFF

If you have remote Edge servers, use these; they includes cache clearing maintenance.

Edges start.bat:

@ECHO ON
net start fms
ping 1.1.1.1 -n 1 -w 10000>nul
net start fmsadmin
c:\breeze\edgeserver\stunnel\stunnel.exe stunnel.conf
@ECHO OFF

Edges stop.bat:

@ECHO ON
net stop fmsadmin
ping 1.1.1.1 -n 1 -w 10000>nul
net stop fms
ping 1.1.1.1 -n 1 -w 20000>nul
del /Q /S c:\breeze\edgeserver\win32\cache\http\*.*
ping 1.1.1.1 -n 1 -w 10000>nul
@ECHO OFF

Run > gpedit.msc
Local Computer Policy > Computer Configuration > Windows Settings > Scripts (Startup/Shutdown)
Batch files are assigned as startup & shutdown scripts. This is in addition to being available to be run manually.

Providing Diagnostic Data to Expedite Solutions for Connect Meeting Issues

Issue: Anything that may happen during a meeting which has a pejorative effect on end-user experience.

Solution: In Connect 9.1 we have a great diagnostic option in the meeting room. You can immediately pull logs from any meeting to diagnose:

If you click Help>About Adobe Connect, while holding down the Ctrl key, the debug logs will appear int he meeting room and you will have the option to copy them to your clipboard.

log-mtg.fw

 

log-mtg1.fw

Sending me these, along with the RTMP string  Help> About Adobe Connect, while holding down the Shift key – this will be most helpful from the client experiencing the extreme latency.

rtmp-mtg.fw

Now if you want to take it even one step further and provide a client-side view of the meeting:

The instructions for enabling client-side logging are here: http://helpx.adobe.com/adobe-connect/kb/enable-logging-acrobat-connect-professional.html

Providing all this data along with the date and time (including timezone) and Meeting URL of any issue, will greatly expedite analysis and solution.

Resource Constraints cause Connection Read Error in Logs on Clustered Connect Servers

Issue: FCSj_IO:4 (x) – Connection read error: -1 LP: 5345 RP: 8506 URI: rtmp://localhost:8506/meetingapp/7/12345678

I have seen that in some VMWare environments that are very overtaxed for resources, latency between/among the clustered Connect servers on ports 8507 (and also 8506 though 8506 does not cause this error), can cause problems. Intra-cluster latency should never exceed 2-3ms. When it does we see intermittent errors and can also see crashes.

I had one unnamed customer who had a particularly weak infrastructure and  I could predict his crashes; he was doing back-ups and running other tasks at a certain time weekly that would severely hamper network connectivity for about an hour; these tasks were so all-consuming on the network, they turned every Connect cluster resource into an individual asset on its own island. The Connect logs bore this out and we knew with precision what was going on and could predict his call or email based on his maintenance schedule. He knew he needed to upgrade his infrastructure and in the meantime we worked out a reaction plan to deal with the issue; it included:

  • Place a higher than normal percentage of cache on each server to limit invoking shared storage during maintenance (see page 57)
  • Set the JDBC driver reconnection string for Database connectivity robustness
  • Plan heavy Connect usage around network and server maintenance activities and when possible, do your Connect server maintenance activities at the same time as well.

Connect & Unified Voice (UV) Traffic Flow Diagram

Issue: Plan for the flow of traffic to enable UV among the various components in any Connect deployment: Connect, Flash Media Gateway (FMG), Session Initiation Protocol (SIP)

There are numerous documents on the topic of Unified Voice (UV) with Connect:

This diagram shows the flow of traffic and the protocols used for UV with Connect and is offered as a planning and a troubleshooting tool; click on the diagram to expand it for viewing:

 

Connect_FMG_Flow