Resource Constraints Cause Connection Read Errors in Logs on Clustered Connect Servers
Issue: FCSj_IO:4 (x) – Connection read error: -1 LP: 5345 RP: 8506 URI: rtmp://localhost:8506/meetingapp/7/12345678
In some VMware environments that are badly overtaxed for resources, I have seen latency between the clustered Connect servers on port 8507 cause problems (latency on port 8506 can as well, although it does not cause this particular error). Intra-cluster latency should never exceed 2-3 ms. When it does, we see intermittent errors and sometimes crashes.
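One rough way to watch for this is to time TCP connections between cluster nodes on port 8507 and flag any peer above the 3 ms ceiling. The sketch below is only an illustration, not an Adobe tool: the function names are made up for this example, the host names you pass in are your own, and TCP connect time is used as an approximation of network latency (it includes handshake overhead, so treat the numbers as an upper bound).

```python
import socket
import time

# Ceiling taken from the guidance above: intra-cluster latency of 2-3 ms at most.
LATENCY_LIMIT_MS = 3.0


def tcp_connect_latency_ms(host, port=8507, timeout=2.0):
    """Return the time in milliseconds taken to open a TCP connection to host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the connection is closed again as soon as it opens
    return (time.perf_counter() - start) * 1000.0


def probe_cluster(hosts, port=8507, samples=5):
    """Measure each peer several times and keep the worst sample per host."""
    worst = {}
    for host in hosts:
        try:
            worst[host] = max(
                tcp_connect_latency_ms(host, port) for _ in range(samples)
            )
        except OSError:
            # Unreachable or timed-out peers count as over the limit.
            worst[host] = float("inf")
    return worst


def slow_peers(latencies_ms, limit_ms=LATENCY_LIMIT_MS):
    """Return the sorted peers whose measured latency exceeds limit_ms."""
    return sorted(host for host, ms in latencies_ms.items() if ms > limit_ms)
```

Run from one Connect server against the others, for example `slow_peers(probe_cluster(["connect-node-2.example.com"]))` with a placeholder host name, ideally on a schedule so that spikes during backup or maintenance windows show up.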
I had one customer with a particularly weak infrastructure whose crashes I could predict. Every week at a set time he ran backups and other tasks that severely hampered network connectivity for about an hour. These tasks were so all-consuming on the network that they turned every server in the Connect cluster into an island of its own. The Connect logs bore this out: we knew precisely what was going on and could predict his call or email from his maintenance schedule. He knew he needed to upgrade his infrastructure, and in the meantime we worked out a reaction plan to deal with the issue. It included:
- Allocate a higher-than-normal percentage of cache on each server to limit calls to shared storage during maintenance (see page 57).
- Set the JDBC driver reconnection string to make database connectivity more robust.
- Plan heavy Connect usage around network and server maintenance activities and, when possible, do your Connect server maintenance at the same time.
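To confirm from the logs that read errors line up with a maintenance window, it helps to pull the error lines out and tally them. The sketch below is a minimal example that assumes every error line carries the same fields as the single line quoted in the Issue above. The pattern and function names are illustrative, and timestamp handling is left out because the log's timestamp format is not shown here.

```python
import re
from collections import Counter

# Pattern modeled on the one error line quoted in the Issue above (an assumption:
# other lines in your logs may carry different fields).
READ_ERROR = re.compile(
    r"Connection read error:\s*(?P<code>-?\d+)\s+"
    r"LP:\s*(?P<local_port>\d+)\s+"
    r"RP:\s*(?P<remote_port>\d+)\s+"
    r"URI:\s*(?P<uri>\S+)"
)


def parse_read_error(line):
    """Return the fields of a connection read error line, or None if no match."""
    match = READ_ERROR.search(line)
    if match is None:
        return None
    return {
        "code": int(match["code"]),
        "local_port": int(match["local_port"]),
        "remote_port": int(match["remote_port"]),
        "uri": match["uri"],
    }


def count_by_remote_port(lines):
    """Count connection read errors per remote port across an iterable of lines."""
    hits = (parse_read_error(line) for line in lines)
    return Counter(hit["remote_port"] for hit in hits if hit)
```

Feeding it the lines logged during the maintenance hour and the lines from a quiet hour gives a quick side-by-side count of how much worse things get while the network is saturated.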