Assumption: The questions below assume that you are using CRX2.2 with Hotfix 126.96.36.199 or greater.
Question: Where does the crx.xxx directory get created?
Answer: On a slave node, when it is first joined to the cluster. It then becomes that node's current repository. Note that the existence of this directory does not by itself mean that the node is currently a slave. See below for how to decide which node is the master.
Question: How can I decide which node is the master?
Answer: Go to http://host:port/crx/config/cluster.jsp on any node.
Question: All of my instances are down. How can I decide which one was the last master?
Answer: Check for the clustered.txt file. It is only present on slave nodes (if everything is fine). The master node instance will not have the clustered.txt file.
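With every instance stopped, the clustered.txt check can be scripted. The following is a minimal Python sketch, not an official tool. The function names are hypothetical, and it assumes you pass in each node's current repository directory (crx-quickstart/repository or crx-quickstart/crx.XXX) and that clustered.txt sits directly inside that directory.

```python
from pathlib import Path


def has_slave_marker(repository_dir):
    """Return True if clustered.txt exists directly in the repository directory."""
    return (Path(repository_dir) / "clustered.txt").is_file()


def master_candidates(repository_dirs):
    """Return the directories without clustered.txt (likely the last master)."""
    return [d for d in repository_dirs if not has_slave_marker(d)]
```

Treat the result as a hint only: the marker is reliable only if the cluster was healthy when it went down, so more than one candidate (or none) means you need to investigate further.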
Question: How can I decide which is my current directory?
Answer: Check the repository.home property in bootstrap.properties on the node. If there is no crx.xxx directory, then crx-quickstart is the current directory.
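Reading repository.home can also be scripted. This is a rough Python sketch under stated assumptions: the function names are hypothetical, the location of bootstrap.properties is not given here so you pass its path in yourself, and the parser only handles plain key=value lines (no line continuations or escape sequences).

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def current_repository_home(bootstrap_path):
    """Return the repository.home value, or None if the file or key is missing."""
    path = Path(bootstrap_path)
    if not path.is_file():
        return None
    return parse_properties(path.read_text()).get("repository.home")
```

A return value of None means the file or the property was not found, in which case fall back to checking whether a crx.xxx directory exists.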
Question: Are writes always performed through the master?
Question: What if the cluster master instance goes down?
Answer: One of the slaves will become the master. If the cluster has several slaves, the new master is chosen by election. To switch a slave to master yourself, you can remove the clustered.txt file from /crx-quickstart/crx.XXX.
Question: What if the old master then comes back online?
Answer: The current master will continue to be the master of the cluster. The old master will rejoin as a slave, and you can verify this by the existence of clustered.txt under its /repository folder.
Question: What if the current master (the old slave) goes down again?
Answer: The current slave will become the master (if multiple slaves exist, one of them will become the master by election).
Question: What is the best way to install a hotfix on a cluster node?
Answer: Install on the master (use the method above to determine which node is the master) -> let it sync to the slave -> check the slave's package manager to make sure it is installed -> stop the slave -> make sure it is down -> stop the master -> make sure it is down -> on an instance that has a crx.XXX folder, check the current repository in the bootstrap.properties file and then copy crx-quickstart/crx.XXX/patches to crx-quickstart/repository (or manually install the jar file on the slave instance) -> start the master -> make sure it is up -> start the slave -> check the repository version by going to the repository configuration and searching for jcr.repository.version.
Question: At some point I want to run as a standalone system and make crx-quickstart my current directory. What should I do?
Answer: On a master instance, where there is no crx.xxx folder, you probably don’t have to do anything. On a slave instance, where there is a crx.xxx folder, first make sure which repository is the current one (you can do that by checking the bootstrap.properties file). Then: make sure that your system is stopped -> rename the repository folder under the crx-quickstart folder -> rename crx.xxx to repository -> move it to the crx-quickstart folder -> delete the bootstrap.properties file -> delete cluster* under crx-quickstart/repository -> delete revision.log -> delete tarJournal -> restart the system. Note that if you are happy to keep crx.xxx as the current directory, you don’t have to do anything.
Question: What about tar optimization in a cluster?
Answer: TarOptimization always runs on the master node in a clustered environment. If you are optimizing tar files in a cluster, you need to ensure that the Tar optimization times are set to the same value on all cluster nodes.
Question: What about data store garbage collection?
Answer: Check this link for that.
Question: What if the master is stopped in the middle of a sync process?
Answer: If this is a graceful stop, the master gives the slave 60000 ms to sync up. If the slave syncs up before then, the master stops as soon as the sync is complete. Check the cluster system properties to see how to configure this timeout.
Question: How can I make sure that one of the nodes is always master?
Answer: You need to set “preferredMaster” to “true” for that node. For more information, please check this link.
Question: How does replication work in a clustered environment?
Answer: Similar to write operations, replication is delegated to the master if it is triggered from a slave instance.
Question: What happens to the cluster when there is a network issue?
Answer: If you are not sure about the network connections, or there are frequent network problems between cluster nodes, shared-nothing clustering is not recommended. If you choose shared-nothing clustering anyway, the slave will try to read from the master, and after some time without success you will get a “Read from master timed out.” error and the slave will be disconnected.
Question: What should I do when there is a network issue?
Answer: You should stop the slave and restart it when the network is back to normal. The other option is to set the “becomeMasterOnTimeout” parameter on the slave (in repository.xml). This makes the slave become the master when timeouts happen. The problem is that you can then have two masters at the same time, so this option is not recommended.
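If you do decide to set this parameter, the edit can be sketched in Python. This is only an illustration under assumptions that are not confirmed here: that the parameter is written as a `<param name="..." value="..."/>` child of a `<Journal>` element inside the cluster configuration, and that you work on a copy of that fragment rather than the live file. The function name is hypothetical. Note that ElementTree drops any DOCTYPE declaration when it serializes, so do not write the result straight back over a full repository.xml.

```python
import xml.etree.ElementTree as ET


def set_journal_param(cluster_xml, name, value):
    """Set (or add) a <param> on the first <Journal> element of an XML fragment."""
    root = ET.fromstring(cluster_xml)
    journal = root if root.tag == "Journal" else root.find(".//Journal")
    if journal is None:
        raise ValueError("no <Journal> element found")
    for param in journal.findall("param"):
        if param.get("name") == name:
            param.set("value", value)  # update the existing entry in place
            break
    else:
        # no existing entry with this name, so append a new one
        ET.SubElement(journal, "param", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")
```

Always back up repository.xml and stop the instance before changing it, and check the result against the product documentation for your CRX version.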