Reverse Replication woes – solved

- Sensei Martin

Hot off the press/keyboard (i.e. not fully tested). With the help of an Adobe support engineer in Basel and an on-site Adobe consultant, we discovered the root cause of the reverse replication problem.

Namely, that when a user voted in a poll, the new vote AND ALL previous votes were being reverse replicated. This caused a MASSIVE workload on the Author because each node in /var/replication/outbox did not contain just the one corresponding vote; it actually contained ALL of the votes, including the new one. This explains why the Author would take 20 minutes to process just 10 nodes in the outbox.
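
For context (the actual fix isn't shown in this excerpt), the shape we were expecting is one outbox entry per vote. A minimal sketch of a vote handler on Publish that creates one node and flags ONLY that node for reverse replication, assuming the usual cq:distribute / cq:lastModified convention for user generated content (the poll path and node names below are made up):

    import java.util.Calendar;
    import javax.jcr.Node;
    import javax.jcr.Session;

    // Illustrative only: exactly one new node is created per vote, and only that
    // node carries the reverse replication flag, so the outbox agent should
    // produce one entry per vote rather than re-packaging every existing vote.
    public class VoteWriter {

        public Node addVote(Session session, String pollPath, String answer) throws Exception {
            Node votes = session.getNode(pollPath + "/votes");
            Node vote = votes.addNode("vote_" + System.currentTimeMillis(), "nt:unstructured");
            vote.setProperty("answer", answer);                       // the user's choice
            vote.setProperty("cq:distribute", true);                  // flag only this node for the outbox
            vote.setProperty("cq:lastModified", Calendar.getInstance());
            vote.setProperty("cq:lastModifiedBy", session.getUserID());
            session.save();
            return vote;
        }
    }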

...

-----------


Reverse Replication woes

- Sensei Martin

So, in my previous post I said how wonderful FP37434 is (the replication stabilisation FP). Unfortunately, it did not solve our problem and we now have a large volume of content to reverse replicate (~50k nodes in /var/replication/outbox across all our publish servers).
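
If you want to see how deep the hole is on each publish server, a quick JCR count of the outbox does the job. A rough sketch using the Jackrabbit DavEx remoting client (jackrabbit-jcr2dav on the classpath; host, port and credentials are assumptions, run it once per publish instance):

    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;
    import org.apache.jackrabbit.commons.JcrUtils;

    // Counts pending reverse replication entries on one publish instance.
    public class OutboxCount {

        public static void main(String[] args) throws Exception {
            Repository repo = JcrUtils.getRepository("http://localhost:4503/crx/server");
            Session session = repo.login(new SimpleCredentials("admin", "admin".toCharArray()));
            try {
                Node outbox = session.getNode("/var/replication/outbox");
                long count = 0;
                for (NodeIterator it = outbox.getNodes(); it.hasNext(); it.nextNode()) {
                    count++;
                }
                System.out.println("Pending outbox entries: " + count);
            } finally {
                session.logout();
            }
        }
    }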

We are currently facing two problems. First, when the RR agent polls, the publish server with FP37434 exhibits a huge native memory leak (approx. 8GB of native memory is being claimed), causing a great deal of paging on the system.
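
Because the leak is native rather than heap, it won't show up in a heap dump. A crude way to watch it is to track the resident set size of the publish JVM and compare it with the configured -Xmx: if RSS keeps climbing while the heap stays flat, the growth is off-heap. A throwaway Linux-only sketch (not CQ-specific, the PID is passed in):

    import java.io.BufferedReader;
    import java.io.FileReader;

    // Prints the VmRSS line for the given PID once a minute.
    public class RssWatch {

        public static void main(String[] args) throws Exception {
            String pid = args[0]; // PID of the publish JVM
            while (true) {
                BufferedReader r = new BufferedReader(new FileReader("/proc/" + pid + "/status"));
                try {
                    String line;
                    while ((line = r.readLine()) != null) {
                        if (line.startsWith("VmRSS:")) {
                            System.out.println(System.currentTimeMillis() + " " + line.trim());
                        }
                    }
                } finally {
                    r.close();
                }
                Thread.sleep(60000);
            }
        }
    }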

Second, when we batch this down to only 10 items in the outbox, we noticed that the Author takes 30 minutes to process those 10 nodes.
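
One low-tech way to batch an outbox down like this (illustrative only, not necessarily how it was done here) is to park most of the entries under a temporary sibling node directly in the publish repository and move them back a handful at a time. A hypothetical sketch; the outbox-parked path is made up:

    import java.util.ArrayList;
    import java.util.List;
    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.Session;

    // Moves everything beyond 'keep' entries out of the outbox into a parking
    // node, so the RR agent only ever sees a small batch at a time.
    public class OutboxBatcher {

        public static void park(Session session, int keep) throws Exception {
            Node replication = session.getNode("/var/replication");
            Node parked = replication.hasNode("outbox-parked")
                    ? replication.getNode("outbox-parked")
                    : replication.addNode("outbox-parked", "sling:Folder");

            // Collect paths first so nodes are not moved out from under the iterator.
            List<String> paths = new ArrayList<String>();
            for (NodeIterator it = session.getNode("/var/replication/outbox").getNodes(); it.hasNext();) {
                paths.add(it.nextNode().getPath());
            }
            for (int i = keep; i < paths.size(); i++) {
                String src = paths.get(i);
                session.move(src, parked.getPath() + "/" + src.substring(src.lastIndexOf('/') + 1));
            }
            session.save();
        }
    }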

Adding extra logging (com.day.cq.replication.content.durbo) at DEBUG level shows that the Author is doing valid work for 30 minutes processing just 10 nodes from the outbox.
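
For anyone wanting to repeat this, the extra logger can be created as an OSGi configuration for the Sling log manager (factory PID org.apache.sling.commons.log.LogManager.factory.config), e.g. through the Felix console on the Author. A sketch; the log file name is illustrative:

    org.apache.sling.commons.log.level="debug"
    org.apache.sling.commons.log.names=["com.day.cq.replication.content.durbo"]
    org.apache.sling.commons.log.file="logs/replication-durbo.log"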

...

---------
