Using the Data Store Garbage Collection in CRX and CQ

Andrew Khoury sent this information to me, basing the instructions on his solution to a customer’s problem.

I previously posted a solution to the problem of the CRX journal files growing too large. The journal files are tar files in which the repository stores its information. However, items larger than 4KB are not stored in the repository tar files. Instead, CRX stores that data in a set of indexed files in the crx-quickstart/repository/repository/datastore directory.

Like the tar journal files, these files have data appended, not rewritten. So the size of the datastore directory grows with each deployment that contains files larger than 4KB.

Just as there is garbage collection for the repository tar files, there is a way to do garbage collection on the datastore. Running garbage collection removes unused items from the datastore directory and reduces its size.
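To see how much space a garbage collection run actually frees, it can help to measure the datastore directory before and after. The following is only a minimal sketch: the function name directory_size_bytes is made up for this example, and the path in the usage note is the datastore location mentioned above, which may differ in your installation.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root.

    Symbolic links are skipped. A missing directory counts as 0 bytes.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, calling directory_size_bytes("crx-quickstart/repository/repository/datastore") before and after the run and subtracting the two values gives the number of bytes reclaimed.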

Experience Services and CQ are both built on CRX, so the instructions for CRX apply to Experience Services and CQ servers as well.

Running Datastore Garbage Collection

  1. Go to http://[server domain]:[server port]/crx and log in as an administrative user
  2. Click Repository Configuration
  3. Click Datastore Garbage Collection
  4. Check all boxes and click Run

If this process completes successfully, you are done.

If Datastore Garbage Collection fails with an error

Look at the error in the application server log and in crx-quickstart/logs/crx/error.log. You may see “File not found: 123,” where 123 is the id number of a tar file. If so, first try to restore the tar file with that id from a backup. For example, the tar file for id 123 would be crx-quickstart/workspaces/crx.default/data_00123.tar. The number in the data tar file name is always zero-padded to five digits, so prepend zeros to the id as needed.
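The mapping from an id in the error message to a tar file name can be sketched in a few lines. This is an illustration only: the helper names are made up, and the regular expression assumes the message looks exactly like the “File not found: 123” example above, which may not match every CRX version.

```python
import re

# Assumption: the error text has the shape "File not found: <id>",
# as in the example above.
MISSING_TAR = re.compile(r"File not found: (\d+)")


def tar_name_for_id(tar_id):
    """Return the data tar file name for an id, zero-padded to five digits."""
    return f"data_{int(tar_id):05d}.tar"


def missing_tar_files(log_lines):
    """Collect the tar file names referenced by 'File not found' messages.

    Names are returned in order of first appearance, without duplicates.
    """
    names = []
    for line in log_lines:
        for match in MISSING_TAR.finditer(line):
            name = tar_name_for_id(match.group(1))
            if name not in names:
                names.append(name)
    return names
```

Passing the lines of error.log to missing_tar_files gives the list of data tar files to look for in your backups.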

Next steps

If an error occurred, complete all of the following steps, even if you were not able to restore the missing tar file(s).

  1. Stop the application server or the quickstart server.
  2. If using *nix, log in to the server via ssh and cd to the crx-quickstart directory. If using Windows, go to the crx-quickstart directory.
  3. Make backups of the tar index files. Within the *nix environment use these commands:
    cd repository/version
    mkdir index_tar_backup
    mv index*.tar index_tar_backup/
    chmod 640 data*.tar
    cd ../workspaces/crx.default
    mkdir index_tar_backup
    mv index*.tar index_tar_backup/
    chmod 640 data*.tar
  4. Back up crx-quickstart/workspaces/crx.default/workspace.xml as crx-quickstart/workspaces/crx.default/workspace.xml.backup
  5. Open crx-quickstart/workspaces/crx.default/workspace.xml in a text editor
  6. Comment out the PersistenceManager element:
    Before:
    <PersistenceManager class=""/>

    After:
    <!--PersistenceManager class=""/-->
  7. Add a new persistence manager element with the consistency check parameters set:
    <PersistenceManager class="">
      <param name="consistencyCheck" value="true" />
      <param name="consistencyFix" value="true" />
    </PersistenceManager>
  8. Restart the application server or the quickstart server
  9. Monitor startup by tailing crx-quickstart/logs/crx/error.log and watch for any errors. When startup is complete, the error.log shows something very similar to the following:
    02.03.2012 15:39:52 *INFO * CRXHttpServlet: PackageShareServlet initialized. (, line 52)
    02.03.2012 15:39:52 *INFO * CRXHttpServlet: PackageManagerServlet initialized. (, line 52)
    02.03.2012 15:40:04 *INFO * TarUtils: File system status: created 200 files in 14 ms (14285 ops/sec) (, line 782)
    02.03.2012 15:40:04 *INFO * TarUtils: File system status calculated in 35 ms (, line 795)
  10. Go to http://[server domain]:[server port]/crx and log in as an administrative user
  11. Click Repository Configuration
  12. Click Datastore Garbage Collection
  13. Check all boxes and click Run
  14. Restore crx-quickstart/workspaces/crx.default/workspace.xml to its original contents from the workspace.xml.backup copy made in step 4.
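For Windows servers, or for anyone who prefers a script, the backups in steps 3 and 4 can be sketched in Python. Treat this as a rough equivalent of the shell commands above, not a tested procedure: the function names are made up for this example, the directories are passed in as arguments because your layout may differ, and the server must already be stopped.

```python
import shutil
from pathlib import Path


def back_up_index_tars(directory, backup_name="index_tar_backup"):
    """Move index*.tar into a backup subdirectory and set data*.tar to mode 640.

    Mirrors the mkdir / mv / chmod commands from step 3 for one directory.
    Returns the names of the index files that were moved.
    """
    directory = Path(directory)
    backup_dir = directory / backup_name
    backup_dir.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the matches before any file is moved.
    for tar in sorted(directory.glob("index*.tar")):
        shutil.move(str(tar), str(backup_dir / tar.name))
        moved.append(tar.name)
    for tar in directory.glob("data*.tar"):
        tar.chmod(0o640)  # has limited effect on Windows
    return moved


def back_up_workspace_xml(workspace_dir):
    """Copy workspace.xml to workspace.xml.backup (step 4) and return the copy's path."""
    source = Path(workspace_dir) / "workspace.xml"
    target = Path(workspace_dir) / "workspace.xml.backup"
    shutil.copy2(source, target)
    return target
```

You would call back_up_index_tars once for the version directory and once for the crx.default workspace directory, then back_up_workspace_xml for the workspace directory. Step 14 is the reverse copy, from workspace.xml.backup back to workspace.xml.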

3 Responses to Using the Data Store Garbage Collection in CRX and CQ

  1. Arek says:

    Hello, is there a way to cancel datastore collection while it is running?

    • Deke Smith says:

      Hello Arek,

      There should be, but I do not know how. The datastore collection will automatically stop if it runs too long.

