Archive for May, 2017

May 22, 2017

AEM – Circuit Breaker Innovation via a Hystrix Integration

From time to time the Adobe Partner Experience (APx) team has the privilege to check out some truly innovative stuff. Yogesh is a great guy and offered to show us a cool integration pattern that he has been working on. We liked it so much that we decided to let him share it with the world via the Content Management blog. – Darin Kuntze @DarinKuntze

Yogesh Kulkarni is an experienced Adobe AEM/CQ developer/architect specializing in best practices for designing connected digital experiences using the AEM technology stack.

He is currently working for AKQA (a digital agency) as a Senior Software Engineer (Adobe AEM/CQ).

AEM 6.1 SP1 – Hystrix Integration

We had a requirement where the client wanted to add the Circuit Breaker pattern to an AEM component which calls RESTful endpoints, to support the following use cases:

  • If more than 10% of the calls fail within a minute then the circuit should trip.
  • After the circuit is tripped, the system should periodically check if the external service API is back up and working using a background thread.
  • The component should prevent users from experiencing sub-optimal response times.
  • The component should present a user-friendly message in case of any service failure.
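
To make the first use case concrete, here is a minimal plain-Java sketch of a "more than 10% failures within a minute" rule. It is not Hystrix code: the class RollingFailureRate and its methods are hypothetical names chosen for illustration, and timestamps are passed in explicitly so the behaviour is easy to follow. Hystrix applies a similar rule internally and additionally requires a minimum request volume before it trips.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: tracks call outcomes in a sliding time window. */
final class RollingFailureRate {

    private static final class Outcome {
        final long timestampMillis;
        final boolean failed;

        Outcome(long timestampMillis, boolean failed) {
            this.timestampMillis = timestampMillis;
            this.failed = failed;
        }
    }

    private final Deque<Outcome> window = new ArrayDeque<>();
    private final long windowMillis;
    private final double thresholdPercent;
    private int failures;

    RollingFailureRate(long windowMillis, double thresholdPercent) {
        this.windowMillis = windowMillis;
        this.thresholdPercent = thresholdPercent;
    }

    /** Records one call outcome observed at the given time. */
    void record(long nowMillis, boolean failed) {
        evictOlderThan(nowMillis - windowMillis);
        window.addLast(new Outcome(nowMillis, failed));
        if (failed) {
            failures++;
        }
    }

    /** True when more than thresholdPercent of the calls in the window failed. */
    boolean shouldTrip(long nowMillis) {
        evictOlderThan(nowMillis - windowMillis);
        if (window.isEmpty()) {
            return false;
        }
        return 100.0 * failures / window.size() > thresholdPercent;
    }

    /** Drops outcomes that fell out of the window and keeps the failure count in sync. */
    private void evictOlderThan(long cutoffMillis) {
        while (!window.isEmpty() && window.peekFirst().timestampMillis <= cutoffMillis) {
            if (window.removeFirst().failed) {
                failures--;
            }
        }
    }

    public static void main(String[] args) {
        RollingFailureRate rate = new RollingFailureRate(60_000, 10.0);
        for (int i = 0; i < 9; i++) {
            rate.record(1_000 + i, false);
        }
        rate.record(2_000, true);                     // 1 of 10 calls failed: exactly 10%
        System.out.println(rate.shouldTrip(2_000));   // not more than 10%, so no trip
        rate.record(3_000, true);                     // 2 of 11 calls failed
        System.out.println(rate.shouldTrip(3_000));   // above 10%, so trip
        System.out.println(rate.shouldTrip(70_000));  // all outcomes are older than a minute
    }
}
```

Passing the clock value in as a parameter is a deliberate simplification; a real component would read the system clock and would need to be thread-safe.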

Circuit Breaker Pattern

In our case, the component is responsible for making a call to RESTful endpoints to register/log in the user and provide an option to update related data/content after login.

The Circuit Breaker pattern can handle remote resources and service call failures more gracefully. It can prevent an application from repeatedly trying to execute an operation that’s likely to fail, allowing it to continue without waiting for the fault to be fixed or wasting CPU cycles while it determines that the fault is long lasting. The Circuit Breaker pattern also enables an application to detect whether the fault has been resolved. If the problem appears to have been fixed, the application can try to invoke the operation once again.

Hystrix (a Netflix library) has a built-in, ready-to-use Circuit Breaker. When we apply a Circuit Breaker to a method, Hystrix watches for failing calls to that method, and if failures build up to a pre-defined threshold, Hystrix opens the circuit so that subsequent calls automatically fail. While the circuit is open, Hystrix redirects calls to the method, and they’re passed on to a specified fallback method.

Reference: https://docs.pivotal.io/spring-cloud-services/1-3/common/circuit-breaker/
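
The closed / open / half-open behaviour described above can be sketched in a few lines of plain Java. This is an illustration of the pattern only, not how Hystrix is implemented: SimpleCircuitBreaker is a hypothetical class, it trips on a count of consecutive failures rather than on an error percentage, and the current time is passed in as a parameter to keep the example deterministic.

```java
import java.util.function.Supplier;

/** Hypothetical, simplified circuit breaker for illustration only. */
final class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long retryAfterMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int failureThreshold, long retryAfterMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the action, or returns the fallback when the circuit is open or the action throws. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAtMillis < retryAfterMillis) {
                return fallback.get();      // short-circuit: the remote call is not attempted
            }
            state = State.HALF_OPEN;        // retry interval elapsed: allow one trial call
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;           // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;         // trip, or re-open after a failed trial call
                openedAtMillis = nowMillis;
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 5_000);
        Supplier<String> failing = () -> { throw new IllegalStateException("service down"); };
        Supplier<String> healthy = () -> "ok";
        Supplier<String> fallback = () -> "fallback";

        System.out.println(breaker.call(failing, fallback, 0) + " " + breaker.state());
        System.out.println(breaker.call(failing, fallback, 1) + " " + breaker.state());
        System.out.println(breaker.call(healthy, fallback, 2) + " " + breaker.state());
        System.out.println(breaker.call(healthy, fallback, 6_000) + " " + breaker.state());
    }
}
```

In the run above, the third call returns the fallback even though the service is healthy again, because the circuit is still open; only the call made after the retry interval reaches the service and closes the circuit.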

OSGi Dependencies

In order to get Hystrix running in AEM, you need to install the following dependency bundles in Felix.

#  Artifact ID                                          Version
1  org.apache.servicemix.bundles.hystrix                1.5.9_1
2  org.apache.servicemix.bundles.hystrix-event-stream   1.5.9_1
3  rxjava                                               1.2.9
4  org.apache.servicemix.bundles.commons-configuration  1.9_2
5  com.diffplug.osgi.extension.sun.misc                 0.0.0
6  HdrHistogram                                         2.1.9

 

Apply the Circuit Breaker Pattern

In annotation-based setups such as the Spring Cloud integration referenced above, Hystrix looks for any method annotated with the @HystrixCommand annotation and wraps that method in a proxy connected to a Circuit Breaker so that Hystrix can monitor it.

In this AEM integration we use the plain Hystrix API instead: the code to be isolated is wrapped inside the run() method of a HystrixCommand, similar to the following:

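import java.io.IOException;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class HelloServiceGetCommand extends HystrixCommand<HelloResult> {

    // The LOG declaration is not shown in the original listing; an SLF4J logger is assumed.
    private static final Logger LOG = LoggerFactory.getLogger(HelloServiceGetCommand.class);

    private final HttpGet httpGet;

    public HelloServiceGetCommand(final HttpGet httpGet) {
        super(HystrixCommandGroupKey.Factory.asKey("HelloGroup"));
        LOG.debug("Is CC breaker open {}", isCircuitBreakerOpen());
        this.httpGet = httpGet;
    }

    @Override
    protected HelloResult run() throws IOException {
        LOG.debug("Health count : TotalRequests {}", metrics.getHealthCounts().getTotalRequests());

        // your logic goes here
        try (CloseableHttpClient httpClient = HttpClientBuilder.create().build();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Catch any error and populate the HelloResult object from the response.
            // HelloResult is application-specific; a no-arg constructor is assumed here.
            final HelloResult helloResult = new HelloResult();
            return helloResult;
        }
    }
}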

Fallback

To handle the failure of external services, Hystrix provides the following built-in mechanisms and defaults:

  1. A timeout for every request to an external system (default: 1000 ms)
  2. A limit on concurrent requests to an external system (default: 10)
  3. A Circuit Breaker to avoid further requests (default: trips when more than 50% of all requests fail)
  4. A retry of a single request after the Circuit Breaker has tripped (default: every 5 seconds)
  5. Interfaces to retrieve runtime information at the request and aggregate level (there’s even a ready-to-use realtime dashboard for it). This is not yet set up in our OSGi environment.

Reference: https://github.com/Netflix/Hystrix/wiki/How-To-Use#Fallback
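
To illustrate the first two items in the list above (a per-request timeout and a cap on concurrent requests, each answered with a fallback), here is a rough plain-Java sketch built only on java.util.concurrent. It is not Hystrix code; the class TimeoutWithFallback and the method callWithTimeout are hypothetical names, and the pool size of 10 and the timeout values simply mirror the defaults listed above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounded pool plus timeout, with a fallback value on any failure. */
final class TimeoutWithFallback {

    /** A pool with no queue: once all threads are busy, further submissions are rejected. */
    static ExecutorService boundedPool(int size) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
    }

    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task, long timeoutMillis, T fallback) {
        final Future<T> future;
        try {
            future = pool.submit(task);
        } catch (RejectedExecutionException e) {
            return fallback;                        // concurrency limit reached
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                    // interrupt the slow call
            return fallback;
        } catch (ExecutionException e) {
            return fallback;                        // the call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = boundedPool(10);
        try {
            String fast = callWithTimeout(pool, () -> "ok", 1_000, "fallback");
            String slow = callWithTimeout(pool, () -> {
                Thread.sleep(2_000);                // simulates a slow endpoint
                return "late";
            }, 100, "fallback");
            System.out.println(fast);
            System.out.println(slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The caller always gets an answer within roughly the timeout, which is what keeps a slow backend from tying up request threads.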

Simple Fallback method using Fallback: Stubbed pattern:

@Override
protected HelloResult getFallback() {
    LOG.debug("FALLBACK : is CC breaker open {} isResponseTimedOut() {} isResponseThreadPoolRejected() {}",
            isCircuitBreakerOpen(), isResponseTimedOut(), isResponseThreadPoolRejected());
    LOG.debug("Health count : TotalRequests {} ErrorPercentage {} ErrorCount {}",
            metrics.getHealthCounts().getTotalRequests(),
            metrics.getHealthCounts().getErrorPercentage(),
            metrics.getHealthCounts().getErrorCount());

    // returns an error object to the service, which sends it on to the front end
    return getHelloResultError();
}

The fallback method returns the error code which is then consumed by a UI component.

How to Run Hystrix Command

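There are many ways to run the command. The following simple call is triggered from HelloServiceImpl to invoke the Hystrix command synchronously.

public class HelloServiceImpl implements HelloService {

    private HelloResult callCommand(final HttpGet getRequest) {
        // execute() blocks until run() or getFallback() has produced a result
        final HelloResult result = new HelloServiceGetCommand(getRequest).execute();

        // other service logic goes here

        return result;
    }
}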

Hystrix Runtime Configuration

Details on configuring a Hystrix command can be found in the Hystrix Configuration wiki: https://github.com/Netflix/Hystrix/wiki/Configuration. Updating the configuration is simple.

For example, the default value for circuitBreaker.requestVolumeThreshold is set to 20. We override the property using HystrixCommandProperties.Setter, as shown below.

public HelloServiceGetCommand(final HttpGet httpGet) {
    super(Setter
            .withGroupKey(HystrixCommandGroupKey.Factory.asKey("HelloGroup"))
            .andCommandPropertiesDefaults(
                    HystrixCommandProperties.Setter()
                            .withCircuitBreakerRequestVolumeThreshold(MyAudiConstants.CB_REQUEST_VOLUME_THRESHOLD)
                            .withCircuitBreakerErrorThresholdPercentage(MyAudiConstants.CB_REQUEST_ERROR_THRESHOLD))
            .andThreadPoolPropertiesDefaults(
                    HystrixThreadPoolProperties.Setter()
                            .withCoreSize(MyAudiConstants.CB_REQUEST_THREAD_POOL_SIZE))
            .andCommandKey(HystrixCommandKey.Factory.asKey("HelloGroup")));

 

…

}

Monitoring

A dashboard for monitoring applications using Hystrix is available in the hystrix-dashboard module. However, hystrix-dashboard has not been deployed to our AEM instance at this time.

1) The Circuit Breaker is closed at the start

DEBUG [hystrix-HelloGroup2] com.akqa…services.commands.HelloServiceGetCommand CC breaker open false

All HelloCommand requests are going through.

2) Now FAILURE occurs

DEBUG [hystrix-HelloGroup-2] com.akqa…services.commands.HelloServiceGetCommand CC breaker open True Events[SHORT_CIRCUITED]

3) Lastly, the Circuit Breaker closes again once the host is back online.

DEBUG [hystrix-HelloGroup-2] com….services.commands.HelloServiceGetCommand CC breaker open false

The example above just scratches the surface of how to improve the Service Resilience in a Felix container using Hystrix. The following resources can provide more advanced tricks to help make your application more fault tolerant.

https://github.com/Netflix/Hystrix/wiki/How-To-Use#Fallback

https://github.com/Netflix/Hystrix/wiki/How-To-Use#Common-Patterns

https://github.com/Netflix/Hystrix/wiki/Configuration

https://github.com/Netflix/Hystrix/wiki/Metrics-and-Monitoring

Summary

As demonstrated, it is possible to use the state-of-the-art, industry-standard fault-tolerance library Hystrix in AEM to protect your service against cascading failures and to provide fallback behavior for potentially failing calls.

 

All opinions expressed by Yogesh Kulkarni are his own and not Adobe’s.

 

11:14 AM
May 5, 2017

Generate Rockstar AEM Logs Metrics with R Programming Utility

Today’s Tips & Tricks guest post comes from Atish Jain, who is a Senior AEM Developer at SapientRazorfish (Publicis.Sapient) with over seven years of experience working with CMS based applications. Atish was a semi-finalist in this year’s AEM Rockstar competition. 

The tool I’m sharing is an R programming based utility to find gaps between the assets uploaded and the renditions generated for them. This can be helpful for verifying the success of bulk migrations, longevity tests, and upload activities, and for comparative analysis in your AEM instance.

For those unfamiliar with R programming, it is a free open-source language and environment used for data manipulation, calculations, statistical computing, and graphical techniques useful to statisticians, analysts, data miners, and other researchers.  To learn more about R, visit r-project.org.

The utility supports trend analysis of upload versus workflow completion, which is useful when a system becomes slower over time. The stats can be analyzed to report missing assets and to spot degradation in AEM server performance under continuous load. It works on the logs that are produced under the crx-quickstart folder of AEM. Hence, there is no direct performance impact on the AEM instance. Reports can also be generated from historical log files to produce comparative results for analysis.

The utility helps you:

  • Analyze the exact count of missing Assets Renditions with the upload path that has been missed.
  • Conduct trend analysis for uploaded assets versus renditions generation. The pace of renditions generation can be calibrated for better insights for estimating activity timings and degradation factors.

The AEM logs are powerful and can be transformed to produce vital statistics. This utility, based on the R programming language, utilizes this power and generates metrics.

Here is how the utility works:

Step 1: Parse the error.log file(s) and keep only the log lines that start with a date and time (result A).

Step 2: Parse A to find the log lines for the upload trigger (result B).

Step 3: Parse A to find the log lines for the last rendition (result C).

Step 4: Merge B and C to create the reports.
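
For readers who do not use R, the four steps above can be sketched in plain Java. This is an illustration of the idea only, not a port of the script: the class RenditionGapReport is hypothetical, and the sample log lines in main are made up and only loosely modelled on an AEM error.log, so the patterns would need adjusting for real logs.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the parse / filter / merge steps. */
final class RenditionGapReport {

    // Assumed timestamp layout: "dd.MM.yyyy HH:mm:ss.SSS" at the start of a log line.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss.SSS");
    private static final Pattern DATED_LINE = Pattern.compile("^(\\d{2}\\.\\d{2}\\.\\d{4} \\S+) .*");
    private static final Pattern ASSET_PATH = Pattern.compile("(/content/dam/\\S+?)/jcr:content");

    /** Steps 1 to 3: keep dated lines that contain the marker, keyed by asset path. */
    static Map<String, LocalDateTime> extract(List<String> lines, String marker) {
        Map<String, LocalDateTime> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher dated = DATED_LINE.matcher(line);
            Matcher asset = ASSET_PATH.matcher(line);
            if (dated.matches() && line.contains(marker) && asset.find()) {
                result.put(asset.group(1), LocalDateTime.parse(dated.group(1), STAMP));
            }
        }
        return result;
    }

    /** Step 4: left join of uploads with renditions; a null value marks a missing rendition. */
    static Map<String, Duration> merge(Map<String, LocalDateTime> uploads,
                                       Map<String, LocalDateTime> renditions) {
        Map<String, Duration> report = new LinkedHashMap<>();
        uploads.forEach((path, uploadedAt) -> {
            LocalDateTime renderedAt = renditions.get(path);
            report.put(path, renderedAt == null ? null : Duration.between(uploadedAt, renderedAt));
        });
        return report;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> lines = Arrays.asList(
            "05.05.2017 10:00:00.000 *INFO* [job] EXECUTE_START /content/dam/site/a.jpg/jcr:content/renditions/original",
            "05.05.2017 10:00:01.000 *INFO* [job] EXECUTE_START /content/dam/site/b.jpg/jcr:content/renditions/original",
            "java.lang.IllegalStateException: a stack trace line without a timestamp",
            "05.05.2017 10:00:04.500 *INFO* [job] created /content/dam/site/a.jpg/jcr:content/renditions/cq5dam.web.1280.1280.jpeg");

        Map<String, LocalDateTime> uploads = extract(lines, "EXECUTE_START");
        Map<String, LocalDateTime> renditions = extract(lines, "cq5dam.web.1280.1280.jpeg");
        merge(uploads, renditions).forEach((path, lag) ->
            System.out.println(path + " -> " + (lag == null ? "rendition missing" : lag.toMillis() + " ms")));
    }
}
```

Assets whose value is null correspond to the rows the R script writes to missingRenditions.csv, and the durations correspond to its timeDiff column.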

The concept detailed above can be enhanced into a more exhaustive application that can create extensive reports from AEM logs.

For example, the utility can be extended to generate more detailed graphical reports via the graphical plugin API available for R.

If you have any questions, you can contact Atish at ajain216@sapient.com. All opinions expressed by Atish Jain are his own and not Adobe’s.


 

# R SCRIPT TO FIND ASSETS UPLOADED AND ANALYSE THE CORRESPONDING RENDITIONS GENERATION COUNT
# USER INPUTS
QUICKSTART_LOGS_DIR <- "D:/Atish/aem-rock/output/logs"
OUTPUT_DIR <- "D:/Atish/aem-rock/results/day1"
print_renditions_gap_flag <- TRUE
upload_report_print_flag <- TRUE
UPLOAD_TRIGGER_TXT_PATTERN <- "*EXECUTE_START*"
RENDITION_LOG_TXT_PATTERN<- "jcr:content/renditions/cq5dam.web.1280.1280.jpeg"
 
# DO NOT CHANGE THIS LINE
ERROR_LOGS_FILE_PATTERN <- "error\\.log\\.\\d\\d\\d\\d*"
 
renditions_gap <- 0
 
#LIST ALL ERROR LOG FILES UNDER crx-quickstart
error_log_files_list <- function(QUICKSTART_LOGS_DIR) {
  setwd(QUICKSTART_LOGS_DIR)
  # Rotated logs first (error.log.YYYY...), then the current error.log
  error_file_list <- list.files(pattern = ERROR_LOGS_FILE_PATTERN)
  error_file_list <- unlist(list(error_file_list, list.files(pattern = "error.log$")))
  error_file_list
}


upload_report_calculation <- function(logs_dir) {
  setwd(logs_dir)
  dataset_upload <- NULL

  # error_file_list is the global list built by error_log_files_list()
  for (file in error_file_list) {
    print(paste("Analysing log file : ", file, sep = ""))
    error_log_full_dataset <- read.table(file, header = FALSE, quote = "", fill = TRUE)
    colnames(error_log_full_dataset) <- c("date", "time", "level", "type", "class", "logtext1", "logtext2", "logtext3", "assetPath")

    # Filter rows which start with a date and assign them to error_log_subsetdate_dataset
    error_log_subsetdate_dataset <- subset(error_log_full_dataset, grepl("\\d\\d.\\d\\d.\\d\\d\\d\\d", date))
    write.csv(file = "dataset.csv", x = error_log_subsetdate_dataset)

    # Filter rows which contain *EXECUTE_START* and */content/dam/*
    # Refine the above data frame to contain only the asset upload trigger log.
    upload_trigger <- subset(error_log_subsetdate_dataset, grepl(UPLOAD_TRIGGER_TXT_PATTERN, logtext2))
    upload_trigger <- subset(upload_trigger, grepl("*/content/dam/*", class))

    # Filter rows which contain the string jcr:content/renditions/cq5dam.web.1280.1280.jpeg
    rendition_generation <- subset(error_log_subsetdate_dataset, grepl(RENDITION_LOG_TXT_PATTERN, assetPath))

    # Concatenate the date and time columns of the subset data frames (result: "YYYY-MM-DD HH:MM:SS...")
    upload_trigger$datetime <- paste(as.Date(upload_trigger$date, format = '%d.%m.%Y'), upload_trigger$time, sep = " ")
    rendition_generation$datetime <- paste(as.Date(rendition_generation$date, format = '%d.%m.%Y'), rendition_generation$time, sep = " ")

    # Use <<- so the global counter is updated; a plain <- would only change a local copy
    renditions_gap <<- renditions_gap + (nrow(upload_trigger) - nrow(rendition_generation))
    upload_trigger_df <- data.frame(sub('.*:', '', sub('/jcr.*', '', upload_trigger$class)), upload_trigger$datetime)
    colnames(upload_trigger_df) <- c("assetPath", "upload_trigger.datetime")
    write.csv(file = "upload_trigger_df.csv", x = upload_trigger_df)

    # Prepare the renditions generation data frame
    rendition_gen_df <- data.frame(gsub('.{49}$', '', rendition_generation$assetPath), rendition_generation$datetime)
    colnames(rendition_gen_df) <- c("assetPath", "rendition_generation.datetime")
    write.csv(file = "rendition_gen_df.csv", x = rendition_gen_df)

    # Create a new data frame with assetPath, upload, and rendition generation timings
    dataset_x <- merge(upload_trigger_df, rendition_gen_df, 'assetPath', all.x = TRUE)

    # The format must be passed by name (the second positional argument is the time zone)
    # and must match the "YYYY-MM-DD" dates produced by as.Date() above.
    datetime_format <- "%Y-%m-%d %H:%M:%S"
    dataset_x$timeDiff <- difftime(
      as.POSIXct(as.character(dataset_x$rendition_generation.datetime), format = datetime_format),
      as.POSIXct(as.character(dataset_x$upload_trigger.datetime), format = datetime_format),
      units = "secs")

    dataset_upload <- rbind(dataset_upload, dataset_x)
  }

  return(dataset_upload)
}
 
print_rendtions_gap_report <- function(renditions_gap, print_renditions_gap_flag) {
  if (print_renditions_gap_flag) {
    temp_var <- paste("Renditions gap vs uploaded assets: ", renditions_gap, sep = "")

    print(temp_var)
    setwd(OUTPUT_DIR)
    write(temp_var, file = "Rsummary.txt", append = FALSE)
  }
}

upload_report_print <- function(dataset_upload, upload_report_print_flag) {
  if (upload_report_print_flag) {
    setwd(OUTPUT_DIR)
    # Rows with any NA are uploads without a matching rendition
    row.has.na <- apply(dataset_upload, 1, function(x) { any(is.na(x)) })
    uploadAsset <- dataset_upload[!row.has.na, ]
    write.csv(file = "uploadAsset.csv", x = uploadAsset)

    missingRenditions <- dataset_upload[row.has.na, ]
    write.csv(file = "missingRenditions.csv", x = missingRenditions)
    x11()
    barplot(as.matrix(uploadAsset$timeDiff), main = "Time-Diff Report", xlab = "AssetsUploaded", ylab = "timeLag(sec)", beside = TRUE, col = rainbow(1))
    dev.copy2pdf(file = "TimeDiffReport.pdf")
  }
}
 
# Functions Execution
setwd(QUICKSTART_LOGS_DIR)
error_file_list <- error_log_files_list(QUICKSTART_LOGS_DIR)
dataset_upload <- upload_report_calculation(QUICKSTART_LOGS_DIR)
print_rendtions_gap_report(renditions_gap, print_renditions_gap_flag)
upload_report_print(dataset_upload,upload_report_print_flag)


 

4:52 PM
May 1, 2017

Sling Pipes – A Rockstar Way to Deal with JCR

Today’s post is by guest writer Rima Mittal, who was invited to compete for the title of AEM Rockstar at the 2017 Adobe Summit. Along with the other finalists, we invited Rima to contribute a blog and video to our series, Rockstar Tips & Tricks. At the AEM Rockstar Session, Rima spoke on Sling Pipes – A Rockstar Way to deal with JCR. 

Rima Mittal is an Adobe Certified Lead AEM Developer and Consultant. She has extensive experience working on Java and AEM and has done multiple POCs on integrating AEM with external third-party systems. A strong believer in the importance of communities and knowledge sharing in the world of software development, she has been a speaker at various developer conferences like AEMHub 2015 and adaptTo() 2016. 

Ever encountered a situation where code changes were introduced after the client started authoring and some pages had to be re-authored? Ever spent time writing code just to modify a few hundred pages that were already authored, or to remove a component from hundreds of authored pages? Have you struggled to modify content already in the repository? Need a script to change existing production content? Sling Pipes to the rescue.

Sling Pipes

Sling Pipes is a tool for doing extract – transform – load operations through a resource tree configuration. This tiny toolset provides the ability to do such transformations with proven and reusable blocks, called pipes, streaming resources from one to the other.

A pipe is a JCR node with:

  • sling:resourceType property – Must be a pipe type registered by the plumber
  • name property – Used in bindings as an id
  • path property – Defines pipe’s input
  • expr property – Expression through which the pipe will execute
  • additionalBinding node – Node you can add to set “global” bindings (property=value) in pipe execution
  • additionalScripts – Multivalue property to declare scripts that can be reused in expressions
  • conf child node – Contains additional configuration of the pipe

Registered Pipes

 

 

Sling Pipes Demo

Here is a demo video with more on how to use and execute sling pipes in AEM.

 

More details can be found in the official documentation at https://sling.apache.org/documentation/bundles/sling-pipes.html

For any questions or comments, I can be reached on Twitter at @rimamittal or on LinkedIn at https://www.linkedin.com/in/rimamittal/

12:11 PM