by Ursula Johnson

Created

May 22, 2017

From time to time the Adobe Partner Experience (APx) team has the privilege to check out some truly innovative stuff. Yogesh is a great guy and offered to show us a cool integration pattern that he has been working on. We liked it so much that we decided to let him share it with the world via the Content Management blog. – Darin Kuntze @DarinKuntze

Yogesh Kulkarni is an experienced Adobe AEM/CQ developer/architect specializing in best practices for designing connected digital experiences using the AEM technology stack.

He is currently working for AKQA (a digital agency) as a Senior Software Engineer (Adobe AEM/CQ).

AEM 6.1 SP1 – Hystrix Integration

We had a requirement where the client wanted to add the Circuit Breaker pattern to an AEM component which calls RESTful endpoints, to support the following use cases:

  • If more than 10% of the calls fail within a minute then the circuit should trip.
  • After the circuit is tripped, the system should periodically check if the external service API is back up and working using a background thread.
  • The component should prevent users from experiencing sub-optimal response times.
  • The component should present a user-friendly message in case of any service failure.
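The first use case (trip when more than 10% of calls fail within a minute) can be sketched in plain Java as a rolling-window failure rate. This is only an illustration of the rule, not Hystrix code; the class and method names are hypothetical, and time is passed in explicitly to keep the sketch easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Plain-Java sketch (not Hystrix): failure rate over a rolling time window. */
class RollingFailureRate {
    private static final class Call {
        final long timestampMillis;
        final boolean failed;

        Call(long timestampMillis, boolean failed) {
            this.timestampMillis = timestampMillis;
            this.failed = failed;
        }
    }

    private final long windowMillis;
    private final Deque<Call> calls = new ArrayDeque<>();

    RollingFailureRate(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records the outcome of one call made at the given time. */
    void record(long nowMillis, boolean failed) {
        calls.addLast(new Call(nowMillis, failed));
    }

    /** Percentage of failed calls inside the window; 0 if the window is empty. */
    double failurePercentage(long nowMillis) {
        // Drop calls that have fallen out of the rolling window.
        while (!calls.isEmpty() && calls.peekFirst().timestampMillis <= nowMillis - windowMillis) {
            calls.removeFirst();
        }
        if (calls.isEmpty()) {
            return 0.0;
        }
        long failures = calls.stream().filter(c -> c.failed).count();
        return 100.0 * failures / calls.size();
    }

    /** Trip only when strictly more than 10% of the calls in the window failed. */
    boolean shouldTrip(long nowMillis) {
        return failurePercentage(nowMillis) > 10.0;
    }

    public static void main(String[] args) {
        RollingFailureRate rate = new RollingFailureRate(60_000); // one-minute window
        for (int i = 0; i < 9; i++) {
            rate.record(1_000, false);
        }
        rate.record(1_000, true);                    // 1 of 10 failed: exactly 10%
        System.out.println(rate.shouldTrip(2_000));
        rate.record(2_000, true);                    // 2 of 11 failed
        System.out.println(rate.shouldTrip(3_000));
    }
}
```

Hystrix keeps its own rolling statistics, so a class like this is not needed in the actual integration; the sketch only makes the requirement concrete.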

Circuit Breaker Pattern

In our case, the component is responsible for making a call to RESTful endpoints to register/log in the user and provide an option to update related data/content after login.

The Circuit Breaker pattern can handle remote resources and service call failures more gracefully. It can prevent an application from repeatedly trying to execute an operation that’s likely to fail, allowing it to continue without waiting for the fault to be fixed or wasting CPU cycles while it determines that the fault is long lasting. The Circuit Breaker pattern also enables an application to detect whether the fault has been resolved. If the problem appears to have been fixed, the application can try to invoke the operation once again.
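To make the pattern concrete, here is a minimal plain-Java sketch of the three states usually described for a circuit breaker (closed, open, half-open). It is not how Hystrix is implemented; the class name, the consecutive-failure rule and the injected clock are assumptions chosen to keep the example short.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Plain-Java sketch of the pattern (not Hystrix): CLOSED -> OPEN -> HALF_OPEN. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long retryAfterMillis;  // how long to stay open before a trial call
    private final LongSupplier clock;     // injected so the sketch can be tested

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int failureThreshold, long retryAfterMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    synchronized <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAtMillis < retryAfterMillis) {
                return fallback.get();   // fail fast without touching the remote service
            }
            state = State.HALF_OPEN;     // let one trial call through
        }
        try {
            T result = operation.get();
            consecutiveFailures = 0;
            state = State.CLOSED;        // the fault appears to be resolved
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAtMillis = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 5_000, () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("service down"); };
        breaker.call(failing, () -> "fallback");
        breaker.call(failing, () -> "fallback");
        System.out.println(breaker.state());   // open after two consecutive failures
        now[0] = 6_000;                        // retry period has elapsed
        System.out.println(breaker.call(() -> "ok", () -> "fallback"));
        System.out.println(breaker.state());
    }
}
```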

Hystrix (a Netflix library) has a built-in ready-to-use Circuit Breaker. When we apply a Circuit Breaker to a method, Hystrix watches for failing calls to that method, and if failures build up to a pre-defined threshold, Hystrix opens the circuit so that subsequent calls automatically fail. While the circuit is open, Hystrix redirects calls to the method, and they’re passed on to a specified fallback method.

Reference: https://docs.pivotal.io/spring-cloud-services/1-3/common/circuit-breaker/

OSGi Dependencies

In order to get Hystrix running in AEM, you need to install the following dependency bundles in Felix.

#  Artifact ID                                           Version
1  org.apache.servicemix.bundles.hystrix                 1.5.9_1
2  org.apache.servicemix.bundles.hystrix-event-stream    1.5.9_1
3  rxjava                                                1.2.9
4  org.apache.servicemix.bundles.commons-configuration   1.9_2
5  com.diffplug.osgi.extension.sun.misc                  0.0.0
6  HdrHistogram                                          2.1.9

 

Apply the Circuit Breaker Pattern

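In annotation-driven setups (for example the hystrix-javanica module, as used by Spring Cloud), Hystrix looks for any method annotated with the @HystrixCommand annotation and wraps that method in a proxy connected to a Circuit Breaker so that Hystrix can monitor it. In this AEM example we extend the HystrixCommand class directly instead.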

The code to be isolated is wrapped inside the run() method of a HystrixCommand similar to the following:

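import java.io.IOException;

import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class HelloServiceGetCommand extends HystrixCommand<HelloResult> {

    private static final Logger LOG = LoggerFactory.getLogger(HelloServiceGetCommand.class);

    private final HttpGet httpGet;

    public HelloServiceGetCommand(final HttpGet httpGet) {
        super(HystrixCommandGroupKey.Factory.asKey("HelloGroup"));
        LOG.debug("Is CC breaker open {}", isCircuitBreakerOpen());
        this.httpGet = httpGet;
    }

    @Override
    protected HelloResult run() throws IOException {
        // your logic goes here
        CloseableHttpClient httpClient = HttpClientBuilder.create().build();

        LOG.debug("Health count : TotalRequests {}", metrics.getHealthCounts().getTotalRequests());

        // call httpClient.execute(httpGet)

        // catch any error and populate the HelloResult object

        // helloResult is the HelloResult populated by the logic above
        return helloResult;
    }
}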

Fallback

To handle the failure of external services, Hystrix has built in the following defaults:

  1. Timeout for every request to an external system (default: 1000 ms)
  2. Limit of concurrent requests for external system (default: 10)
  3. The Circuit Breaker to avoid further requests (default: when more than 50% of all requests fail)
  4. Retry of a single request after the Circuit Breaker has triggered (default: every 5 seconds)
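  5. Interfaces to retrieve runtime information at the request and aggregate level (there’s even a ready-to-use real-time dashboard for it; this is not yet set up in our OSGi deployment)

See How-To-Use#Fallback in the Hystrix wiki: https://github.com/Netflix/Hystrix/wiki/How-To-Use#Fallback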
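The second default in the list above (a cap on concurrent requests) can be illustrated with a plain-Java sketch that uses a semaphore. Hystrix itself provides this isolation for you, by default through a thread pool per command group; the class below is a hypothetical stand-in that only shows the idea of rejecting excess calls instead of queueing them.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Plain-Java sketch (not Hystrix): cap concurrent calls and reject the excess. */
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();   // limit reached: reject immediately instead of queueing
        }
        try {
            return operation.get();
        } finally {
            permits.release();       // always hand the permit back
        }
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(1);
        // The outer call holds the only permit, so the nested call is rejected.
        String result = limiter.call(
                () -> limiter.<String>call(() -> "inner ran", () -> "inner rejected"),
                () -> "outer rejected");
        System.out.println(result);
    }
}
```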

A simple fallback method using the "Fallback: Stubbed" pattern:

@Override
protected HelloResult getFallback() {
    LOG.debug("FALLBACK : is CC breaker open {} isResponseTimedOut() {} isResponseThreadPoolRejected() {}",
            isCircuitBreakerOpen(), isResponseTimedOut(), isResponseThreadPoolRejected());

    LOG.debug("Health count : TotalRequests {} ErrorPercentage {} ErrorCount {}",
            metrics.getHealthCounts().getTotalRequests(),
            metrics.getHealthCounts().getErrorPercentage(),
            metrics.getHealthCounts().getErrorCount());

    // returns an error object to the service, which sends it to the front end (FE)
    return getHelloResultError();
}

The fallback method returns the error code which is then consumed by a UI component.
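The idea behind a fallback can also be shown without Hystrix: run the call with a timeout and return a stubbed value if it fails or takes too long. The following is a minimal plain-Java sketch under that assumption; the class and method names are made up for illustration, and Hystrix handles all of this internally when getFallback() is overridden.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Plain-Java sketch (not Hystrix): run a call with a timeout and fall back on failure. */
class TimeoutWithFallback {

    static <T> T callWithFallback(Supplier<T> operation, Supplier<T> fallback, long timeoutMillis) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(operation);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Timed out or the operation threw: give up on the result.
            // Note that cancel() does not stop the underlying task if it is already running.
            future.cancel(true);
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        String fast = callWithFallback(() -> "hello", () -> "service unavailable", 1_000);
        String slow = callWithFallback(
                () -> { sleep(2_000); return "too late"; },
                () -> "service unavailable",
                1_000);
        System.out.println(fast);
        System.out.println(slow);
    }
}
```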

How to Run Hystrix Command

There are many ways to run the command. The following simple call in HelloServiceImpl invokes the Hystrix command.

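public class HelloServiceImpl implements HelloService {

    private HelloResult callCommand(final HttpGet getRequest) {
        // execute() blocks until run() or getFallback() has produced a result
        final HelloResult result = new HelloServiceGetCommand(getRequest).execute();

        // other service logic goes here

        return result;
    }
}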

Hystrix Runtime Configuration

Details on configuring a Hystrix command can be found on the Hystrix Configuration wiki page (https://github.com/Netflix/Hystrix/wiki/Configuration). Updating the configuration is simple.

For example, the default value for circuitBreaker.requestVolumeThreshold is set to 20. We override the property using HystrixCommandProperties.Setter, as shown below.

public HelloServiceGetCommand(final HttpGet httpGet) {
    super(Setter
            .withGroupKey(HystrixCommandGroupKey.Factory.asKey("HelloGroup"))
            .andCommandPropertiesDefaults(
                    HystrixCommandProperties.Setter()
                            .withCircuitBreakerRequestVolumeThreshold(MyAudiConstants.CB_REQUEST_VOLUME_THRESHOLD)
                            .withCircuitBreakerErrorThresholdPercentage(MyAudiConstants.CB_REQUEST_ERROR_THRESHOLD))
            .andThreadPoolPropertiesDefaults(
                    HystrixThreadPoolProperties.Setter()
                            .withCoreSize(MyAudiConstants.CB_REQUEST_THREAD_POOL_SIZE))
            .andCommandKey(HystrixCommandKey.Factory.asKey("HelloGroup")));

    …

}
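To see how the two circuit breaker settings overridden above interact, here is a rough plain-Java sketch of the documented rule: the circuit can only trip once the rolling window has seen at least requestVolumeThreshold requests, and then only if the error percentage is at or above errorThresholdPercentage. The class and method names are hypothetical and this is not Hystrix's own code.

```java
/** Plain-Java sketch of how the two circuit breaker settings interact (not Hystrix code). */
class TripDecision {

    static boolean shouldTrip(long totalRequests, long errorCount,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        // Too little traffic in the rolling window: never trip, whatever the error rate.
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) {
            return false;
        }
        // Whole-number error percentage, rounded down.
        int errorPercentage = (int) (errorCount * 100 / totalRequests);
        return errorPercentage >= errorThresholdPercentage;
    }

    public static void main(String[] args) {
        // With the defaults mentioned in this post (volume threshold 20, error threshold 50%):
        System.out.println(shouldTrip(19, 19, 20, 50)); // below the volume threshold
        System.out.println(shouldTrip(20, 10, 20, 50)); // 50% errors at the volume threshold
        System.out.println(shouldTrip(20, 9, 20, 50));  // 45% errors
    }
}
```

One consequence worth keeping in mind when choosing the constants: with a high volume threshold and low traffic, the circuit never opens even if every call fails.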

Monitoring

A dashboard for monitoring applications using Hystrix is available in the hystrix-dashboard module. However, hystrix-dashboard has not been deployed to our AEM instance at this time, so we follow the state of the circuit through the debug log output instead:

1) The Circuit Breaker is closed at the start

DEBUG [hystrix-HelloGroup-2] com.akqa…services.commands.HelloServiceGetCommand CC breaker open false

All HelloCommand requests are going through.

2) Now a failure occurs

DEBUG [hystrix-HelloGroup-2] com.akqa…services.commands.HelloServiceGetCommand CC breaker open True Events[SHORT_CIRCUITED]

3) Lastly, the Circuit Breaker closes again once the host is back online.

DEBUG [hystrix-HelloGroup-2] com….services.commands.HelloServiceGetCommand CC breaker open false

The example above just scratches the surface of how to improve the Service Resilience in a Felix container using Hystrix. The following resources can provide more advanced tricks to help make your application more fault tolerant.

https://github.com/Netflix/Hystrix/wiki/How-To-Use#Fallback

https://github.com/Netflix/Hystrix/wiki/How-To-Use#Common-Patterns

https://github.com/Netflix/Hystrix/wiki/Configuration

https://github.com/Netflix/Hystrix/wiki/Metrics-and-Monitoring

Summary

As demonstrated, it is possible to use the state-of-the-art, industry-standard fault-tolerance library Hystrix in AEM to protect your service against cascading failures and to provide fallback behavior for potentially failing calls.

 

All opinions expressed by Yogesh Kulkarni are his own and not Adobe’s.