Get ahead
VMware offers training and certification to turbo-charge your progress.
Learn moreAt The Spring Experience, I hosted a session various aspects. One of them was the Hibernate synchronization aspect that I described last week. Another was an aspect capable of capturing first failures and system state, sometimes called First-Failure Data Capture (FFDC). I hosted this session to show off some aspects that are very useful, but that people might not have come across in practice yet. I often hear people asking about aspects other than logging, tracing, transaction management and security. The Hibernate synchronization aspect and the FFDC aspect are nice examples I think.
Let's set the following two goals:
In order to do this, first let's devise a class that is capable is keeping track of the data for us that we want to record. We will call this class the CallContext. I'm omitting the actual implementation. I'll post the code in my next entry, the implementation does not really matter here and other than that, it's a pretty straightforward data holder anyway.
public class CallContext {
/**
* Registers the root call of this call context.
* We want to distinguish between the root call
* and all subsequent calls issued in the context
* of the root call.
*/
public void setRootCall(JoinPoint rootCall) { ... }
/**
* Registers a call at a certain depth.
* @param the call to register
* @param the depth of the call
*/
public void registerCall(JoinPoint call, int depth) { ... }
/**
* Registers the first failure of this call context.
* A first failure might already have occurred in which
* case subsequent registrations of the same or different
* failures will be ignored.
*/
public void setFirstFailure(Throwable t) { ... }
/**
* Log the entire call context (i.e. output it to
* System.out).
*/
public void log(Throwable t) { ... }
}
As you can see, we are using the AspectJ JoinPoint type to identify an event happening in a program.
As you can see, I'm just slicing and dicing things a little in such a way that sentences in the form of 'before/after something happens, do something' start appearing. The only thing left to do is to identify the two somethings and we're done. Let's tackle each of the three different pieces of logic individually.
pointcut businessService() : call(* (@BusinessService *..*).*(..)) || call(@BusinessService * *(..));
Now we have identified a business service, we can finish the first scenario:
public aspect FirstFailureDataCapturer {
public CallContext callContext = new CallContext();
pointcut businessService() : call(@BusinessService *..*).*(..)) ||
call(@BusinessService * *(..));
before() : businessService() {
// 'thisJoinPoint' is an implicit variable (just like 'this')
// that represents the current join point
this.callContext.setRootCall(thisJoinPoint);
}
}
pointcut methodCall() : call(* *(..));
If we'd use this pointcut, we'd make the scenario applicable to all methods, but we only want to apply it to methods inside business services. Therefore we need to limit the scope of this pointcut. We can do that by using the cflow pointcut designator. The cflow pointcut designator takes another pointcut and limits to the things happening in the context of that pointcut. Let's see how we can use it to solve the problem at hand. Read the following as: 'a method call in a business service is a method call (refer to pointcut defined above) while (and) in the control flow of a business service (refer to other pointcut defined earlier'.
pointcut methodCallInBusinessService() : methodCall() && cflow(businessService());
Let's go one step further and say that we do not want to register all method calls, but only a limited set. The following defines a traceable method, identifying only methods that I thought are relevant. It also excludes methods defined in the aspect itself (or in the control flow of the aspect). The latter prevents endless loops from occurring. Let's read this out aloud as well: a traceable method is a method call in a business service (refer to pointcut defined above) while not (and not) in the control flow of an executing advice defined in the FirstFailureDataCapturer and it should also not be a call to equals(), hashCode() or getClass().
pointcut traceableMethod() :
methodCallInBusinessService() &&
!cflow(within(FirstFailureDataCapturer) && adviceexecution()) &&
!call(* equals(Object)) && !call(* hashCode()) && !call(* getClass());
Let's use this pointcut to implement the second scenario that we've identified. In the description of the scenario above, we have not specified that we also need to keep track of the current depth. We're using a before advice to log the current call. Let's also use that same advice to keep track of the depth and let's use an after advice to reset the depth to it's previous state.
public aspect FirstFailureDataCapturer {
public CallContext callContext = new CallContext();
public int currentDepth = 0;
// other pointcuts and advices omitted
pointcut methodCallInBusinessService() : methodCall() && cflow(businessService());
pointcut traceableMethod() :
methodCallInBusinessService() &&
!cflow(within(FirstFailureDataCapturer) && adviceexecution())) &&
!call(* equals(Object)) && !call(* hashCode()) && !call(* getClass());
before() : traceableMethod() {
currentDepth++;
callContext.registerCall(thisJoinPoint, currentDepth);
}
after() : traceableMethod() {
currentDepth--;
}
}
A potential failure point is a) an exception escaping from a method or b) an exception handler inside a method (and then wrapped, swallowed, possibly re-thrown, et cetera). Let's use this definition to implement our third scenario. The first pointcut just uses the traceable method pointcut to identify potential failure points. We'll use after throwing advice in a minute to finish part of our scenario. The second is a little more interesting. It defines a pointcut that identifies exception handler (catch blocks) in the control flow of business services. Using this pointcut, we can identify excpetion that are caught, wrapped and re-thrown for example (or caught and swallowed).
pointcut potentialFailurePoint() : traceableMethod();
pointcut exceptionHandler(Throwable t) : handler(*) && args(t) && cflow(businessService());
We'll use after before and after advice to finish the third scenario. First: before an exception handler, log the exception:
public aspect FirstFailureDataCapturer {
private CallContext context = new CallContext();
// other members omitted
before(Throwable t) : exceptionHandler(t) {
this.callContext.setFirstFailure(t);
}
}
And now, let's define the other advice:
public aspect FirstFailureDataCapturer {
private CallContext context = new CallContext();
// other members omitted
after() throwing(Throwable t) : potentialFailurePoint() {
this.callContext.setFirstFailure(t);
}
}
public aspect FirstFailureDataCapturer {
private CallContext context = new CallContext();
// other members omitted
after() throwing(Throwable t) : businessService() {
this.callContext.log(t);
}
}
@BusinessService public Car manufactureCar(CarModel model) {
Set <Part> parts = inventory.getPartsForModel(model);
return assemblyLine.assembleCarFromParts(model, parts);
}
The CarPartsInventory in this example is a stub and does not really do anything useful:
public Set<Part> getPartsForModel(CarModel model) {
return new HashSet<Part>();
}
The interesting bit here is the CarAssemblyLine. As you can see in the code below, the CarAssemblyLine has some strange code in it. It first throws an exception, catches it itself and then re-throws it as a pretty MeaninglessException.
public Car assembleCarFromParts(CarModel model, Set<Part> parts) {
try {
throw new OnStrikeException("The workers are on strike!");
} catch (OnStrikeException e) {
throw new MeaninglessException();
}
}
Obviously, under normal circumstances, the real cause of the problem, the root cause would never be identified in this case (it's caught, not logged... a different exception is thrown and the root cause is not passed along), and we'd also never be able to register the system state exactly at the point the real and first failure (the OnStrikeException) occurred. Fortunately now we have the FirstFailureDataCapturer, we can register the root cause and log it. Below you can find a sequence diagram and some output of a test that I ran. As you can see, we do not just get the call stack, but all calls that occurred in the context of this business service execution, or in other words: the entire call tree.
Now we have identified the real point where failure occur, we can also start capturing system state. As you can imagine, the system state at CarPlant:18 might differ from the system state at CarPlant:20 and our FirstFailureDataCapturer allows us to register the system state at the correct point in time.
So what is system state anyway? Well, that all depends on the runtime, your specific application and what you are interested in. A couple of examples:
Capturing system state is now very easy and we can do this inside the CallContext.setFirstFailure() method for example.
public aspect FirstFailureDataCapturer {
public CallContext callContext = new CallContext();
pointcut businessService() : call(@BusinessService * *(..)) || call(* (@BusinessService *..*).*(..)) || call(@BusinessService * *(..));
before() : businessService() {
// 'thisJoinPoint' is an implicit variable (just like 'this')
// that represents the current join point
this.callContext.setRootCall(thisJoinPoint);
}
}
As you can see, the call context is instantiated when the FirstFailureDataCapturer is instantiated. The question of course now is: when and how many times will the FirstFailureDataCapturer be instantiated? And, when you've answered this question, another question might come to mind: what happens if this aspect is used in a multi-threaded environment? In the next part I'll discuss all of this and make some other changes to the aspect to polish it a bit. In the meantime, you can always try to answer those questions in the comments of course :)! I'll also provide the source code for the aspect in the next part.