In this article we outline the main themes of Spring Batch 2.0 and highlight the changes from 1.x. Development work on the new release is well underway, with an M2 release last week, and we are getting a lot of interest, so now seems like a good time to give a few pointers.
The four main themes of the new release are Java 5 (generics in the core interfaces), easier configuration (step scope, annotations and an XML namespace), non-sequential and scalable execution, and improvements to the meta data schema.
There are no changes to the physical layout of the project in Spring Batch 2.0.0.M2 (same old downloads, same basic layout of Java packages). We have not removed any features, but we have taken the opportunity to revise a couple of APIs, and there are some minor changes for people updating projects from 1.x. Spring Batch is still young, and we were adding some pretty big features, so we decided a major version change was a good opportunity for a bit of a clean-out. We don't expect anyone to have any difficulty upgrading, and if you are an existing user this article will help you to get the measure of the changes.
As you may know, Spring 3.0 is going to be the first major release of Spring to target Java 5 exclusively (I'll leave it to Juergen and Arjen to clarify that in more detail). Now that Sun has put an "End of Service Life" stamp on JDK 1.4 it seems appropriate, and there are some great new features in Spring 3.0 that we want to take advantage of. The most visible effect in Spring Batch is the use of generics in the core interfaces. For example, the ItemReader, which the framework uses to read input data, now looks like this:
public interface ItemReader<S> {
    S read();
}
A further point to note here is that the old (1.x) framework callbacks mark() and reset() have gone from this interface, making it more friendly to end users, and preventing misunderstanding about what the framework requires and when. The same concerns (mark and reset) are now handled internally in the Step implementations that the framework provides.
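To make that concrete, here is a minimal sketch of my own (not framework code) of a reader implemented against the simplified interface above, assuming the usual convention that read() returns null when the input is exhausted:

import java.util.Iterator;
import java.util.List;

public class StringListItemReader implements ItemReader<String> {

    private final Iterator<String> iterator;

    public StringListItemReader(List<String> items) {
        this.iterator = items.iterator();
    }

    // Returning null signals to the step that the input is exhausted.
    public String read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

Notice that there is no state management code at all: the framework's Step implementations take care of that now.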
Similar changes, including one slightly more radical, have also been made to the partner ItemWriter interface, which the framework uses for writing data:
public interface ItemWriter<T> {
    void write(List<? extends T> items);
}
The old framework callbacks for flush() and clear() have gone from this interface too, and to compensate the write() method has a new signature. The bottom line is that the framework has moved internally to a chunk-oriented processing paradigm. This is actually much more natural in a batch framework than the old item-oriented approach, because for performance reasons we often need to buffer and flush, and the old interface made that awkward for users. Now you can do all the batching you need inside the write() method.
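As an illustration (a sketch of mine, not from the framework), a database writer can now send a whole chunk in one round trip; Trade, TradeDao and its batchInsert() method are hypothetical placeholders:

import java.util.List;

public class TradeItemWriter implements ItemWriter<Trade> {

    private final TradeDao dao; // hypothetical DAO, for illustration only

    public TradeItemWriter(TradeDao dao) {
        this.dao = dao;
    }

    // The whole chunk arrives in one call, so batching happens naturally
    // here instead of in separate flush() and clear() callbacks.
    public void write(List<? extends Trade> items) {
        dao.batchInsert(items);
    }
}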
There is also a new interface, the ItemProcessor, which sits between reading and writing:

public interface ItemProcessor<S,T> {
    T process(S item);
}
In 1.x the transformation between input items of type S and output items of type T had to be hidden inside one of the other participants (usually the ItemWriter). Now we have genericised this concern and placed it at the same level of importance in the framework as its siblings, ItemReader and ItemWriter. Users of 1.x might recognise the traces of the old ItemTransformer interface here, which has now been removed.
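For instance (my own trivial sketch), a processor that parses input lines into integers for a downstream writer would look like this:

public class LineToIntegerProcessor implements ItemProcessor<String, Integer> {

    // Transform each input item (S = String) into an output item (T = Integer).
    public Integer process(String item) {
        return Integer.valueOf(item.trim());
    }
}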
Many people, looking at Spring Batch 1.x, have asked: "what if my business logic is not reading and writing?" To answer this question more satisfactorily we have modified the Tasklet interface. In Spring Batch 1.x it was fairly bland - little more than a Callable, in fact - but in 2.0 we have given it some more flexibility and slotted it more into the mainstream of the framework (e.g. the central chunk-oriented step implementation is now implemented as a Tasklet). Here is the new interface:
public interface Tasklet {
    ExitStatus execute(StepContribution contribution, AttributeAccessor attributes);
}
The idea is that the tasklet can now contribute more back to the enclosing step, and this makes it a much more flexible platform for implementing business logic. The StepContribution was already part of the 1.x API, but it wasn't very publicly exposed. Its role is to collect updates to the current StepExecution without the programmer having to worry about concurrent modifications in another thread. This also tells us that the Tasklet will be called repeatedly (instead of just once per step in the 1.x framework), and so it can be used to carry out a greater range of business processing tasks. The AttributeAccessor is a chunk-scoped bag of key-value pairs. The tasklet can use this to store intermediate results that will be preserved across a rollback.
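Here is a minimal sketch (mine, against the M2 signature above) of a tasklet that keeps a running total in the attribute bag so that it survives a rollback; doSomeWork() is a placeholder, and I am assuming ExitStatus.FINISHED still signals completion as it did in 1.x:

import org.springframework.core.AttributeAccessor;

public class RunningTotalTasklet implements Tasklet {

    public ExitStatus execute(StepContribution contribution, AttributeAccessor attributes) {
        // Recover the intermediate result if we are retrying after a rollback.
        Integer total = (Integer) attributes.getAttribute("total");
        if (total == null) {
            total = 0;
        }
        total = total + doSomeWork();
        attributes.setAttribute("total", total);
        // FINISHED tells the step not to call this tasklet again.
        return ExitStatus.FINISHED;
    }

    private int doSomeWork() {
        return 1; // placeholder for real business logic
    }
}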
On the configuration side, one of the big new features is a custom bean scope for steps, which supports late binding of values that are only known at runtime. Consider this configuration:

<bean id="step1" parent="simpleStep">
    <property name="itemReader">
        <bean class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
            <property name="resource" value="#{jobParameters[inputFile]}" />
            ...
        </bean>
    </property>
    ...
</bean>
The item reader needs to be bound to a file name that is only available at runtime. To do this we have declared it as scope="step" and used the Spring EL binding pattern #{...} to bind in a job parameter. The same pattern will work with step and job level execution context attributes.
The StepScope is available in nightly snapshots, but we are still using Spring 2.5.5, so the EL bindings are not yet working. Step-scoped beans are also a good solution to the old Spring Batch problem of how to keep your steps thread safe. If a step relies on a stateful component like a FlatFileItemReader, you only have to define that component with scope="step", and the framework will create a lazy-initializing proxy for it, so the real instance is created as needed, once per step execution. There's still nothing wrong with creating a new ApplicationContext for each job execution, which is the current best practice in 1.x for keeping threads from colliding across jobs. But now step scope gives you another option, and one that is probably easier for most Spring users to get to grips with.
In 1.x the model of a job was always a linear sequence of steps, and if one step failed, the job failed. Many jobs still fit that pattern, so it hasn't gone away, but in 2.0 we are lifting the restriction by introducing some new features. These are planned for the M3 release, so the implementation details might change, but the idea is to support three features:
Spring Batch 1.x was always intended as a single-VM, possibly multi-threaded, model, but we built a lot of features into it that support parallel execution in multiple processes. Many projects have successfully implemented a scalable solution relying on the quality-of-service features of Spring Batch to ensure that processing only happens in the correct sequence. In 2.0 we plan to expose those features more explicitly. There are two approaches to scalability, and we will support both: remote chunking and partitioning.
The idea behind using annotations to implement batch logic is by analogy with Spring @MVC. The net effect is that instead of having to implement and possibly register a bunch of interfaces (reader, writer, processor, listeners, etc.) you would just annotate a POJO and plug it into a step. There are method-level annotations corresponding to the various interfaces, and parameter-level annotations and factory methods corresponding to job, step and chunk level attributes (a bit like @ModelAttribute and @RequestParam in Spring @MVC).
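Nothing here is final, but to give a flavour, a step implemented as an annotated POJO might end up looking something like the sketch below; every annotation and type name (@ItemReader, @ItemWriter, Player) is a hypothetical illustration of mine, not a published API:

import java.util.List;

public class PlayerItemHandler {

    @ItemReader // hypothetical annotation
    public Player read() {
        // read a Player from somewhere; null would signal end of input
        return null;
    }

    @ItemWriter // hypothetical annotation
    public void write(List<Player> players) {
        // write the whole chunk of players
    }
}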
An XML namespace for Spring Batch makes configuration of common things even easier. For example, the ConditionalJob we mentioned above has an XML configuration option, so a simple conditional execution might look like this:
<job>
    <step name="gamesLoad" next="playerLoad"/>
    <step name="playerLoad">
        <next on="*" to="summarize"/>
        <next on="FAILED" to="compensate"/>
    </step>
    <step name="compensate"/>
    <step name="summarize"/>
</job>
The first step ("gamesLoad") is followed immediately by "playerLoad", but if "playerLoad" fails then we go to an alternative step ("compensate") instead of finishing the job normally with the "summarize" step. The name attribute of the <step> element in the XML is a bean reference, so the implementation of Step is defined elsewhere (possibly through annotations).
There are a couple of tidy-ups and some extensions to the data model in the meta data schema. We will be providing update scripts for anyone moving from 1.x to 2.0, so the changes shouldn't cause any problems, and they will definitely make it easier to navigate and interact with the meta data. For those of you who are new to Spring Batch, some of the key benefits of the framework are the quality-of-service features like restartability and idempotence (process business data once and only once). We implement these features through shared state in a relational database (in most use cases), and the definition of the data model in this database has changed slightly in 2.0.
The main changes are to do with the storage of ExecutionContext, which used to be centralised in one table, even though the context can be associated either with a StepExecution or a JobExecution. The new model will be more popular with DBAs because it makes the relationships more transparent in the DDL. We have also started storing the context values as JSON, to make them easier for human users to read and track (the context for a single entity is now stored in one row of a table, instead of many).
We have also added some more statistics for the counting and accounting of items executed and skipped, splitting out counts for total items read, processed and written at each stage. For steps (or tasklets) that do not split their execution into read, process and write phases, this is more detail than is needed, but for the majority use case it is more appropriate than just storing an overall item count.
Hopefully this article has whetted your appetite for Spring Batch 2.0, and you will have time to download a snapshot or milestone and try it out. We get such good feedback from the community that the product wouldn't be half as good as it is without your help and support. If I have missed something that you were interested in, or haven't provided enough detail, I apologise, but I needed to squeeze this into a reasonable blog-sized format. I can write more if anyone is interested - just ask.
If you do want to know more, there is an excellent opportunity to hear about Spring Batch 2.0 and all the other SpringSource activities from the team at this year's Spring One Americas. Lucas Ward will be presenting Spring Batch 2.0 features in one session and I will be showing the scalability features in a bit more detail. Also check out the discussions on the forum, or go to the home page here.