
Highlights of Spring for Apache Hadoop 1.0.0 M2

I am happy to announce that the second milestone (1.0.0.M2) of the Spring for Apache Hadoop project is available. In this blog post, I would like to quickly highlight the major new features in M2.

HBase DAO support

One of the most versatile and powerful features in the Spring Framework is the Data Access Object (or DAO) support. With Spring for Apache Hadoop 1.0.0 M2, the same functionality has been added for HBase. Users of the popular template and callback pattern should feel right at home: the framework handles the table lookup, resource cleanup and exception conversion, letting the developer focus on what really matters. See the API and reference docs for more information. By the way, we have also included a new sample in the distribution, hbase-crud, to help you get started right away.
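For readers new to the template and callback pattern, here is a minimal, framework-free sketch of the idea in plain Java. Every name in it (`FakeTable`, `TableCallback`, `SimpleTemplate`, `DataAccessFailure`) is hypothetical and made up for illustration; it is not the Spring for Apache Hadoop or HBase API, so please refer to the reference docs for the real classes.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateCallbackSketch {

    /** Hypothetical stand-in for a table handle; NOT the HBase API. */
    static class FakeTable implements AutoCloseable {
        final String name;
        boolean closed = false;

        FakeTable(String name) { this.name = name; }

        /** Pretend lookup that fails with a checked exception on an empty row key. */
        String get(String row) throws Exception {
            if (row.isEmpty()) {
                throw new Exception("empty row key");
            }
            return name + ":" + row;
        }

        @Override
        public void close() { closed = true; }
    }

    /** The callback: the only piece of code the caller has to write. */
    interface TableCallback<T> {
        T doInTable(FakeTable table) throws Exception;
    }

    /** Unchecked exception that checked failures are converted into. */
    static class DataAccessFailure extends RuntimeException {
        DataAccessFailure(String message, Throwable cause) { super(message, cause); }
    }

    /** The template: owns table lookup, cleanup and exception conversion. */
    static class SimpleTemplate {
        // Kept only so the sketch can show that every table gets closed.
        final List<FakeTable> opened = new ArrayList<>();

        <T> T execute(String tableName, TableCallback<T> action) {
            FakeTable table = new FakeTable(tableName);   // table lookup
            opened.add(table);
            try {
                return action.doInTable(table);           // user code runs here
            } catch (Exception e) {                       // exception conversion
                throw new DataAccessFailure("Access to table '" + tableName + "' failed", e);
            } finally {
                table.close();                            // resource cleanup, always
            }
        }
    }

    public static void main(String[] args) {
        SimpleTemplate template = new SimpleTemplate();
        String value = template.execute("users", table -> table.get("row1"));
        System.out.println(value); // prints "users:row1"
    }
}
```

The caller never opens or closes the table and never has to catch a checked exception, which is the division of labour the DAO support aims for.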

Cascading Taps

In M2, we have expanded the integration with the Cascading library by introducing dedicated Taps for Spring Framework and Spring Integration resources (see http://static.springsource.org/spring-hadoop/docs/1.0.0.M2/reference/html/cascading.html#cascading:tap:local). The richness of Spring Integration adapters (whether inbound or outbound) such as File, TCP, Twitter, FTP, RSS (just to name a few) is now available to Cascading (and its extensions such as Cascalog or Scalding). And we are just getting started - expect more news on this front.

Hadoop Security

With M2, moving from a vanilla Hadoop install (such as a dev machine) to a fully Kerberos-secured Hadoop cluster is transparent. The File-System, Map/Reduce and Pig components are all security-aware, executing under proper credentials and supporting user impersonation. See the dedicated chapter for more information.

Enhanced vanilla Map/Reduce support

Since the beginning, Spring for Apache Hadoop has offered extensive support for Map/Reduce jobs - whether vanilla (traditional Java) Map/Reduce, streaming or tooling. In M2, we have added support for Hadoop generic options across the board, making job provisioning, either by naming resources individually or through pattern matching, a one-liner.
Furthermore, we have enhanced the bootstrapping of jar-based jobs - rather than requiring the classes to be on the classpath, the job can be fully loaded, in isolation, from the jar. The classes (and their dependencies) do not leak into the application, which avoids all sorts of versioning conflicts and dependency creep. The tool declaration has been improved to automatically read the Jar metadata and its Main-Class, offering a powerful, fully managed replacement for Hadoop shell jar invocations.
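To give a feel for what reading the Jar metadata and loading in isolation involve, here is a rough sketch using only standard JDK classes. It is an illustration of the general technique, not the project's actual implementation; the class name `IsolatedJarRunner` and its methods are hypothetical. Note one simplification: the sketch uses the parent of the system class loader as the parent of the jar's loader, so application classes are invisible to the jar. A real Hadoop job would still need the Hadoop classes to be reachable.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class IsolatedJarRunner {

    /** Returns the Main-Class attribute of the jar manifest, or null if there is none. */
    static String readMainClass(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Loads the jar in its own class loader and invokes its main method. */
    static void runMain(File jar, String[] args) throws Exception {
        String mainClass = readMainClass(jar);
        if (mainClass == null) {
            throw new IllegalArgumentException("No Main-Class entry in " + jar);
        }
        URL[] urls = { jar.toURI().toURL() };
        // Assumption for this sketch: skip the application class loader so that
        // the jar's classes and the application's classes cannot see each other.
        ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
        try (URLClassLoader loader = new URLClassLoader(urls, parent)) {
            Class<?> type = Class.forName(mainClass, true, loader);
            Method main = type.getMethod("main", String[].class);
            // Cast to Object so the array is passed as a single argument.
            main.invoke(null, (Object) args);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: IsolatedJarRunner <jar> [args...]");
            return;
        }
        runMain(new File(args[0]), Arrays.copyOfRange(args, 1, args.length));
    }
}
```

Because the loader is closed once `main` returns, the jar's classes can be discarded afterwards instead of lingering on the application classpath.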

Two New Samples

Last but not least, two new samples have been added to the distribution. The first is hbase-crud, which I mentioned before; it showcases the declarative and programmatic HBase support. The second is pig-scripting, which demos the JVM and Pig scripting support: the former prepares the data in HDFS for the latter, which does the data analysis. There are more samples in the pipeline, and if you would like to see anything in particular, tell us.

I hope you enjoy this new milestone.
Go ahead, grab 1.0.0 M2, take it for a spin and let us know what you think!

Other News: Project Serengeti

As far as new releases go, Spring for Apache Hadoop 1.0.0 M2 is not the only news on the Hadoop front. Today, VMware is taking the wraps off Project Serengeti, for virtualized and highly available Hadoop. See Richard McDougall’s blog post on the motivations behind it, the current status and the roadmap.
