Spring for Apache Hadoop

2.5.0

NOTICE: The Spring for Apache Hadoop project will reach End-Of-Life status on April 5th, 2019. We will publish occasional 2.5.x maintenance releases as needed up until that point and will then move the project to the attic. The current Spring for Apache Hadoop 2.5.0 release is built using Apache Hadoop version 2.7.3 and should be compatible with the latest releases of the most popular Hadoop distributions.

Introduction

Spring for Apache Hadoop simplifies developing Apache Hadoop applications by providing a unified configuration model and easy-to-use APIs for HDFS, MapReduce, Pig, and Hive. It also integrates with other Spring ecosystem projects, such as Spring Integration and Spring Batch, enabling you to develop solutions for big data ingest/export and Hadoop workflow orchestration.

Check out the O’Reilly Media book Spring Data: Modern Data Access for Enterprise Java, which contains several chapters on using Spring for Apache Hadoop. Sample code for the book is also available in the GitHub project spring-data-book.

Features

  • Support for creating Hadoop applications that are configured using Dependency Injection and run as standard Java applications rather than through the Hadoop command-line utilities.

  • Integration with Spring Boot to simplify creating Spring apps that connect to HDFS to read and write data.

  • Create and configure applications that use Java MapReduce, Streaming, Hive, Pig, or HBase.

  • Extensions to Spring Batch to support creating Hadoop-based workflows for any type of Hadoop Job or HDFS operation.

  • Script HDFS operations using any JVM-based scripting language.

  • Easily create custom Spring Boot-based applications that can be deployed to execute on YARN.

  • DAO support (Template & Callbacks) for HBase.

  • Support for Hadoop Security.

Versions and Distribution Support

Spring for Apache Hadoop supports a number of Apache releases as well as commercial distributions from Pivotal, Hortonworks and Cloudera.

The supported distributions vary by release version; see the wiki page for details.

Also, see the wiki page for Maven build details.

The continuous integration builds for most supported versions can be seen on the build page.

Spring Boot Config

<dependencies>
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>2.5.0.RELEASE</version>
    </dependency>
</dependencies>

Quick start

Bootstrap your application with Spring Initializr.

Documentation

Each Spring project has its own documentation, which explains in great detail how to use the project's features and what you can achieve with them.
  • 2.5.0 (CURRENT, GA): Reference Doc., API Doc.
  • 2.5.1 (SNAPSHOT): Reference Doc., API Doc.

A few examples to try out: