This guide walks you through the process of creating a Spring Hadoop YARN application.

What you’ll build

You’ll build a simple Hadoop YARN application with Spring Hadoop and Spring Boot. Other examples may have used a multi-project structure, but that is not necessary: this sample creates only a single project and a single jar file.

What you’ll need

How to complete this guide

Like most Spring Getting Started guides, you can start from scratch and complete each step, or you can bypass basic setup steps that are already familiar to you. Either way, you end up with working code.

To start from scratch, move on to Set up the project.

To skip the basics, do the following:

When you’re finished, you can check your results against the code in gs-yarn-basic-single/complete.

Set up the project

First you set up a basic build script. You can use any build system you like when building apps with Spring, but the code you need to work with Gradle and Maven is included here. If you’re not familiar with either, refer to Building Java Projects with Gradle or Building Java Projects with Maven.

Additional guides give specific instructions for using these build systems with Spring YARN. For details, refer to Building Spring YARN Projects with Gradle or Building Spring YARN Projects with Maven.

Create the directory structure

In a project directory of your choosing, create the following directory structure:

└── src
    └── main
        ├── resources
        └── java
            └── hello

For example, on *nix systems, with:

mkdir -p src/main/resources
mkdir -p src/main/java/hello

Create the Gradle build files

Below are the initial Gradle build file and the initial Gradle settings file. You can also use Maven; the pom.xml file is included right here. If you are using Spring Tool Suite (STS), you can import the guide directly.

build.gradle

buildscript {
    repositories {
        maven { url "http://repo.spring.io/libs-release" }
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.3.3.RELEASE")
    }
}

apply plugin: 'base'
apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'idea'
apply plugin: 'spring-boot'
version =  '0.1.0'
archivesBaseName = 'gs-yarn-basic-single'

repositories {
    mavenCentral()
    maven { url "http://repo.spring.io/libs-release" }
}

dependencies {
    compile("org.springframework.data:spring-yarn-boot:2.4.0.RELEASE")
    testCompile("org.springframework.data:spring-yarn-boot-test:2.4.0.RELEASE")
    testCompile("org.hamcrest:hamcrest-core:1.2.1")
    testCompile("org.hamcrest:hamcrest-library:1.2.1")
}

task copyJars(type: Copy) {
    from "$buildDir/libs"
    into "$rootDir/target/"
    include "**/*.jar"
}

assemble.doLast {copyJars.execute()}
clean.doLast {ant.delete(dir: "target")}

task wrapper(type: Wrapper) {
    gradleVersion = '1.11'
}

settings.gradle


The Gradle build file above creates a single jar containing all of the application classes. Spring Boot’s Gradle plugin then repackages this jar into an executable jar, and the copyJars task copies it into the target directory.

Create an Application

Here you create the Application and HelloPojo classes.

src/main/java/hello/Application.java

package hello;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Profile;
import org.springframework.data.hadoop.fs.FsShell;
import org.springframework.yarn.annotation.OnContainerStart;
import org.springframework.yarn.annotation.YarnComponent;

@ComponentScan
@EnableAutoConfiguration
public class Application {

	public static void main(String[] args) {
		SpringApplication.run(Application.class, args);
	}

	@YarnComponent
	@Profile("container")
	public static class HelloPojo {

		private static final Log log = LogFactory.getLog(HelloPojo.class);

		@Autowired
		private Configuration configuration;

		@OnContainerStart
		public void onStart() throws Exception {
			log.info("Hello from HelloPojo");
			log.info("About to list from hdfs root content");

			FsShell shell = new FsShell(configuration);
			for (FileStatus s : shell.ls(false, "/")) {
				log.info(s);
			}
			shell.close();
		}

	}

}

In the above Application, notice how we added the @ComponentScan annotation at the main class level and the @YarnComponent annotation on the inner HelloPojo class.

The HelloPojo class is a simple POJO in the sense that it doesn’t extend any Spring YARN base classes. In this class we:

  • Added a class-level @YarnComponent annotation.

  • Added a method-level @OnContainerStart annotation.

  • Autowired Hadoop’s Configuration class with @Autowired.

  • Added a class-level @Profile annotation.

@YarnComponent is a stereotype annotation that includes Spring’s @Component annotation, so it automatically marks a class as a candidate for @YarnComponent functionality. We use @Profile specifically so that the bean is created only when the container profile is active. The @ComponentScan annotation on the Application class then instructs the context to create beans automatically by classpath scanning.
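The effect of @Profile on bean creation can be pictured with a small, stdlib-only Java sketch. This is not how Spring implements profiles: the ActiveIn annotation, the ProfileGate class, and the parsing of a --spring.profiles.active=... argument are hypothetical simplifications that only mimic the rule described above, namely that a class restricted to a profile is instantiated only when that profile is active.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Spring's @Profile: names the profile a class requires.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ActiveIn {
    String value();
}

// Created only when the "container" profile is active, like HelloPojo.
@ActiveIn("container")
class ContainerOnlyPojo {
}

// No profile restriction, so it is always created.
class AlwaysPojo {
}

class ProfileGate {

    // Collects profile names from an argument such as --spring.profiles.active=container,dev
    static Set<String> activeProfiles(String[] args) {
        String prefix = "--spring.profiles.active=";
        Set<String> profiles = new HashSet<>();
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                profiles.addAll(Arrays.asList(arg.substring(prefix.length()).split(",")));
            }
        }
        return profiles;
    }

    // Instantiates only the candidate classes whose required profile (if any) is active.
    static List<Object> createBeans(Set<String> active, Class<?>... candidates) {
        List<Object> beans = new ArrayList<>();
        for (Class<?> type : candidates) {
            ActiveIn required = type.getAnnotation(ActiveIn.class);
            if (required == null || active.contains(required.value())) {
                try {
                    beans.add(type.getDeclaredConstructor().newInstance());
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("cannot create " + type.getName(), e);
                }
            }
        }
        return beans;
    }

    public static void main(String[] args) {
        Set<String> active = activeProfiles(args);
        for (Object bean : createBeans(active, ContainerOnlyPojo.class, AlwaysPojo.class)) {
            System.out.println("created " + bean.getClass().getSimpleName());
        }
    }
}
```

Started with --spring.profiles.active=container, the sketch creates both classes; without that argument it creates only AlwaysPojo.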

Within this class we use the @OnContainerStart annotation to mark a public method with a void return type and no arguments as the entry point for application code that needs to be executed on Hadoop.
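The contract that @OnContainerStart places on a method (public, void return type, no arguments) can be illustrated with plain Java reflection. This is a hypothetical sketch, not Spring YARN’s actual implementation: the OnStartSketch annotation and the DemoPojo and EntryPointInvoker classes are made up for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for @OnContainerStart; retained at runtime so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OnStartSketch {
}

// Hypothetical POJO with one valid entry point and one that breaks the contract.
class DemoPojo {
    final List<String> messages = new ArrayList<>();

    @OnStartSketch
    public void onStart() {
        messages.add("Hello from DemoPojo");
    }

    // Skipped: entry points must take no arguments.
    @OnStartSketch
    public void withArgument(String ignored) {
        messages.add("should not run");
    }
}

class EntryPointInvoker {

    // Invokes every public, void, no-argument method marked with @OnStartSketch
    // and returns the names of the methods it ran.
    static List<String> invokeEntryPoints(Object target) {
        List<String> invoked = new ArrayList<>();
        for (Method method : target.getClass().getMethods()) {
            boolean eligible = method.isAnnotationPresent(OnStartSketch.class)
                    && Modifier.isPublic(method.getModifiers())
                    && method.getReturnType() == void.class
                    && method.getParameterCount() == 0;
            if (!eligible) {
                continue;
            }
            try {
                method.invoke(target);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("entry point failed: " + method.getName(), e);
            }
            invoked.add(method.getName());
        }
        return invoked;
    }

    public static void main(String[] args) {
        DemoPojo pojo = new DemoPojo();
        System.out.println("ran " + invokeEntryPoints(pojo) + " -> " + pojo.messages);
    }
}
```

Calling invokeEntryPoints on a DemoPojo runs only onStart(); withArgument(String) is annotated but skipped because it takes an argument.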

To demonstrate that this class has some real functionality, we use Spring Hadoop’s FsShell class to list entries from the root of the HDFS file system. FsShell needs Hadoop’s Configuration, which is prepared for you, so you can simply rely on autowiring to access it.

The main() method uses Spring Boot’s SpringApplication.run() method to launch an application. What happens next depends on the configuration and on the conditions detected at startup, which determine whether the application acts as a YarnClient, a YarnAppmaster, or a YarnContainer.

Create an Application Configuration

Create a new YAML configuration file for the gs-yarn-basic-single project.

src/main/resources/application.yml

spring:
    hadoop:
        fsUri: hdfs://localhost:8020
        resourceManagerHost: localhost
    yarn:
        appName: gs-yarn-basic-single
        applicationDir: /app/gs-yarn-basic-single/
        client:
            startup:
                action: submit
            localizer:
                patterns:
                  - "*.jar"
            files:
              - "file:target/gs-yarn-basic-single-0.1.0.jar"
            launchcontext:
                archiveFile: gs-yarn-basic-single-0.1.0.jar
        appmaster:
            localizer:
                patterns:
                  - "*.jar"
            containerCount: 1
            launchcontext:
                archiveFile: gs-yarn-basic-single-0.1.0.jar
                arguments:
                    --spring.profiles.active: container

Pay attention to the YAML file format, which expects correct indentation and no tab characters.
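YAML does not allow tab characters for indentation, so a quick check before building can catch the problem early. The YamlTabCheck class below is a hypothetical helper, not part of Spring YARN; by default it reads src/main/resources/application.yml relative to the current directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class YamlTabCheck {

    // Returns the 1-based numbers of the lines that contain a tab character.
    static List<Integer> linesWithTabs(List<String> lines) {
        List<Integer> offending = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).indexOf('\t') >= 0) {
                offending.add(i + 1);
            }
        }
        return offending;
    }

    public static void main(String[] args) throws IOException {
        Path yml = Paths.get(args.length > 0 ? args[0] : "src/main/resources/application.yml");
        List<Integer> bad = linesWithTabs(Files.readAllLines(yml, StandardCharsets.UTF_8));
        if (bad.isEmpty()) {
            System.out.println(yml + ": no tab characters found");
        } else {
            System.out.println(yml + ": tab characters on lines " + bad);
        }
    }
}
```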

The final part of your application is its runtime configuration, which glues all the components together so that they can be executed as a Spring YARN application. This configuration acts as a source for Spring Boot’s @ConfigurationProperties and contains the relevant configuration properties that cannot be auto-discovered or that need an option to be overridden by an end user.

This way you can define your own defaults for your environment. Because these @ConfigurationProperties are resolved at runtime by Spring Boot, you can easily override these properties by using command-line options, environment variables, or additional configuration property files.
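The override behavior can be modeled as a list of property sources searched in priority order, where the first source that defines a key wins. The LayeredProperties class below is a hypothetical, much-simplified sketch: real Spring Boot consults many more sources (including environment variables, which rank above a packaged application.yml but below command-line arguments) and applies relaxed binding, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LayeredProperties {

    // Property sources in decreasing priority; the first source that defines a key wins.
    private final List<Map<String, String>> sources = new ArrayList<>();

    LayeredProperties addSource(Map<String, String> source) {
        sources.add(source);
        return this;
    }

    // Returns the value from the highest-priority source, or null if no source defines the key.
    String get(String key) {
        for (Map<String, String> source : sources) {
            if (source.containsKey(key)) {
                return source.get(key);
            }
        }
        return null;
    }

    // Turns arguments such as --spring.yarn.appName=demo into a key/value map.
    static Map<String, String> fromArgs(String[] args) {
        Map<String, String> map = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                map.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Defaults as they appear in application.yml, flattened to dotted keys.
        Map<String, String> yml = new HashMap<>();
        yml.put("spring.hadoop.fsUri", "hdfs://localhost:8020");
        yml.put("spring.yarn.appName", "gs-yarn-basic-single");
        LayeredProperties props = new LayeredProperties()
                .addSource(fromArgs(args)) // highest priority: command-line arguments
                .addSource(yml);           // lowest priority: packaged defaults
        System.out.println("fsUri   = " + props.get("spring.hadoop.fsUri"));
        System.out.println("appName = " + props.get("spring.yarn.appName"));
    }
}
```

With this model, passing --spring.hadoop.fsUri=hdfs://namenode:8020 (namenode is a placeholder host) replaces only fsUri, while appName still comes from the YAML defaults.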

Build the Application

For Gradle, simply execute the clean and build tasks.

./gradlew clean build

To skip existing tests if any:

./gradlew clean build -x test

For Maven, simply execute the clean and package goals.

mvn clean package

To skip existing tests if any:

mvn clean package -DskipTests=true

The listing below shows the files after a successful Gradle build.

target/gs-yarn-basic-single-0.1.0.jar
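To confirm that the jar was repackaged as an executable Spring Boot jar, you can inspect its manifest. With Spring Boot’s default packaging, Main-Class would name Spring Boot’s launcher class and Start-Class would name hello.Application. The JarManifestCheck class is a hypothetical helper built only on java.util.jar; it simply reads those two attributes.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

class JarManifestCheck {

    // Reads Main-Class and Start-Class from the jar's manifest; a null entry means the attribute is absent.
    static String[] launcherAndStartClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return new String[] {null, null};
            }
            Attributes attrs = manifest.getMainAttributes();
            return new String[] {attrs.getValue("Main-Class"), attrs.getValue("Start-Class")};
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "target/gs-yarn-basic-single-0.1.0.jar";
        String[] classes = launcherAndStartClass(path);
        System.out.println("Main-Class:  " + classes[0]);
        System.out.println("Start-Class: " + classes[1]);
    }
}
```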

Run the Application

Now that you’ve successfully compiled and packaged your application, it’s time to do the fun part and execute it on Hadoop YARN.

To accomplish this, simply run your executable client jar from the project’s root directory.

$ java -jar target/gs-yarn-basic-single-0.1.0.jar

To find Hadoop’s application logs, do a simple find within the Hadoop cluster’s configured userlogs directory.

$ find hadoop/logs/userlogs/ | grep std
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000001/Appmaster.stdout
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000001/Appmaster.stderr
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000002/Container.stdout
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000002/Container.stderr

Grep the logging output from the HelloPojo class:

$ grep HelloPojo hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000002/Container.stdout
[2014-03-23 12:42:05.763] boot - 17064  INFO [main] --- HelloPojo: Hello from HelloPojo
[2014-03-23 12:42:05.763] boot - 17064  INFO [main] --- HelloPojo: About to list from hdfs root content
[2014-03-23 12:42:06.745] boot - 17064  INFO [main] --- HelloPojo: FileStatus{path=hdfs://localhost:8020/; isDirectory=true; modification_time=1395397562421; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
[2014-03-23 12:42:06.746] boot - 17064  INFO [main] --- HelloPojo: FileStatus{path=hdfs://localhost:8020/app; isDirectory=true; modification_time=1395501405412; access_time=0; owner=hadoop; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
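If you prefer to script the find and grep steps above, the same search can be written in Java with java.nio.file. The UserLogScanner class below is a hypothetical sketch: it walks the userlogs directory, keeps regular files whose names contain std (such as Container.stdout), and collects the lines that mention HelloPojo. The default path hadoop/logs/userlogs mirrors the example above and will differ on your cluster.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UserLogScanner {

    // Similar to `find <dir> | grep std`: regular files whose file name contains "std", sorted by path.
    static List<Path> stdFiles(Path userlogs) throws IOException {
        try (Stream<Path> paths = Files.walk(userlogs)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().contains("std"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Similar to `grep <needle> <file>...`: every line in the given files that contains the needle.
    static List<String> grep(List<Path> files, String needle) throws IOException {
        List<String> matches = new ArrayList<>();
        for (Path file : files) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (line.contains(needle)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path userlogs = Paths.get(args.length > 0 ? args[0] : "hadoop/logs/userlogs");
        List<Path> files = stdFiles(userlogs);
        files.forEach(System.out::println);
        grep(files, "HelloPojo").forEach(System.out::println);
    }
}
```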

Summary

Congratulations! You’ve just developed a Spring YARN application!

Want to write a new guide or contribute to an existing one? Check out our contribution guidelines.

All guides are released with an ASLv2 license for the code, and an Attribution, NoDerivatives creative commons license for the writing.