Today we are pleased to announce the 1.0 M1 release of Spring XD (download). Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export. The project’s goal is to simplify the development of big data applications.
From the 10,000-foot view, big data applications share many characteristics with Enterprise Integration and Batch applications. Spring has provided proven solutions for building integration and batch applications for more than six years now via the Spring Integration and Spring Batch projects. Spring XD builds upon this foundation and provides a lightweight runtime environment that is easily configured and assembled via a simple DSL.
In this blog we will introduce the key components of Spring XD, namely Streams, Jobs, Taps, Analytics and the DSL used to declare them, as well as the runtime architecture. Many more details can be found in the XD Guide.
Streams
A Stream defines how data is collected, processed, and stored or forwarded. For example, a stream may collect syslog data, filter it, and store it in HDFS. Spring XD provides a DSL to define a stream. The DSL lets you start simple, using a UNIX pipes-and-filters syntax to build a linear processing flow, and also lets you describe more complex flows using an extended syntax.
Sources and Sinks
A simple linear stream consists of the sequence: Input Source, (optional) Processing Steps, and an Output Sink. As a simple example, consider the collection of data from an HTTP Source writing to a File Sink. The DSL to describe this stream is:
http | file
You tell Spring XD to create a stream by making an HTTP request to the XD Admin Server, which runs on port 8080 by default. In the M2 release we will provide an interactive shell to communicate with XD, but for M1 the easiest way to interact with XD is using ‘curl’.
curl -d "http | file" http://localhost:8080/streams/httptest
The name of the stream is httptest, the default HTTP port to listen on is 9000, and the default file location is /tmp/xd/output/${streamname}.
Post some data to port 9000 with curl:
curl -d "hello world" http://localhost:9000
You will see the string hello world inside the file /tmp/xd/output/httptest.
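The steps above can be strung together in a small script. This is only a sketch: it assumes an XD Admin Server on localhost:8080 and the default http and file module settings described above. The function names (stream_url, stream_output, run_demo) and the one-second pause are our own illustrative choices, not part of Spring XD.

```shell
#!/bin/sh
# Sketch only: assumes the defaults described in this post.
XD_ADMIN="http://localhost:8080"      # XD Admin Server (default port 8080)
HTTP_SOURCE="http://localhost:9000"   # default port of the http source

# Build the admin URL used to create a stream with the given name.
stream_url() {
  printf '%s/streams/%s\n' "$XD_ADMIN" "$1"
}

# Default location the file sink writes to for the given stream name.
stream_output() {
  printf '/tmp/xd/output/%s\n' "$1"
}

# Create the stream, post one message to it, then print the sink file.
run_demo() {
  name="$1"
  curl -d "http | file" "$(stream_url "$name")"
  curl -d "hello world" "$HTTP_SOURCE"
  sleep 1   # hypothetical pause to let the sink write; adjust as needed
  cat "$(stream_output "$name")"
}
```

With XD running locally, calling `run_demo httptest` performs the same three steps shown above.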
To change the default values, you can pass in option arguments:
http --port=9090 | file --dir=/var/streams --name=data.txt
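To create a stream from a definition with options, you post it to the Admin Server the same way as before. The sketch below assembles the definition from its parts and wraps the curl call in a function. The helper names (module, create_stream) and the stream name httpcustom are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and the stream name are illustrative.

# Build a module spec such as "http --port=9090" from a name and key=value pairs.
module() {
  spec="$1"
  shift
  for opt in "$@"; do
    spec="$spec --$opt"
  done
  printf '%s\n' "$spec"
}

# Join the source and sink with the pipe symbol, as in the DSL above.
definition="$(module http port=9090) | $(module file dir=/var/streams name=data.txt)"
echo "$definition"

# Post a definition ($2) to the Admin Server under the stream name ($1).
create_stream() {
  curl -d "$2" "http://localhost:8080/streams/$1"
}

# Example call (requires a running XD Admin Server):
# create_stream httpcustom "$definition"
```

Data would then be posted to port 9090 rather than 9000.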
The supported sources in M1 are File, Time, HTTP, Tail, Twitter Search, Gemfire (Continuous Queries), Gemfire (Cache Event), Syslog and TCP. The supported sinks are Log, File, HDFS…