The Spring XD team is pleased to announce that Spring XD Milestone 7 is now available for download.
Highlights of this release
Transport Data Partitioning: By default, messages are delivered to multiple instances of a stream module in a round-robin manner. However, if a module performs operations that prevent it from consuming arbitrary messages from the stream, you can partition the stream based on its content so that similar messages are always delivered to the same module instance. For example, if a processing module performs stateful operations on a per-customer basis, the stream can be partitioned on the customerId field in the message. This is done by specifying partition properties in the deployment manifest. A small example is shown below.
HDFS and HDFS DataSet Sink improvements: These sinks now support writing to multiple paths and files based on partition functions. Look at the HDFS Partitioning Samples for several ways to use the partitioning features.
Updated support for newer Hadoop distributions: 8 are now supported in total.
Configurable options for the Rabbit Message Bus: You can now configure message delivery options, concurrency settings, and High Availability policy. These options can also be overridden for a specific module, e.g. module.http.producer.deliveryMode=NON_PERSISTENT
Improved module coverage in automated system tests
Data Partitioning Example
To demonstrate the data partitioning functionality, start two containers using Rabbit as the transport. Then, in the shell:
stream create words --definition "http | splitter --expression=payload.split(' ') | log"
stream deploy words --properties module.splitter.producer.partitionKeyExpression=payload,module.log.count=2
http post --data "How much wood would a woodchuck chuck if a woodchuck could chuck wood"
In one container log you will see
16:33:27,486 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - How
16:33:27,507 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - chuck
16:33:27,508 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - chuck
and in the other
16:33:27,503 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - much
16:33:27,512 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - wood
16:33:27,513 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - would
16:33:27,514 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - a
16:33:27,520 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - woodchuck
16:33:27,522 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - if
16:33:27,523 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - a
16:33:27,524 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - woodchuck
16:33:27,526 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - could
16:33:27,528 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - wood
This shows that messages that contain the same word are directed to the same container instance.
Note that partitioning is only supported when using RabbitMQ as the transport. Support for Redis as a transport will be available in the next release.
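The grouping seen in the two container logs can be reproduced with a small Java sketch. It assumes that the target instance is chosen by taking the hash code of the partition key (here the word itself, since partitionKeyExpression=payload) modulo the instance count (module.log.count=2). The class and method names are hypothetical and are not part of the Spring XD API; this is an illustration of the idea, not the message bus implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: PartitionSketch, partitionFor and route are
// hypothetical names, not part of the Spring XD API.
final class PartitionSketch {

    // Assumption: the partition index is the key's hash code modulo the
    // number of module instances (module.log.count in the example).
    static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Groups each word under the index of the instance that would receive it.
    static Map<Integer, List<String>> route(String sentence, int partitionCount) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String word : sentence.split(" ")) {
            byPartition
                .computeIfAbsent(partitionFor(word, partitionCount), k -> new ArrayList<>())
                .add(word);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        String data = "How much wood would a woodchuck chuck if a woodchuck could chuck wood";
        // Print one line per instance with the words routed to it.
        route(data, 2).forEach((index, words) ->
            System.out.println("instance " + index + ": " + words));
    }
}
```

Under that assumption, "How" and both occurrences of "chuck" land on one instance and the remaining ten words on the other, which is the same split as in the logs. Because the index depends only on the key, repeated words always reach the same instance, which is what stateful per-key processing needs.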
Wrapping up
You can also install Spring XD on OS X using Homebrew and on RHEL/CentOS using yum.
The Spring XD project home is the central hub for learning more about Spring XD. Some useful links are the reference docs, sample applications, and QCon SF 2013 Session Replay: Introducing Spring XD.
We look forward to your comments and feedback:
spring-xd tag
SpringOne 2GX 2014 is around the corner
Book your place soon at SpringOne in Dallas, TX, Sept 8-11. It's simply the best opportunity to find out firsthand all that's going on and to provide direct feedback. There will be deep dive sessions on Spring XD along with general Big Data talks to provide an introduction to the landscape and challenges in developing Big Data applications.