Mark Pollack

Mark Pollack is a software engineer with Pivotal and is the lead of the Spring Cloud Data Flow project. He has been a contributor to many Spring projects dating back to the Spring Framework in 2003 as well as founding the Spring.NET and Spring Data projects.

Recent Blog posts by Mark Pollack

Spring XD 1.0.1 released

Releases | October 02, 2014 | ...

On behalf of the Spring XD team, I am very pleased to announce the general availability of Spring XD 1.0.1!

This release includes bug fixes and enhancements as well as some new features:

You can download the zip distribution or install on OS X using Homebrew. On RHEL/CentOS you can install using yum.

Feedback is very important, so please get in touch with questions and comments via

Spring XD 1.0 GA Released

Releases | July 30, 2014 | ...

On behalf of the Spring XD team, I am very pleased to announce the general availability of Spring XD 1.0! You can download the zip distribution. You can also install on OS X using Homebrew and on RHEL/CentOS using yum.

Spring XD's goal is to be your one-stop shop for developing and deploying Big Data applications. Such applications require a wide range of technologies to address different use cases while interoperating as a cohesive process. The steps in this process include:

  • Data collection
  • Real-time streaming and analytics
  • Data cleansing
  • Batch processing (both on and off Hadoop)
  • Machine learning and exploratory data analysis
  • Visualization and Reporting
  • Closed loop analytics between real-time and batch processing

Spring XD 1.0.0.RC1 Released

Releases | July 18, 2014 | ...

The Spring XD team is pleased to announce that Spring XD Release Candidate 1 is now available for download. You can also install Spring XD on OS X using Homebrew and on RHEL/CentOS using yum.

Highlights of this release

  • Direct binding: Deployments can be configured so that modules co-located in the same container do not send data over the Message Bus. Using this option increases throughput and lowers latency, but it cannot be applied to all deployment topologies.
  • Stream Deployment State: The state of a stream is calculated throughout the lifetime of the deployment. For example, if a subset of the modules that comprise a stream have failed, the overall state of the stream changes from Deployed to Incomplete. Once the failures have been addressed, the state of the stream returns to Deployed.
  • Improved REST API
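The stream state rule described in the highlights can be sketched as a small function. This is only an illustration in Python, not Spring XD's code: the `ModuleState` enum and the `stream_state` function are hypothetical names, and the "Failed" result for the case where every module has failed is an assumption, since the announcement only names the Deployed and Incomplete states.

```python
from enum import Enum


class ModuleState(Enum):
    """Hypothetical per-module states used only for this sketch."""
    DEPLOYED = "deployed"
    FAILED = "failed"


def stream_state(module_states):
    """Derive an overall stream state from the states of its modules."""
    states = list(module_states)
    if not states:
        raise ValueError("a stream needs at least one module")
    failed = sum(1 for s in states if s is ModuleState.FAILED)
    if failed == 0:
        return "Deployed"
    if failed < len(states):
        # Some, but not all, modules have failed.
        return "Incomplete"
    # Assumption: the announcement does not say what happens when
    # every module has failed, so this label is made up here.
    return "Failed"
```

Recomputing the state whenever a module's state changes gives the behavior described above: the stream moves to Incomplete on a partial failure and back to Deployed once the failed modules are running again.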

Spring XD 1.0.0.M7 Released

Releases | June 03, 2014 | ...

The Spring XD team is pleased to announce that Spring XD Milestone 7 is now available for download.

Highlights of this release

  • Transport Data Partitioning: By default, messages are delivered to multiple instances of a stream module in a round-robin manner. However, if a module performs operations that prevent it from consuming arbitrary messages from the stream, you can partition the stream based on its content so that similar messages are always delivered to the same module instance. For example, if a processing module is performing stateful operations on a per-customer basis, the stream…
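The difference between round-robin delivery and content-based partitioning can be illustrated with a short Python sketch. It is not how Spring XD implements partitioning; the function names, the `customer` field, and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import itertools
import zlib


def round_robin(instance_count):
    """Yield instance indexes 0, 1, ..., n-1, 0, 1, ... regardless of content."""
    return itertools.cycle(range(instance_count))


def partition_for(key, instance_count):
    """Map a partition key to a fixed instance index using a stable hash."""
    # zlib.crc32 is used because Python's built-in hash() for str
    # can differ between interpreter runs.
    return zlib.crc32(key.encode("utf-8")) % instance_count


# Hypothetical messages keyed by customer.
messages = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 7},
]

rr = round_robin(3)
for message in messages:
    key = message["customer"]
    print(key, "round-robin ->", next(rr), "partitioned ->", partition_for(key, 3))
```

With round-robin delivery the two messages for the same customer land on different instances, while the hash-based choice always sends a given customer to the same instance, which is what a module holding per-customer state needs.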

Spring XD 1.0.0.M6 Released

Releases | April 16, 2014 | ...

The Spring XD team is pleased to announce that Spring XD Milestone 6 is now available for download.

This is our biggest release yet! The team has been hard at work, and Milestone 6 contains a wealth of new features that meet enterprise requirements in terms of reliability, performance, and user experience. Below is a quick Top Ten (in no particular order), but if you check out the release notes you will realize how difficult it is to pick out 10 from the list of 299.

  • Distributed and Fault Tolerant Runtime: Leader election among multiple xd-admin servers and automatic redeployment of modules to other xd-containers in the case of failure. ZooKeeper is introduced to manage the cluster and its deployment state.

  • Support for running XD on YARN: Run admin and container nodes on a Hadoop YARN cluster rather than on VMs or physical servers that you need to manage. There are simple configuration and shell scripts that make this process very easy.

  • Deployment Manifests: When deploying a stream you can provide a deployment manifest that describes how to transform the logical stream definition (e.g. http | hdfs) to a physical deployment on the cluster. You can specify the number of instances of each module to deploy and also a criteria expression (using SpEL) that evaluates each of the available containers in the cluster to determine the best matches for those module instances. This will be an area of active development for the next release as we extend the manifest to include support for data partitioning strategies.
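The idea behind a deployment manifest, an instance count plus a criteria expression evaluated against each container, can be sketched in Python as follows. This is an illustration only: a Python lambda stands in for the SpEL expression, and the container attributes (`id`, `group`), the manifest layout, and the function name are hypothetical rather than Spring XD's actual format.

```python
def select_containers(containers, criteria, count):
    """Return up to `count` containers whose attributes satisfy `criteria`."""
    matches = [c for c in containers if criteria(c)]
    return matches[:count]


# Hypothetical cluster: each container exposes a few attributes.
containers = [
    {"id": "c1", "group": "ingest"},
    {"id": "c2", "group": "hadoop"},
    {"id": "c3", "group": "hadoop"},
]

# Hypothetical manifest for the logical stream "http | hdfs":
# per module, how many instances to deploy and which containers qualify.
manifest = {
    "http": {"count": 1, "criteria": lambda c: c["group"] == "ingest"},
    "hdfs": {"count": 2, "criteria": lambda c: c["group"] == "hadoop"},
}

plan = {
    module: [c["id"] for c in select_containers(containers, spec["criteria"], spec["count"])]
    for module, spec in manifest.items()
}
print(plan)
```

The logical definition stays the same while the manifest alone decides the physical layout, so the same stream can be deployed differently on different clusters.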

Spring Shell 1.1 RC1 Released

Releases | April 03, 2014 | ...

We are pleased to announce the release of Spring Shell 1.1 RC1. The Spring Shell is an interactive shell that can be easily extended with commands using a Spring based programming model.

This is a small bug fix release, but it includes an important improvement: the upgrade to the JLine2 library and a rewrite of the command parser. Check the release notes for more information. Special thanks to Eric Bottard and to those who submitted pull requests.

Downloads | JavaDocs | Reference Documentation | Changelog

Spring XD 1.0.0.M5 Released

Engineering | January 10, 2014 | ...

The Spring XD team is pleased to announce that Spring XD 1.0.0 Milestone 5 is now available for download.

Spring XD makes it easy to solve common big data problems such as data ingestion and export, real-time analytics, and batch workflow orchestration. This release includes several notable new features:

Spring XD 1.0 Milestone 2 Released

Releases | August 14, 2013 | ...

Today we are pleased to announce the 1.0 M2 release of Spring XD (download). Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export. The project’s goal is to simplify the development of big data applications.

The second milestone release of Spring XD introduces several new features that make it even easier to ingest and process real-time streams of data as well as orchestrate Hadoop based batch jobs.  In this blog post we will cover

  • Shell
  • New sources, sinks and transports
  • DSL improvements
  • Batch Jobs

Shell

The most noticeable new feature is the introduction of the interactive shell.  The shell provides you an easy way to create new streams and jobs, view metrics, interact with Hadoop, and more.  As an introduction to the shell I will redo some of the examples from the M1 blog post.

Start…

Spring Shell 1.1.0.M1 Released

Releases | July 26, 2013 | ...

Dear Spring Community,

I am pleased to announce the first milestone release of Spring Shell 1.1. Spring Shell is an interactive shell that can be easily extended with commands using a Spring based programming model. This release adds support for testing of commands as well as several bug fixes and general improvements. Many thanks to those who submitted pull requests.

Downloads | JavaDocs | Reference Documentation | Changelog

We look forward to your feedback on the forum or in the issue tracker.

Spring XD 1.0 Milestone 1 Released

Engineering | June 12, 2013 | ...

Today we are pleased to announce the 1.0 M1 release of Spring XD (download). Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export. The project’s goal is to simplify the development of big data applications.

From the 10,000 foot view, big data applications share many characteristics with Enterprise Integration and Batch applications.  Spring has provided proven solutions for building integration and batch applications for more than 6 years now via the Spring Integration and Spring Batch projects.  Spring XD builds upon this foundation and provides a lightweight runtime environment that is easily configured and assembled via a simple DSL.

In this blog we will introduce the key components of Spring XD, namely Streams, Jobs, Taps, Analytics and the DSL used to declare them, as well as the runtime architecture.  Many more details can be found in the XD Guide.

Streams

A Stream defines how data is collected, processed and stored or forwarded.  For example, a stream may collect syslog data, filter it, and store it in HDFS.  Spring XD provides a DSL to define a stream.  The DSL allows you to start simple using a UNIX pipes-and-filters syntax to build a linear processing flow but lets you also describe more complex flows using an extended syntax.

Sources and Sinks

A simple linear stream consists of the sequence: Input Source, (optional) Processing Steps, and an Output Sink.  As a simple example, consider the collection of data from an HTTP Source writing to a File Sink. The DSL to describe this stream is:
http | file

You tell Spring XD to create a stream by making an HTTP request to the XD Admin Server, which runs on port 8080 by default.  In the M2 release we will provide an interactive shell to communicate with XD, but for M1 the easiest way to interact with XD is to use ‘curl’.

curl -d "http | file" http://localhost:8080/streams/httptest

The name of the stream is httptest, the default HTTP port to listen on is 9000, and the default file location is /tmp/xd/output/${streamname}.

If you post some data on port 9000 with curl:
curl -d "hello world" http://localhost:9000

you will see the string hello world inside the file /tmp/xd/output/httptest.

To change the default values, you can pass in option arguments:

http --port=9090 | file --dir=/var/streams --name=data.txt
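To make the pipes-and-filters structure of these definitions concrete, here is a minimal Python sketch that splits a definition into module names and their `--key=value` options. It is not the Spring XD DSL parser: the `parse_stream` function is hypothetical, it only handles the simple linear syntax shown above, and it naively splits on every `|`, so a pipe inside a quoted option value would be misread.

```python
import shlex


def parse_stream(definition):
    """Split a pipe-delimited definition into (module_name, options) pairs."""
    modules = []
    for segment in definition.split("|"):
        tokens = shlex.split(segment)
        if not tokens:
            raise ValueError("empty module in stream definition")
        name, options = tokens[0], {}
        for token in tokens[1:]:
            # Only the --key=value form used in the post is accepted here.
            if not token.startswith("--") or "=" not in token:
                raise ValueError(f"unsupported option syntax: {token!r}")
            key, value = token[2:].split("=", 1)
            options[key] = value
        modules.append((name, options))
    return modules


for name, options in parse_stream("http --port=9090 | file --dir=/var/streams --name=data.txt"):
    print(name, options)
```

The first module in the resulting list plays the role of the source and the last one the sink; any modules in between would be processing steps.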

The supported sources in M1 are file, time, HTTP, Tail, Twitter Search, Gemfire (Continuous Queries), Gemfire (Cache Event), Syslog and TCP.  The supported sinks are Log, File, HDFS…
