Mark Pollack

Spring Cloud Data Flow co-lead

New York, NY

Mark Pollack is a software engineer with Pivotal and is the co-lead of the Spring Cloud Data Flow and Spring XD projects. He has been a contributor to many Spring projects dating back to the Spring Framework in 2003, and he founded the Spring.NET and Spring Data projects.
Blog Posts by Mark Pollack

Spring Cloud Data Flow for Cloud Foundry 1.3.0.M3 released

We are pleased to announce the 1.3.0.M3 release of Spring Cloud Data Flow for Cloud Foundry.

The Getting Started Guide is the best place to start kicking the tires.

Release Highlights

Stream updates, a Java DSL, and the complete port of the UI to the Angular 4.0 stack are some of the main highlights. More information on release highlights can be found in the release blog for the core Data Flow project.

Of note for the Cloud Foundry server is an upgrade to v2.23.0 of the cf-java client library and setting the default health check to be http instead of port. You can now also specify the health check endpoint URL and timeout values as deployment properties.

Read more...

Spring Cloud Data Flow 1.3.0.M3 released

We are pleased to announce the 1.3.0.M3 release of Spring Cloud Data Flow and its associated ecosystem of projects.

Local Server: Getting Started Guide

Release Highlights

Stream updates and rollback

A streaming data pipeline orchestrated as a series of microservice applications has always been the core value of Spring Cloud Data Flow’s design. In 1.3.0.M3 we have provided the ability to update sources, processors, and sinks independently without having to undeploy and redeploy the entire stream.

The stream update feature is implemented by delegating the deployment process to a new Spring Cloud project called Skipper. Introduced in this blog, Spring Cloud Skipper is a standalone server that deploys Spring Boot applications to multiple cloud platforms. It also keeps track of the application version, application properties, and deployment properties of the deployed application or applications so that the changes to any of these properties can be calculated upon an update request.
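To illustrate how tracking these properties lets the changes be calculated on an update request, here is a minimal sketch that compares the recorded properties of each application with the requested ones and reports only the applications that differ. The class and method names (`ReleaseDiff`, `changedApps`) and the flat property maps are assumptions made for illustration; they are not Skipper's API or data model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Skipper's actual implementation.
class ReleaseDiff {

    // Returns the names of the applications whose tracked properties
    // (version, application properties, deployment properties) differ
    // between the deployed release and the requested one, sorted by name.
    static List<String> changedApps(Map<String, Map<String, String>> deployed,
                                    Map<String, Map<String, String>> requested) {
        List<String> changed = new ArrayList<>();
        // A TreeMap copy gives a deterministic, sorted iteration order.
        for (Map.Entry<String, Map<String, String>> e : new TreeMap<>(requested).entrySet()) {
            Map<String, String> current = deployed.get(e.getKey());
            // A new application (current == null) also counts as changed.
            if (!e.getValue().equals(current)) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> deployed = Map.of(
                "http", Map.of("version", "1.2.0.RELEASE", "server.port", "9000"),
                "log", Map.of("version", "1.1.0.RELEASE"));
        Map<String, Map<String, String>> requested = Map.of(
                "http", Map.of("version", "1.2.0.RELEASE", "server.port", "9000"),
                "log", Map.of("version", "1.2.0.RELEASE"));
        System.out.println(changedApps(deployed, requested)); // prints [log]
    }
}
```

In this sketch the http source is left alone because none of its tracked properties changed.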

When a request is made to update applications in a Stream, only the application or applications that need to be changed are updated. A simple red/black update is performed, and the design relies on the Spring Boot health endpoint of the new application(s). By keeping track of the deployments, Skipper can also roll back to previous versions of the stream.

The following new Data Flow commands show the basic lifecycle of updating the log sink from version 1.1.0 to version 1.2.0 and then rolling back to 1.1.0. This assumes that both version 1.1.0 and 1.2.0 of the log sink were deployed as Maven artifacts. Note that the HTTP source application remains deployed throughout the changes to the log sink.

dataflow:> app register --name http --type source --uri maven://org.springframework.cloud.stream.app:http-source-rabbit:1.2.0.RELEASE

dataflow:> app register --name log --type sink --uri maven://org.springframework.cloud.stream.app:log-sink-rabbit:1.1.0.RELEASE

dataflow:> stream create --name httptest --definition "http --server.port=9000 | log"

dataflow:> stream skipper deploy --name httptest

dataflow:> stream skipper update --name httptest --properties version.log=1.2.0.RELEASE

dataflow:> stream skipper rollback --name httptest

The Getting Started and Streams with Skipper sections of the documentation walk through the process in much greater detail.

Read more...

Spring Cloud Skipper 1.0 M2 Released

On behalf of the team, I am pleased to announce the release of Spring Cloud Skipper 1.0 M2.

Skipper is a lightweight tool that allows you to discover Spring Boot applications and manage their lifecycle on multiple Cloud Platforms. You can use Skipper standalone or integrate it with Continuous Integration pipelines to help implement the practice of Continuous Deployment.

The 1.0 M2 release fixes several bugs and introduces a few new features.

  • Support for Postgres, MySQL, Microsoft SQL Server, and HSQLDB databases.
  • Improved support for upgrading applications that use an HTTP location for the resource definition.
  • LRU cache used to manage disk space for HTTP and Maven based resources that are downloaded.
  • HTTP based resources are always downloaded, never cached.
  • Use updated CF Deployer library with an HTTP based health check.
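As a rough illustration of the LRU idea mentioned in the list above, the sketch below evicts the least recently used entry once a capacity is exceeded. It is not Skipper's cache: the class name `LruResourceCache` is hypothetical, and it limits the number of entries rather than the bytes used on disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of LRU eviction; not Skipper's actual cache.
class LruResourceCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruResourceCache(int maxEntries) {
        // accessOrder = true orders entries from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; evict once the capacity is exceeded.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruResourceCache<String, String> cache = new LruResourceCache<>(2);
        cache.put("log-sink-1.1.0.jar", "/tmp/a");
        cache.put("http-source-1.2.0.jar", "/tmp/b");
        cache.get("log-sink-1.1.0.jar");            // mark as recently used
        cache.put("log-sink-1.2.0.jar", "/tmp/c");  // evicts http-source-1.2.0.jar
        System.out.println(cache.keySet());
    }
}
```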
Read more...

Spring Cloud Skipper 1.0 M1 Released

On behalf of the team, I am pleased to announce the release of Spring Cloud Skipper 1.0 M1.

Skipper is a lightweight tool that allows you to discover Spring Boot applications and manage their lifecycle on multiple Cloud Platforms. You can use Skipper standalone or integrate it with Continuous Integration pipelines to help implement the practice of Continuous Deployment.

The main features in Skipper 1.0 M1 are:

  • Define multiple platform accounts where Spring Boot applications can be deployed. Supported platforms are Local, Cloud Foundry, and Kubernetes.
  • Substitute variables in Mustache templated files that describe how to deploy applications to a platform.
  • Search Package Repositories for existing applications.
  • Upgrade/Rollback a package based on a simple blue/green workflow.
  • Store the history of resolved template files (aka ‘application manifests’) which represent the final description of what has been deployed to a platform for a specific release.
  • Use via a standalone interactive shell or web API.
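To give a feel for the variable substitution step listed above, here is a minimal sketch that replaces simple `{{name}}` variables in a template. It handles only plain variables, not the full Mustache language, and the class name `TemplateSketch` and the template keys are assumptions made for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: handles only simple {{name}} variables,
// not sections, partials, or escaping rules of the Mustache language.
class TemplateSketch {
    private static final Pattern VAR = Pattern.compile("\\{\\{\\s*([\\w.]+)\\s*\\}\\}");

    static String render(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Unknown variables render as an empty string.
            String value = values.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' and '\' in values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "name: {{name}}\nversion: {{version}}";
        System.out.println(render(template, Map.of("name", "log", "version", "1.2.0.RELEASE")));
    }
}
```

Storing the rendered result for each release is what would make a history of "application manifests" possible in a design like this.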
Read more...

Spring Cloud Data Flow 1.2 GA released

On behalf of the team, I am pleased to announce the general availability of Spring Cloud Data Flow 1.2 across a range of platforms.

Here are the relevant links to documentation and getting started guides.

Highlights of the 1.2 release:

Composed Tasks

This release introduces Composed Tasks! This feature provides the ability to orchestrate a flow of tasks as a cohesive unit of work. A complex ETL pipeline may include executions in sequence, in parallel, with conditional transitions, or a combination of all of the above. The composed task feature comes with DSL primitives and an interactive graphical interface to build these types of topologies quickly. You can read more about it in the reference guide.

An ETL job, for example, may include multiple steps. Each step in the topology can be built as a finite short-lived Spring Cloud Task application. The orchestration of multiple tasks as steps can be easily defined with the help of the Data Flow Task DSL.

task create simple-etl --definition "extractDbToHDFS &&
      <analysisInSpark || enrichAndLoadHawq> &&
      <populateMgmtDashboard || runRegulatoryReport || loadAnalyticsStore>"

This will first run extractDbToHDFS and then run analysisInSpark and enrichAndLoadHawq in parallel, waiting for both of them to complete before running the three remaining tasks in parallel and waiting for all of them to complete before ending the job. The graphical representation of this topology looks like the following.

Visualization of Composed Tasks
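The execution order described for simple-etl can also be sketched in plain Java. This is only an illustration of the sequencing and the parallel splits; it does not launch real tasks, and the class and method names (`ComposedFlowSketch`, `run`, `split`) are hypothetical rather than part of Data Flow.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the execution order only; real tasks are launched
// as Spring Cloud Task applications, not as in-process method calls.
class ComposedFlowSketch {
    static final List<String> completed = new CopyOnWriteArrayList<>();

    // Stand-in for launching a task and waiting for it to finish.
    static void run(String task) {
        completed.add(task);
    }

    // <a || b || ...>: run all tasks in parallel and wait for every one.
    static void split(String... tasks) {
        CompletableFuture<?>[] futures = new CompletableFuture<?>[tasks.length];
        for (int i = 0; i < tasks.length; i++) {
            String task = tasks[i];
            futures[i] = CompletableFuture.runAsync(() -> run(task));
        }
        CompletableFuture.allOf(futures).join();
    }

    static List<String> simpleEtl() {
        completed.clear();
        run("extractDbToHDFS");                          // first step
        split("analysisInSpark", "enrichAndLoadHawq");   // && <a || b>
        split("populateMgmtDashboard", "runRegulatoryReport", "loadAnalyticsStore");
        return List.copyOf(completed);
    }

    public static void main(String[] args) {
        // The order inside each split may vary from run to run.
        System.out.println(simpleEtl());
    }
}
```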

Real-time Metrics and Monitoring

Real-time metrics are now part of the operational view of deployed streams. The applications that are part of a stream publish metrics contained in their Spring Boot /metrics actuator endpoint. This includes send and receive message rates. A new server, the Spring Cloud Data Flow Metrics Collector, collects these metrics and calculates aggregate message rates. The Data Flow server queries the Metrics Collector to support showing message rates in the UI and in the shell. For more details about the architecture, refer to the Monitoring Deployed Applications Section in the reference guide.
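One simple way to picture an aggregate message rate is as the sum of the rates reported by each instance of an application. The sketch below does only that; the class and field names are hypothetical, and the Metrics Collector's actual calculation may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and the summing rule are illustrative only.
class RateAggregationSketch {

    // One reported rate for a single application instance.
    static final class InstanceRate {
        final String app;
        final String guid;
        final double messagesPerSecond;

        InstanceRate(String app, String guid, double messagesPerSecond) {
            this.app = app;
            this.guid = guid;
            this.messagesPerSecond = messagesPerSecond;
        }
    }

    // Sums the per-instance rates into one aggregate rate per application.
    static Map<String, Double> aggregate(List<InstanceRate> rates) {
        Map<String, Double> totals = new TreeMap<>();
        for (InstanceRate r : rates) {
            totals.merge(r.app, r.messagesPerSecond, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<InstanceRate> rates = List.of(
                new InstanceRate("time", "guid-1", 1.0),
                new InstanceRate("time", "guid-2", 1.0),
                new InstanceRate("time", "guid-3", 1.0));
        System.out.println(aggregate(rates)); // prints {time=3.0}
    }
}
```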

The screenshot below shows the aggregate message rates for a time | log stream with three instances of the time and log applications. Each dot below the main application box shows the message rates for an individual application instance along with a guid value that can be used to identify the instance on the platform where it is running.

Visualization of Input and Output Rates in Flo

The Runtime tab, shown below, has also been improved to show message rates and any other metrics exposed by the platform. For script-savvy users, the shell experience also includes these details via the runtime apps command.

Runtime Apps UI

Companion Artifact

The companion artifact support introduced in 1.2 M3 has had some improvements. The bulk registration workflow now eagerly resolves and downloads the metadata artifacts for all the out-of-the-box applications. This comes in handy in the Shell or UI when reviewing the supported properties for each application.

OAuth Improvements

This change will provide an additional option for REST-API users. Instead of providing a username:password combination via BasicAuth, users will now have the ability to retrieve an OAuth2 Access token from their OAuth2 provider directly and then provide the Access Token in the HTTP header, when invoking RESTful calls against a secured Spring Cloud Data Flow setup.
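A minimal sketch of such a call, assuming the conventional `Authorization: Bearer` header form and Java 11's `java.net.http` API: it only builds the request and does not send it. The server URL and the token value are placeholders, and the class name `BearerRequestSketch` is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: the server URL and token value are placeholders.
class BearerRequestSketch {

    // Builds a GET request that carries the access token instead of
    // a username:password pair sent via Basic authentication.
    static HttpRequest withAccessToken(String url, String accessToken) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = withAccessToken(
                "http://localhost:9393/streams/definitions", "example-token");
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```

How the token is obtained depends on the OAuth2 provider and is outside the scope of this sketch.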

Role based access

Role-based access control has been added to define who has access to create, deploy, destroy, or view streams and tasks. This works seamlessly in coordination with the supported authentication methods.

Bug reporting

A new REST endpoint and an About page in the Dashboard expose server implementation details, which can be copied to the clipboard for use in bug reports.

Spring Cloud Stream App Starters - Bacon.RELEASE

The Stream App Starters Bacon.RELEASE is now generally available, providing a range of sources, processors, and sinks to get you started creating streams. All the out-of-the-box stream applications build upon the Spring Cloud Stream Chelsea.RELEASE and Spring Cloud Dalston.RELEASE foundation. There were several enhancements and bug fixes to the existing applications, and this release train also brings new applications such as MongoDB-sink, Aggregator-processor, Header-Enricher-processor, and PGCopy-sink.

For convenience, we have generated bit.ly links that include the latest coordinates for the Docker and Maven artifacts.

Spring Cloud Task App Starters - Belmont.RELEASE

The Task App Starters Belmont.RELEASE release is now complete. To support the Composed Task feature in Spring Cloud Data Flow, we have added a new out-of-the-box application named Composed Task Runner. This is a task that executes other tasks in a directed graph as specified by a DSL that is passed in via the --graph command line argument.

The Belmont.RELEASE builds upon the Spring Cloud Task 1.2 RELEASE and Spring Cloud Dalston.RELEASE foundation.

For convenience, we have generated bit.ly links that include the latest coordinates for the Docker and Maven artifacts.

What’s Next?

An immediate goal is to add more automated integration tests and to expose them as an additional user-facing feature. You can track that work here.

Beyond the 1.2.x line, we are going to start planning for the 2.0 version. Some general themes are support for deploying individual applications and keeping track of application deployment properties and metadata such as the application version. This functionality would build up into supporting a rich Continuous Delivery theme at the application level that also extends to "editing" streams at runtime. In addition, we are also looking into supporting functions, either "in-line" as Java code or as compiled java.util.function.Function instances, as a first-class programming model for processing the data in a stream.

Feedback is important. Please reach out to us on Stack Overflow and GitHub for questions and feature requests. We also welcome contributions! Any help improving the Spring Cloud Data Flow ecosystem is appreciated.
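To make the function idea a little more concrete, here is a small sketch of a `java.util.function.Function` applied to each message in a list that stands in for a stream. Nothing here is a Data Flow API: the class name `FunctionProcessorSketch` and the `process` helper are hypothetical, and how such functions would be registered or deployed is left open.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch; an in-memory list stands in for a message stream.
class FunctionProcessorSketch {

    // Applies the given function to every message, like a processor would.
    static List<String> process(List<String> messages, Function<String, String> fn) {
        return messages.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<String, String> trim = String::trim;
        Function<String, String> upper = String::toUpperCase;
        // Functions compose, so small steps can be chained into one processor.
        System.out.println(process(List.of(" hello ", "world"), trim.andThen(upper)));
        // prints [HELLO, WORLD]
    }
}
```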

Read more...

Spring Cloud Data Flow 1.2 RC1 released

On behalf of the team, I am pleased to announce the first release candidate of Spring Cloud Data Flow 1.2.

Note: A great way to start using this new release is to follow the Getting Started Guide in the reference documentation.

Highlights of the 1.2 RC1 release:

Composed Tasks

This release introduces Composed Tasks! This feature provides the ability to orchestrate a flow of tasks as a cohesive unit of work. A complex ETL pipeline may include executions in sequence, in parallel, with conditional transitions, or a combination of all of the above. The composed task feature comes with DSL primitives and an interactive graphical interface to build these types of topologies quickly. You can read more about it in the reference guide.

An ETL job, for example, may include multiple steps. Each step in the topology can be built as a finite short-lived Spring Cloud Task application. The orchestration of multiple tasks as steps can be easily defined with the help of the Data Flow Task DSL.

task create simple-etl --definition "extractDbToHDFS &&
      <analysisInSpark || enrichAndLoadHawq> &&
      <populateMgmtDashboard || runRegulatoryReport || loadAnalyticsStore>"

This will first run extractDbToHDFS and then run analysisInSpark and enrichAndLoadHawq in parallel, waiting for both of them to complete before running the three remaining tasks in parallel and waiting for all of them to complete before ending the job. The graphical representation of this topology looks like the following.

Visualization of Composed Tasks

Real-time Metrics and Monitoring

Real-time metrics are now part of the operational view of deployed streams. The applications that are part of a stream publish metrics contained in their Spring Boot /metrics actuator endpoint. This includes send and receive message rates. A new server, the Spring Cloud Data Flow Metrics Collector, collects these metrics and calculates aggregate message rates. The Data Flow server queries the Metrics Collector to support showing message rates in the UI and in the shell. For more details about the architecture, refer to the Monitoring Deployed Applications Section in the reference guide.

The screenshot below shows the aggregate message rates for a time | log stream with three instances of the time and log applications. Each dot below the main application box shows the message rates for an individual application instance along with a guid value that can be used to identify the instance on the platform where it is running.

Visualization of Input and Output Rates in Flo

The Runtime tab, shown below, has also been improved to show message rates and any other metrics exposed by the platform. For script-savvy users, the shell experience also includes these details via the runtime apps command.

Runtime Apps UI

Companion Artifact

The companion artifact support introduced in 1.2 M3 has had some improvements. The bulk registration workflow now eagerly resolves and downloads the metadata artifacts for all the out-of-the-box applications. This comes in handy in the Shell or UI when reviewing the supported properties for each application.

OAuth Improvements

This change will provide an additional option for REST-API users. Instead of providing a username:password combination via BasicAuth, users will now have the ability to retrieve an OAuth2 Access token from their OAuth2 provider directly and then provide the Access Token in the HTTP header, when invoking RESTful calls against a secured Spring Cloud Data Flow setup.

Spring Cloud Stream App Starters - Bacon.RELEASE

Bacon.RELEASE is now generally available. All the out-of-the-box stream applications build upon the Spring Cloud Stream Chelsea.RELEASE and Spring Cloud Dalston.RELEASE foundation. There were several enhancements and bug fixes to the existing applications, and this release train also brings new applications such as MongoDB-sink, Aggregator-processor, Header-Enricher-processor, and PGCopy-sink.

For convenience, we have generated bit.ly links that include the latest coordinates for the Docker and Maven artifacts.

Spring Cloud Task App Starters - Belmont.RC1

The App Starters Belmont.RC1 release is now complete. To support the Composed Task feature in Spring Cloud Data Flow, we have added a new out-of-the-box application named Composed Task Runner. This is a task that executes other tasks in a directed graph as specified by a DSL that is passed in via the --graph command line argument.

The Belmont.RC1 builds upon the Spring Cloud Task 1.2 RC1 and Spring Cloud Dalston.RELEASE foundation.

For convenience, we have generated bit.ly links that include the latest coordinates for the Docker and Maven artifacts.

What’s Next?

The 1.2.0.RELEASE is around the corner. We are aiming to wrap it up over the next 2-3 weeks. Spring Cloud Data Flow’s runtime implementations will catch up and adapt to this foundation shortly after the core release.

Feedback is important. Please reach out to us on Stack Overflow and GitHub for questions and feature requests. We also welcome contributions! Any help improving the Spring Cloud Data Flow ecosystem is appreciated.

Read more...

Spring Cloud Data Flow 1.1 GA released

On behalf of the team, I am pleased to announce the GA release of Spring Cloud Data Flow 1.1. Follow the links in the getting started guide to download the local server implementation and shell to create Streams and Tasks.

General highlights of the 1.1 GA Release include:

Read more...

Spring Cloud Data Flow 1.1 RC1 Released

On behalf of the team, I am pleased to announce the first release candidate of Spring Cloud Data Flow 1.1. Follow the links in the getting started guide to download the local server implementation and shell to create Streams and Tasks.

The 1.1 RC1 release includes the following new features and improvements:

  • Builds upon Camden.SR2 release improvements

  • LDAP authentication is now supported with SSL

  • Portable deployment properties for memory and CPU are in place for support across various runtime implementations

  • Passing Java options to the local JVM when launching applications is now supported

  • UI Improvements

    • List pages now support sorting

    • Server-side search support for stream and task list pages

    • Content-assist for bulk task definitions including the support for incremental validations of task application properties

  • Add content assist support for tasks in the shell

  • Thanks to the community for adding DB2 support for the TaskRepository

  • Documentation on how to use Spring Boot Admin to visualize server metrics

Read more...

Spring Cloud Data Flow 1.1 M2 Released

On behalf of the team, I am pleased to announce the release of the second milestone of Spring Cloud Data Flow 1.1. You can download the local server that is part of this release here.

The 1.1 M2 release includes the following new features and improvements:

  • Builds upon Boot 1.4.1 and Spring Cloud Camden improvements

  • Task application properties can now be referenced using non-prefixed property names

  • Add visual representation for related streams. This representation also includes nested TAPs and the downstream processing nodes in an overall topology view.

Read more...

Spring Cloud Data Flow 1.0 GA released

On behalf of the team, I’m excited to announce the 1.0 GA release of Spring Cloud Data Flow!

Note
A great way to start using this new release is to follow the Getting Started section of the reference documentation. It uses a Data Flow server that runs on your computer and deploys a new process for each application.

Spring Cloud Data Flow (SCDF) is an orchestration service for data microservices on modern runtimes. SCDF lets you describe data pipelines that can be composed of either long-lived streaming applications or short-lived task applications, and then deploys them to platform runtimes that you may already be using today, such as Cloud Foundry, Apache YARN, Apache Mesos, and Kubernetes. We provide a wide range of stream and task applications so you can get started right away developing solutions for use cases such as data ingestion, real-time analytics, and data import/export.

Read more...