On behalf of the team, I am pleased to announce the general availability of Spring Cloud Data Flow 1.3 across a range of platforms.
A streaming data pipeline orchestrated as a series of microservice applications has always been the core value of Spring Cloud Data Flow’s design. In Data Flow 1.3 we have provided the ability to update sources, processors, and sinks independently without having to undeploy and redeploy the entire stream.
The stream update and rollback functionality is implemented by delegating the deployment process to a new Spring Cloud project called Skipper. Skipper is a lightweight Spring Boot application, purpose-built to fill this feature gap in Data Flow. Skipper defines a package format, much like brew, and can also deploy and undeploy applications on multiple cloud platforms: Local, Cloud Foundry, and Kubernetes. It uses the same Spring Cloud Deployer libraries that have been part of Data Flow since the beginning. Recent presentations at SpringOne 2017 introduce Skipper and its integration with Data Flow in more depth.
When deploying a Stream, Data Flow creates a Skipper package describing the Stream and the applications that are part of the Stream definition. Skipper then deploys the applications to the desired platform. When a stream update is requested, only the application or applications that need to change are redeployed, and this happens automatically. A simple strategy managed by a Spring Statemachine instance performs the update or rollback steps.
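To make the "only what changed is redeployed" idea concrete, here is a minimal Java sketch. It is not Skipper's actual implementation: the class name `StreamUpdatePlanner`, the method `appsToRedeploy`, and the representation of a release as a map from application name to version are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of Skipper or Data Flow) that compares the
 * currently deployed application versions with the requested ones.
 */
final class StreamUpdatePlanner {

    private StreamUpdatePlanner() {
    }

    /**
     * Returns the names of the applications whose requested version differs
     * from the version that is currently deployed. Applications that are
     * absent from the request are treated as unchanged.
     */
    static Set<String> appsToRedeploy(Map<String, String> current,
                                      Map<String, String> requested) {
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> entry : requested.entrySet()) {
            String deployedVersion = current.get(entry.getKey());
            if (!Objects.equals(deployedVersion, entry.getValue())) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }
}
```

For a stream whose only difference between the deployed and requested versions is the `transform` application, the method returns a set containing just `transform`, so the other applications would be left running.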
Data Flow includes new stream commands to perform upgrade and rollback operations.
dataflow:>app register --name transform --type processor --uri maven://com.eg:transformer:0.0.1
dataflow:>stream create mystream --definition "jdbc | transform | mongodb"
dataflow:>app register --name transform --type processor --uri maven://com.eg:transformer:0.0.2
dataflow:>stream update mystream --properties "version.transform=0.0.2"
dataflow:>stream rollback mystream
In this series of commands, the stream is deployed using version 0.0.1 of the transformer. The jdbc source and mongodb sink are already registered. The stream is then updated to use version 0.0.2 of the transformer. Only the transform application is updated, with version 0.0.2 being deployed and version 0.0.1 being undeployed. The jdbc and mongodb applications are left as-is. The rollback command does the opposite, bringing the stream back to the state with version 0.0.1 of the transformer.
Note: To use Data Flow and Skipper, Data Flow’s feature toggle for Skipper must be enabled in both the Data Flow Server and shell.
The DataFlowTemplate class has been the workhorse for deploying streams and tasks programmatically. However, it is a fairly low-level API. We have added a new fluent-style API to create, deploy, or launch streams. It is easier to use and also enables the reuse of StreamApplication instances across multiple streams.
// Describe each application and its properties once.
StreamApplication source = new StreamApplication("http")
    .addProperty("server.port", 9900);

StreamApplication processor = new StreamApplication("filter")
    .addProperty("expression", "payload=='good'");

StreamApplication sink = new StreamApplication("log");

// streamBuilder is assumed to be an already-obtained stream builder instance.
Stream simpleStream = streamBuilder.name("simpleStream")
    .source(source)
    .processor(processor)
    .sink(sink)
    .create()
    .deploy();
With a Stream instance you can ask for the stream’s status, undeploy, or destroy the stream.
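As a rough illustration of why a fluent style helps with reuse, the following self-contained Java sketch shows how application descriptions can be built once and then combined into more than one pipeline definition in the `app --key=value | app` DSL form. The classes `AppSpec` and `PipelineSketch` are hypothetical stand-ins written for this example. They are not the Data Flow `StreamApplication` or `Stream` classes, they do no quoting or escaping of property values, and they deploy nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical description of one application and its properties. */
final class AppSpec {
    private final String name;
    private final Map<String, Object> properties = new LinkedHashMap<>();

    AppSpec(String name) {
        this.name = name;
    }

    /** Adds a property and returns this instance so calls can be chained. */
    AppSpec addProperty(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    /** Renders "name --key=value ..." with no quoting or escaping. */
    String toDsl() {
        StringBuilder dsl = new StringBuilder(name);
        properties.forEach((key, value) ->
                dsl.append(" --").append(key).append('=').append(value));
        return dsl.toString();
    }
}

/** Hypothetical fluent builder that joins applications with the pipe symbol. */
final class PipelineSketch {
    private final List<AppSpec> apps = new ArrayList<>();

    PipelineSketch then(AppSpec app) {
        apps.add(app);
        return this;
    }

    String definition() {
        return apps.stream()
                .map(AppSpec::toDsl)
                .collect(Collectors.joining(" | "));
    }
}
```

In this sketch, an `AppSpec` for `log` can be passed to two different `PipelineSketch` instances, which mirrors the reuse of application instances across streams described above.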
The Data Flow Dashboard has been updated to take advantage of Angular 4 and to align with the Pivotal UI styling. A major focus is the consistent use of domain model classes instead of raw JSON strings. This allows for finer-grained control over the state of the application, e.g. when transitioning from a paginated list to a details page and back. There is also improved documentation for the UI using Compodoc.
Fan-in refers to when multiple sources all send data to the same messaging destination. Fan-out refers to determining the messaging destination at runtime. This video shows the UI in action for streams with these topologies.
Users who register applications as Maven artifacts can now take advantage of the “update-policy” feature to override and refresh Spring Cloud Data Flow’s internal Maven cache. For instance, in development, you can continuously resolve SNAPSHOT versions of the Maven artifact by setting update-policy=always, which forces the download of the latest version of the streaming or batch/task application that’s in use in the DSL/Dashboard.
Based on user feedback, applications registered using an http resource will always be downloaded and not cached. This facilitates the development lifecycle of updating the code, but not the name, of an application’s uberjar hosted on a web server.
When in Skipper mode, multiple application versions can be registered. A default version is used when deploying the stream. You can set the default version using the new app default command. Note that when upgrading an application in a stream to a new version, you must first register that version in Data Flow.
This release adds “autocompletion” for stream and task/batch names and other metadata. No more guessing - everything is a TAB press away! Check out the following screencast to learn more about the advanced shell features, tips, and tricks.
Initial support for running functions in SCDF is provided by the use of a function-runner application. When creating a stream with a Spring Cloud Function application, you pass in the function’s class name and jar location.
dataflow:> stream create foo --definition "http | function-runner --function.className=com.example.functions.CharCounter --function.location=file:///home/john/myfunction.jar | log"
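For orientation, a function class referenced this way might look like the sketch below. It assumes that the runner accepts a plain `java.util.function.Function` from `String` to `Integer`. The body is a guess based only on the class name `CharCounter`, and the real sample may use a different signature. The `public` modifier and the `com.example.functions` package declaration are omitted to keep the snippet self-contained, so add them when packaging the class into the jar.

```java
import java.util.function.Function;

/**
 * Illustrative function that counts the characters in each payload.
 * Assumption: the payload arrives as a String and the result is an Integer.
 */
class CharCounter implements Function<String, Integer> {

    @Override
    public Integer apply(String payload) {
        // Returns the number of UTF-16 code units in the payload.
        return payload.length();
    }
}
```

Because the class has no framework dependencies, it can be unit tested on its own before being wired into a stream.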
There is a sample you can use to experiment with this feature. Simplifying the deployment of Spring Cloud Functions by not requiring explicit user invocation of the function-runner is on our roadmap.
Improving upon the data science capabilities, Python-HTTP and Python-Jython processors are now also available.
Spring Cloud Data Flow’s Cloud Foundry tile has been in a closed-BETA state for the last few months. We have iterated on customer and field feedback and it is set to graduate out of BETA to a 1.0 GA status officially. This release automates the provisioning (including the metrics-collector, skipper, database, and message-broker) along with end-to-end OAuth/SSO integration in Cloud Foundry. There are a lot of other value-adds, so stay tuned for a more focused discussion, documentation, and pointers to the tile-page in Pivotal Network.
Spring Cloud Data Flow’s helm-chart will be updated to the latest 1.3 GA release once the pull-request is merged. With this chart, the latest release of SCDF along with the companion components (metrics-collector, skipper, database, and message-broker) can be automatically provisioned with the following helm-commands.
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm repo update
helm install --name scdf incubator/spring-cloud-data-flow --set rbac.create=true
Please try it out, share your feedback, and consider contributing to the project!