Dave Syer

Founder of Spring Cloud, Spring Boot, Spring Batch, lead of Spring Security OAuth, and an active contributor to Spring Integration, Spring Framework, Spring AMQP, Spring Security. Experienced, delivery-focused architect and development manager. Has designed and built successful enterprise software solutions using Spring, and implemented them in major institutions worldwide.

Recent Blog posts by Dave Syer

Cross Site Request Forgery and OAuth2

Engineering | November 30, 2011 | ...

In this short article we look at Cross Site Request Forgery in the context of OAuth2, looking at possible attacks and how they can be countered when OAuth2 is being used to protect web resources.

OAuth2 is a protocol enabling a Client application, often a web application, to act on behalf of a User, but with the User’s permission. The actions a Client is allowed to perform are carried out on a Resource Server (another web application or web service), and the User approves the actions by telling an Authorization Server that he trusts the Client to do what it is asking. Common examples of Authorization Servers on the internet are Facebook and Google, both of which also provide Resource Servers (the Graph API in the case…

Social Coding: Pull Requests - What to Do When Things Get Complicated

Engineering | July 18, 2011 | ...

Scenario: you want to contribute some code to an open source project hosted on a public git repository service like github. Lots of people make pull requests to projects I'm involved in and many times they are more complicated to merge than they need to be, which slows down the process a bit. The basic workflow is conceptually simple:

fork a public open source project
make some changes to it locally and push them up to your own remote fork
ask the project lead to merge your changes with the main codebase

and there is an excellent account of this basic workflow in a blog by Keith Donald.

Complications arise when the main codebase changes in between the time you fork it and the time you send the pull request, or (worse) you want to send multiple pull requests for different features or bugfixes, and need to keep them separate so the project owner can deal with them individually. This tutorial aims to help you navigate the complications using git.

The descriptions here use github domain language ("pull request", "fork", "merge" etc.), but the same principles apply to other public git services. We assume for the purposes of this tutorial that the public project is accepting pull requests on its master branch. Most Spring projects work that way, but some other public projects don't. You can substitute the word "master" below with the correct branch name and the same examples should all be roughly correct.

To help you follow what's going on locally, the shell commands below beginning with "$" can be extracted into a script and run in the order they appear. The endpoint should be a local repository in a directory called "work" that has an origin linked to its master branch (simulating the remote public project) and two branches on a private fork. The two branches have the same contents at their heads, but different commit histories (as per the ASCII diagram at the bottom).

The Two Remote Repositories

If you are going to send a pull request, there are two remote repositories in the mix: the main public project, and the fork where you push your changes.

It's a matter of taste to some extent, but what I like to do is make the main project the remote "origin" of my working copy, and use my fork as a second remote called "fork". This makes it easy to keep track of what's happening in the main project because all I have to do is

# git fetch origin

and all the changes are available locally. It also means that I never get confused when I do my natural git workflow

# git checkout master
# git pull --rebase
... build, test, install etc ...

which always brings me up to date with the main project. I can keep my fork in sync with the main project simply by doing this after a pull from master:

# git push fork

Initial Set Up

Let's create a simple "remote" repo to work with in a sandbox. Instead of using a git service provider we'll just do it locally in your filesystem (using UN*X commands as an example).

$ rm -rf repo fork work
$ git init repo
$ (cd repo; echo foo > foo; git add .; git commit -m "initial"; git checkout `git rev-parse HEAD`)

(The last checkout there was to leave the repository in a detached head state, so we can later push to it from a clone.) From now on, pretend "repo" is a public github project (e.g. git://github.com/SpringSource/repo.git).

The "fork" URL in this clone command would be something like [email protected]/myuserid/repo.git. Now we'll create the fork. This is equivalent to what github does when you ask it to fork a repository:

$ git clone repo fork
$ (cd fork; git checkout `git rev-parse HEAD`)

Finally we need to set up a working directory where we make our changes (remember "repo" = git://github.com/SpringSource/repo.git):

$ git clone repo work
$ cd work
$ git checkout origin/master

Because we cloned the main public repo that is by default the remote "origin". We are going to add a new remote so we can push our changes:

$ git remote add fork ../fork
$ git fetch fork
$ git push fork

The local repository now has a single commit and looks something like this in gitk (or your favourite git visiualization tool):

A (origin/master, fork/master, master)

In this diagram, "A" is the commit label, and in brackets we list the branches associated with the commit.

Get the Latest Stuff

You can always get the latest stuff from the main repo using

# git checkout master
# git pull --rebase

and sync it with the fork

# git push fork

If you operate this way, keeping master synchronized between the main repo and your fork as much as possible, and never making any local changes to the master branch, you will never have any confusion about where the rest of the world is. Also, if you are going to send multiple pull requests to the same public project, they will not overlap each other if you keep them separate on their own branches (i.e. not on master).

The Pull Request

When you want to start work on a pull request, start from a master branch fully up to date as above, and make a new local branch

$ git checkout -b mynewstuff

Make changes, test etc:

$ echo bar > bar
$ echo myfoo > foo
$ git add .
$ git commit -m "Added bar, edited foo"

and push it up to your fork repository with the new branch name (not master)

$ git push fork mynewstuff

If nothing has changed in the origin, you can send a pull request from there.

What if the Origin Changes?

For the purpose of this tutorial we simulate a change in the origin like this:

$ cd ../repo
$ git checkout master
$ echo spam > spam; git add .; git commit -m "add spam"
$ git checkout `git rev-parse HEAD`
$ cd ../work

Now we're ready to react to the change. First we'll bring our local master up to date

$ git checkout master
$ git pull
$ git push fork

The local repository now looks like this:

A -- B (mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master)

Notice how your new stuff does not have origin/master as a direct ancestor (it's on another branch). This makes it awkward for the project owner to merge your changes. You can make it easier by doing some of the work yourself locally, and pushing it up to your fork before sending the pull request.

Re-writing History on your Branch

If you aren't collaborating with anyone on your branch it should be absolutely fine to rebase onto the latest changes from the remote repo and force a push:

# git checkout mynewstuff
# git rebase master

The rebase might fail if you have made changes that are incompatible with somehting that happened in the remote repo. You will want to fix the conflicts and commit them before moving on. This makes life difficult for you, but easy for the remote project owner because the pull requiest is guaranteed to merge successfully.

While you are re-writing history, maybe you want to squash some commits together to make the patch easier to read, e.g.

# git rebase -i HEAD~2
...

In any case (even if the rebase went smoothly) if you have already pushed to your fork you will need to force the next push because it has re-written history (assuming the remote repo has changed).

# git push --force fork mynewstuff

The local repository now looks like this (the B commit isn't actually identical to the previous version, but the difference isn't important here):

A -- D (master, fork/master, origin/master) -- B (mynewstuff, fork/mynewstuff)

Your new branch has a direct ancestor which is origin/master so everyone is happy. Then you are ready to go into github UI and send a pull request for your branch against repo:master.

What if I want to Keep my Local Commits?

If you committed your changes locally in multiple steps, maybe you want to keep all you little bitty commits, and still present your pull request as a single commit to the remore repo. That's OK, you can create a new branch for that and send the pull request from there. This is also a good thing to do if you are collaborating with someone on your feature branch and don't want to force the push.

First we'll push the new stuff to the fork repo so that our collaborators can see it (this is unnecessary if you want to keep the changes local):

$ git checkout mynewstuff
$ git push fork

then we'll create a new branch for the squashed pull request:

$ git checkout master
$ git checkout -b mypullrequest
$ git merge --squash mynewstuff
$ git commit -m "comment for pull request"
$ git push fork mypullrequest

Here's the local repository:

A -- B (mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E (mypullrequest, fork/mypullrequest)

You are good to go with this and your new branch has a direct ancestor which is origin/master so it will be trivial to merge.

If you weren't collaborating on the mynewstuff branch, you could even throw it away at this point. I often do that to keep my fork clean:

# git branch -D mynewstuff
# git push fork :mynewstuff

Here's the local repo, fully synchronized with both of its remotes:

A -- D (master, fork/master, origin/master) -- E (mypullrequest, fork/mypullrequest)

Continue Working on your New Stuff

Let's say your pull request is rejected and the project owner wants you to make some changes, or the new stuff turns into something more interesting and you need to do some more work on it.

If you didn't delete it above, you can continue to work on your granular branch...

$ git checkout mynewstuff
$ echo yetmore > foo; git commit -am "yet more"
$ git push fork

and then move the changes over to the pull request branch when you are ready

$ git rebase --onto mypullrequest master mynewstuff

All the changes we want are in place now, but the branches are on the wrong commits. As you can see below, mynewstuff is where I want mypullrequest to be, and the remote fork/mynewstuff doesn't have a corresponding local branch:

A -- B -- C (fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E (mypullrequest, fork/mypullrequest) -- F (mynewstuff)

We can use git reset to switch the two branches to where we want them (you can probably do this is a graphical UI if you like):

$ git checkout mypullrequest
$ git reset --hard mynewstuff
$ git checkout mynewstuff
$ git reset --hard fork/mynewstuff

and the new repository looks like this:

A -- B -- C (mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E (fork/mypullrequest) -- F (mypullrequest)

If we are OK with the pull request being 2 commits, we can just push it as it is:

$ git checkout mypullrequest
$ git push fork

The endpoint looks like this:

A -- B -- C(mynewstuff, fork/mynewstuff)
 \
  -- D (master, fork/master, origin/master) -- E -- F (mypullrequest, fork/mypullrequest)

Or we could rebase it to squash the commits together and force the push, scematically:

# git rebase -i HEAD~2
...
# git push --force fork

Because origin/master is a direct ancestor of fork/mypullrequest I know that my pull request will be trivial to merge.

Wrap Up

Hopefully this tutorial has given you enough git ammunition to go ahead and make some changes to your favourite open source project and be confident that the merge will be easy. Remember there is always more than one way to do it, and git is a powerful, low-level tool, so your mileage my vary and you might find variants of the approach above preferable or even necessary, depending on your changes.

Git and Social Coding: How to Merge Without Fear

Engineering | December 21, 2010 | ...

Git is great for social coding and community contributions to open source projects: contributors can try out the code easily, and there can be hordes of people all forking and experimenting with it but without endangering existing users. This article presents some examples with the Git command line that might help build your confidence with this process: how to fetch, pull and merge, and how to back out of mistakes. If you are interested in the social coding process itself, and how to contribute to Spring projects, check out another blog on this site by Keith Donald.

Grails has been on Github for a while and had a great experience with community contributions, so some other projects from SpringSource are starting to migrate over there as well. Some of the migrating projects are new (e.g. Spring AMQP) and some are already established and have migrated from SVN (e.g. Spring Batch). There are also some Spring projects on a SpringSource hosted Gitorious instance, for example Spring Integration…

Uploading Job Configurations to Spring Batch Admin

Engineering | April 08, 2010 | ...

An interesting problem that has no universal good solution is: how do I change the configuration of a running Spring application? Spring Batch Admin 1.0.0.M3 was released recently, and it has a configuration upload feature that solves this problem in a particular way. Someone asked for this feature at the recent S2GForum in Munich (if you missed that sign up for events in London and Amsterdam in May), and I was happy to tell him that it already existed, so maybe it deserves a bit more air time...

Screenshots of the Basic Use Case

We start with a look at the Jobs view in the application. It shows a list of jobs that are either launchable or monitorable by the web application.

Now the plan is to upload a new Job configuration and see this view change. So we start with…

Practical Use of Spring Batch and Spring Integration

Engineering | February 15, 2010 | ...

There are some common concerns of users of Spring Batch and Spring Integration, and we get asked a lot about how they fit together. Spring Batch Admin 1.0.0.M2 was released recently, and it makes heavy use of Spring Integration, so it is a good vehicle for looking at some specific use cases, and that is what we plan to do in this article.

Spring Batch Integration

Part of the 1.0.0.M2 release was the Spring Batch Integration module, recently migrated from Spring Batch and given a new home with Batch Admin. Many of the Batch-Integration cross over use cases are either implemented or demonstrated in Spring Batch…

Logging Dependencies in Spring

Engineering | December 04, 2009 | ...

This article deals with the choices that Spring makes and the options that developers have for logging in applications built with Spring. This is timed to coincide with the imminent release of Spring 3.0 not because we have changed anything much (although we are being more careful with dependency metadata now), but so that you can make an informed decision about how to implement and configure logging in your application. First we look briefly at what the mandatory dependencies are in Spring, and then go on to discuss in more detail how to set your application up to use some examples of…

Introducing Spring Batch Admin

Engineering | November 10, 2009 | ...

Spring Batch Admin provides a web-based user interface that features an admin console for Spring Batch applications and systems. It is a new open-source project from SpringSource. A milestone release 1.0.0.M1 will be available soon with all the features below, and we hope to get to a 1.0.0 final release early in 2010.

Main Use Cases

The easiest way to get a quick overview of Spring Batch Admin is to see some screenshots of the main use cases. The user interface is a web application (built with Spring MVC).

Inspect Jobs

The user can inspect the jobs that are known to the system. Jobs are either launchable or non-launchable (in the screenshot they are all launchable). The distinction is that a launchable job is defined and configured in the application itself, whereas a non-launchable job is detected as state left by the execution of a job in another process. (Spring Batch uses a relational database to track the state of jobs and steps, so historic executions can be queried to show the non-launchable jobs.)

Launch Job

Launchable jobs can be launched from the user interface with job parameters provided as name value pairs, or by an incrementer configured into the application.

Inspect Executions

Once a job is executing, or has executed, this view can be used to see the most recent executions, and a brief summary of their status (STARTED, COMPLETED, FAILED, etc.). Job Execution View

Each individual execution has a more detailed view (shown above), and from there the user can click down to a view of each of the step executions in the job (only one in this case). A common reason for wanting to do this is to see the cause of a failure. Step Execution (Top) View

The top of the step execution detail view shows the history of the execution of this step across all job executions. This is useful for getting a statistical feel for performance characteristics. A developer running a job in an integration test environment might use the statistics here to compare different parameterisations of a job, to see what effect is of changing (for instance) the commit interval in an item processing step. Step Execution (Bottom) View

The bottom of the step execution view has the detailed meta-data for the step (status, read count, write count, commit count, etc.) as well as an extract of the stack trace from any exception that caused a failure of the step (as in the example shown above).

Stop an Execution

A job that is executing can be stopped by the user (whether or not it is launchable). The stop signal is sent via the database and once detected by Spring Batch in whatever process is running the job, the job is stopped (status moves from STOPPING to STOPPED) and no further processing takes place.

Where to get it

The best place to start is the SpringSource community download page. There is also a snapshot download attached to this article, or you can get the source code from subversion and compile it yourself. Snapshot builds also go up to S3 to the Maven repository every night:

<repository>
	<id>spring-snapshots</id>
	<name>Spring Maven Snapshot Repository</name>
	<url>http://s3.amazonaws.com/maven.springframework.org/snapshot</url>
</repository>

There are two JAR artifacts and a WAR sample (org.springframework.batch:spring-batch-admin…

Using Yourkit to Find a Memory Leak

Engineering | July 05, 2009 | ...

I had such a great experience today with Yourkit that I thought I'd write a quick plug. It's been a couple of years since I used it in anger, and even then it was the best tool I could find, but now it really is ultra slick. I haven't done an exhaustive survey of the marketplace, and that wasn't the object of the exercise: I just wanted a tool to solve a problem.

Here's the story of my day; frustration, then irritation, then finally satisfaction. I had a suspected memory leak in Spring Batch and I needed to track it down quickly. The back story to this is I've seen plenty of memory leaks, but I haven't had to deal with one at the coal face for quite some time. I live in STS these days (sometimes dream in it as well), so I needed a tool that worked well in the IDE. I tried two tools, but only because the first choice didn't work. The two I tried were TPTP and Yourkit…

Spring Batch 2.0 New Feature Rundown

Engineering | October 21, 2008 | ...

In this article we outline the main themes of Spring Batch 2.0, and highlight the changes from 1.x. Development work on the new release is well underway, with an M2 release last week, and we are getting a lot of interest, so now seems like a good time to give a few pointers.

Spring Batch 2.0 Themes

The four main themes of the new release are

Java 5 and Spring 3.0
Non-sequential execution
Scalability
Configuration: annotations and XML namespace

so we'll cover each of those areas separately and describe what they mean and the impact of the changes on Spring Batch existing users. There is more detail below for features that are already implemented, which is mostly in the first category with some enabling features in other areas.

There are no changes to the physical layout of the project in Spring Batch 2.0.0.M2 (same old downloads, same basic layout of Java packages). We have not removed any features, but we have taken the opportunity to revise a couple of APIs, and there are some minor changes for…

Spring Batch 2.0.0.M2 Released

Releases | October 15, 2008 | ...

Spring Batch 2.0.0.M2 is now available. See the Spring Batch downloads page for more information - there is the usual .zip download and also Maven artifacts in S3.

Most work in this release went into the chunk-oriented approach to processing, which means changes to the ItemReader and ItemWriter interfaces, plus the introduction of the ItemProcessor as a top-level concern for translating between input and output items. Chunk-oriented processing is a key enabler for performance and scalability, as well as being much clearer for users in the extension points and interfaces (no more framework…