Tuesday, June 2, 2009

Clean up trailing whitespaces in sources

My editor (emacs) is configured to remove trailing whitespaces in python files when I save them. This way I never commit modifications related to whitespaces changes, making the diffs readable since they contain relevant modifications only.

Unfortunately not everyone do that, and when it comes to contributing to an existing project it can be very difficult to produce readable patches: sometimes while the actual patch is just a one line change the diff will show dozens of blank changes, due to whitespaces clean up.

Diff has a switch to ignore whitespace changes, but this is incompatible with python: if you change the block level (indentation) it is just ignored.

To clean up all python files found under a directory I use a shell one-liner:
$ find . -name '*.py' -exec sed -i {} -e 's/[ \t]*$//' ';'
As usual it worked-for-me™ but comes with no warranty.

My workflow for contributing a clean patch is like this:
  1. create a local branch with bazaar, mercurial or git. Of course it depends if the project is already using one of them, but if you branch from a subversion repository it's just a matter of preferences
  2. clean whitespaces and commit (locally)
  3. create and submit the patch normally
And here is the related configuration part for emacs:
;; whitespace cleanup
(defun my-py-no-trailing-space ()
; this hook is buffer local, can't add it globaly
(add-hook 'write-contents-functions 'delete-trailing-whitespace)
; if enabled, clear buffer at load time (this will automatically put the buffer in modified state, it might be annoying)
;(whitespace-cleanup)
)

(add-hook 'python-mode-hook 'my-py-no-trailing-space)
I believe that other editors and environments (including Eclipse) can be configured for this, too.

7 comments:

Paul Moore said...

Why not just leave whitespace alone throughout files, unless you are actually changing the lines in question?

That way, patches aren't mangled by whitespace changes, and you get to submit patches based on the upstream sources without any changes.

Bertrand Mathieu said...

@Paul: as I stated at the beginning of the post my editor is configured to always remove trailing whitespaces, and this is made intentionally. So when I click "save" (or C-x C-s) they are removed, and this is what I want for my code.

I do that because a commit should never contain those artifacts that turn to be noise (and it's so easy to add or remove trailing spaces while editing that you just cannot filter this manually).

Furthermore some projects or organizations enforce the "no trailing whitespace" policy in pre-commit hooks. In this case if you can't clean you cannot commit.

Michael Foord said...

Python has a 'no-trailing-whitespace' pre-commit hook and will reject files that have trailing whitespace.

At work we have a few tests that check how our editor (in Resolver One the IronPython spreadsheet system) handles trailing whitespace. If someone's editor strips those tests it will be a real problem!

Thijs said...

I gave it a try on a folder with .sh and .py files, and it didn't do anything except for printing this statement many times:

sed: -i may not be used with stdin

ST said...

Typing M-x delete-trailing-spaces was becoming second nature. I was about to look into how to add it as a hook when you posted this.

You spared me a lot of fiddling with e-lisp. Thank you.

Bertrand Mathieu said...

@Thijs: I just tried, it works for me. No error (of course don't include the "$" sign, it's supposed to be the shell prompt).
Maybe it depends on shell used (mine: bash) and "find" version? For you it looks like "{}" is not replaced by the file name. Or maybe your sed does not interpret "-i" as mine?
for the record my system is Ubuntu 9.04.

Reinout said...

Thijs: you're probably on OSX (me too). There's often a slightly different syntax for such commands as OSX descends more from BSD and doesn't use that many GNU tools. What works is:

find . -name '*.py' -exec sed -i '' -e 's/\ *$//' {} ';'

Reinout