Tuesday, June 2, 2009

Clean up trailing whitespaces in sources

My editor (Emacs) is configured to remove trailing whitespace in Python files when I save them. This way I never commit whitespace-only modifications, which keeps my diffs readable since they contain relevant changes only.

Unfortunately not everyone does that, and when it comes to contributing to an existing project it can be very difficult to produce readable patches: sometimes, while the actual patch is just a one-line change, the diff shows dozens of whitespace-only changes due to the cleanup.

Diff has a switch to ignore whitespace changes, but this doesn't mix well with Python: if you change the block level (indentation), that change is ignored too.

To clean up all python files found under a directory I use a shell one-liner:
$ find . -name '*.py' -exec sed -i {} -e 's/[ \t]*$//' ';'
As usual it worked-for-me™ but comes with no warranty.
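If you prefer to stay in Python, a rough equivalent of the one-liner could look like this (a quick sketch, with the same no-warranty disclaimer; note that it also normalizes line endings to \n):

import os

def strip_trailing_whitespace(root='.'):
    """Remove trailing spaces and tabs from every .py file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith('.py'):
                continue
            path = os.path.join(dirpath, name)
            original = open(path).readlines()
            # rstrip() also eats the newline, so we add it back;
            # as a side effect, CRLF endings become plain LF.
            cleaned = [line.rstrip() + '\n' for line in original]
            # only rewrite files that actually changed
            if cleaned != original:
                out = open(path, 'w')
                out.writelines(cleaned)
                out.close()

if __name__ == '__main__':
    strip_trailing_whitespace()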

My workflow for contributing a clean patch is like this:
  1. create a local branch with Bazaar, Mercurial or Git. Whether you use one of them of course depends on what the project already uses, but if you branch from a Subversion repository it's just a matter of preference
  2. clean whitespaces and commit (locally)
  3. create and submit the patch normally
And here is the related configuration part for emacs:
;; whitespace cleanup
(defun my-py-no-trailing-space ()
  ;; this hook is buffer-local, so it can't be added globally
  (add-hook 'write-contents-functions 'delete-trailing-whitespace)
  ;; if enabled, this cleans the buffer at load time (it automatically puts
  ;; the buffer in a modified state, which might be annoying)
  ;; (whitespace-cleanup)
  )

(add-hook 'python-mode-hook 'my-py-no-trailing-space)
I believe that other editors and environments (including Eclipse) can be configured for this, too.

Friday, May 29, 2009

Installing Django, Solr, Varnish and Supervisord with Buildout

Here I'll detail my buildout configuration for an install of Django, Solr (HTTP search server), Varnish (HTTP cache), and Supervisor for controlling Solr and Varnish. I'll show how to get a Debian init script for supervisord (the instructions are of course valid for Ubuntu too). I'll detail the parts of base.cfg for each service, and I'll try to explain what they do and why.

This may not be the best way to do it, but at least it works for me(tm), so I think it deserves to be shared.

My buildout doesn't handle (yet?!) the Apache configuration, so I will not cover it. For the curious, here is the intended HTTP chain:
  • Apache listens on port 80 and forwards requests to Varnish on port 3128. Unlike a typical Zope setup I don't need rewrite rules here; simple proxying is enough
  • Varnish reaches the backend (Django) on port 8000
  • Apache listens on port 8000 and serves Django through WSGI; it should only serve localhost
  • Django may query Solr on port 8983

Buildout files organisation

In my buildout directory I have:
  • base.cfg: this file contains the core configuration. Specific settings (for development, production...) are made in files that extend base.cfg.
  • templates/: this directory contains file templates used in my buildout; for example I put the template of the supervisord init script there
  • varnish-conf/, solr-conf/: I'm versioning the configuration for these services, since the configurations generated by the recipes needed adjustments
Here is the "buildout" part in base.cfg:
[buildout]
newest = false
newest: by default I don't want to check if eggs can be updated
versions = versions
I want to pin an exact version for a given egg; it will be declared in the "versions" section. For example one can set "my.app = 1.0.3" (if you are developing "my.app" you can unset this in dev.cfg by declaring "my.app = ").
parts =
    svn-products
    django
    solr-files
    solr
    solr-conf
    varnish-build
    varnish
    supervisor
    supervisord_init_script
Parts are installed in order; I'll detail each of them below.
find-links =
    http://dist.repoze.org/
A distribution of PIL can be found there (it is poorly referenced on PyPI). Also, if you cannot upload "my.app" to PyPI (customer project, anyone?) and you don't have an egg server, you can put egg tarballs of "my.app" on a local web server and add the link to "find-links".
eggs =
    PIL
    lxml
    psycopg2
    django-extensions
    django-cachepurge
    Werkzeug
    my.app

[versions]
djangorecipe = 0.17.4
django-extensions = 0.4
Werkzeug = 0.5

Django

Parts related to Django are "svn-products" and "django". "svn-products" allows me to get solango (not released as an egg yet :-( ).
[svn-products]
recipe = iw.recipe.subversion
urls =
    http://django-solr-search.googlecode.com/svn/trunk/solango solango
The django part. Note that I'm using Django 1.0.2, since 1.1 has no final release yet; this is a matter of choice. Django 1.0.2 is available as an egg, but the recipe doesn't use it and downloads Django by itself.
[django]
recipe = djangorecipe
version = 1.0.2
control-script = django
wsgi = true
projectegg = my.app
eggs = ${buildout:eggs}
extra-paths = ${svn-products:location}

Solr

The Solr installation is made of four parts:
  • solr-files: downloads and unpacks the Solr distribution:
    [solr-files]
    recipe = hexagonit.recipe.download
    url = ftp://mir1.ovh.net/ftp.apache.org/dist/lucene/solr/1.3.0/apache-solr-1.3.0.tgz
    md5sum = 23774b077598c6440d69016fed5cc810
    strip-top-level-dir = true
  • solr: creates a runnable instance of Solr
    [solr]
    recipe = collective.recipe.solrinstance
    solr-location = ${buildout:parts-directory}/solr-files
    host = localhost
    port = 8983

    unique-key = uniqueID
    default-search-field = text

    index =
        name:uniqueID type:string indexed:true stored:true required:true
        name:text type:string indexed:true stored:true required:false omitnorms:false multivalued:true
  • solr-conf: I have added this part to overwrite some config files in the Solr instance directory
    [solr-conf]
    recipe = iw.recipe.cmd
    on_install = true
    on_update = true
    cmds =
        cp -v ${buildout:directory}/solr-conf/jetty.xml ${solr:jetty-destination}
        cp -v ${buildout:directory}/solr-conf/schema.xml ${solr:schema-destination}
        cp -v ${buildout:directory}/solr-conf/stopwords_fr.txt ${solr:schema-destination}
    Why? Because:
    • for jetty.xml, I made Solr listen on localhost only, which is not the default. If you choose to customize jetty.xml you must replace absolute paths with relative ones; for example for "RequestLog" the path must be changed to "../../var/solr/log/jetty-yyyy_mm_dd.request.log"
    • for schema.xml it is a bit different. At first I let the recipe generate it, but solango can output the field definitions from your application, so there is no reason to maintain them in the buildout (in the "solr" part). The command is:
    bin/django solr --fields --path=/tmp
    Then update schema.xml with the output.

  • solr-rebuild: a "command" for reindexing Django content (clear & rebuild)
    [solr-rebuild]
    recipe = iw.recipe.cmd
    on_install = true
    on_update = true

    # since solr is not started by solr-instance but supervisord, solr-instance has
    # no pid file and thinks that solr is down. Thus we must run it with
    # solr-instance to be able to "solr-instance purge"
    cmds =
        ${buildout:bin-directory}/supervisorctl stop solr
        cp -v ${buildout:directory}/solr-conf/schema.xml ${solr:schema-destination}
        ${buildout:bin-directory}/solr-instance start
        COUNT=15; echo "Waiting $COUNT s"; sleep $COUNT
        ${buildout:bin-directory}/solr-instance purge
        time ${buildout:bin-directory}/${django:control-script} solr --reindex --batch-size 100
        ${buildout:bin-directory}/solr-instance stop
        ${buildout:bin-directory}/supervisorctl start solr
    Actually I could have made a template of a shell script with collective.recipe.template, and I'll probably switch to that solution; I made this quickly and didn't yet know about the possibilities of the template recipe. Right now, to rebuild the Solr index I have to type:
    $ bin/buildout install solr-rebuild
    Note that the solr-rebuild part is not listed in buildout:parts, because I don't want to run it by default.

Varnish

Nothing really advanced here. I have just customized the Varnish configuration to change a few things and to add a ping URL (important for supervisord).
[varnish-build]
recipe = zc.recipe.cmmi
url = http://downloads.sourceforge.net/varnish/varnish-2.0.4.tar.gz

[varnish]
recipe = plone.recipe.varnish
daemon = ${varnish-build:location}/sbin/varnishd
bind = 127.0.0.1:3128
config = ${buildout:directory}/varnish-conf/varnish.vcl
telnet = localhost:8888
cache-size = 1G

# foreground is needed for supervisor to control varnish correctly
mode = foreground
How to add a ping URL? In varnish.vcl, at the beginning of vcl_recv:
# This URL will always reply 200 whenever Varnish is running
if (req.request == "GET" && req.url ~ "/varnish-ping") {
    error 200 "OK";
}
For this I must admit I made a (very) quick search on the net; if anyone has a better solution please let me know!
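To check by hand that the ping URL behaves as expected, a quick test from Python will do; a minimal sketch, assuming Varnish listens on 127.0.0.1:3128 as configured above:

import httplib

# Request the synthetic ping URL; anything other than "200 OK" means
# Varnish is not serving requests (which is what httpok acts upon below).
conn = httplib.HTTPConnection('127.0.0.1', 3128)
conn.request('GET', '/varnish-ping')
response = conn.getresponse()
print response.status, response.reason
conn.close()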

Supervisor

[supervisor]
recipe = collective.recipe.supervisor
port = localhost:9001
user = admin
password = admin
plugins =
    superlance

# solr security settings: see
# http://docs.codehaus.org/display/JETTY/Connectors+slow+to+startup
programs =
    10 varnish (startsecs=10) ${buildout:directory}/bin/varnish true
    20 solr (startsecs=10) java [-Djava.security.egd=file:/dev/urandom -jar start.jar] ${buildout:parts-directory}/solr true

eventlisteners =
    SolrHttpOk TICK_60 ${buildout:bin-directory}/httpok [-p solr -t 20 http://localhost:8983/solr/]
    VarnishHttpOk TICK_60 ${buildout:bin-directory}/httpok [-p varnish -t 20 http://localhost:3128/varnish-ping]
For the programs I set "startsecs" to 10 seconds. This tells Supervisor to wait 10 seconds before considering that the program is running properly. This is important if your services take a bit of time before actually serving: if an event listener runs and finds a failure, it may ask Supervisor to restart the service again (i.e. before the service ever completed its startup).

Solr is not started with "bin/solr-instance fg", mainly because I needed to pass an additional parameter (without it Solr startup time was very long, from 1 to 5 minutes...).

The event listeners are configured to check Varnish and Solr every minute, and to ask Supervisor to restart them if they fail to answer.

Supervisor Init script for Debian

[supervisord_init_script]
recipe = collective.recipe.template
input = templates/supervisord_init.in
output = ${buildout:bin-directory}/supervisord_rc
For making "templates/supervisord_init.in" I copied /etc/init.d/skeleton and edited it. Important: do "chmod +x templates/supervisord_init.in", the permission will be reported on the generated file. Here is the diff:

--- /etc/init.d/skeleton    2009-03-31 11:01:55.000000000 +0200
+++ templates/supervisord_init.in 2009-05-26 16:45:24.000000000 +0200
@@ -1,31 +1,31 @@
#! /bin/sh
### BEGIN INIT INFO
-# Provides: skeleton
+# Provides: supervisord
# Required-Start: $remote_fs
# Required-Stop: $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
-# Short-Description: Example initscript
+# Short-Description: initscript for supervisord at ${buildout:bin-directory}
# Description: This file should be used to construct scripts to be
# placed in /etc/init.d.
### END INIT INFO

-# Author: Foo Bar
+# Author: Bertrand Mathieu
#
-# Please remove the "Author" lines above and replace them
-# with your own name if you copy and modify this script.
-
# Do NOT "set -e"

# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
-DESC="Description of the service"
-NAME=daemonexecutablename
-DAEMON=/usr/sbin/$NAME
-DAEMON_ARGS="--options args"
-PIDFILE=/var/run/$NAME.pid
+DESC="Start/Stop supervisord at ${buildout:bin-directory}"
+NAME=supervisord
+DAEMON=${buildout:bin-directory}/$NAME
+DAEMON_ARGS=""
+PIDFILE=${buildout:directory}/var/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

+# file owner will be used to run daemon
+OWNER=$(stat -c %U $DAEMON)
+
# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0

@@ -48,9 +48,9 @@
# 0 if daemon has been started
# 1 if daemon was already running
# 2 if daemon could not be started
- start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --test > /dev/null \
+ start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $OWNER --test > /dev/null \
|| return 1
- start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON -- \
+ start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $OWNER -- \
$DAEMON_ARGS \
|| return 2
# Add code here, if necessary, that waits for the process to be ready
@@ -68,7 +68,7 @@
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
- start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --name $NAME
+ start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --chuid $OWNER --name $NAME
RETVAL="$?"
[ "$RETVAL" = 2 ] && return 2
# Wait for children to finish too if this is a daemon that forks
@@ -77,7 +77,7 @@
# that waits for the process to drop all resources that could be
# needed by services started subsequently. A last resort is to
# sleep for some time.
- start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
+ start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --chuid $OWNER --exec $DAEMON
[ "$?" = 2 ] && return 2
# Many daemons don't delete their pidfiles when they exit.
rm -f $PIDFILE
@@ -93,7 +93,7 @@
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
- start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
+ start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --chuid $OWNER --name $NAME
return 0
}
Notes:
  • the daemon is run by the owner of bin/supervisord. Most of the time it is the user who ran buildout (hopefully it is not root!)
  • I have used a bash construct to get the owner ("OWNER=$(stat -c %U $DAEMON)"); this could be changed for pure sh
  • thus bin/supervisord_rc (start | stop) can be run by this user without needing "sudo". Without this, "solr-rebuild" could not work.
To install it in init.d:
$ cd /etc/init.d
$ sudo ln -s /path/to/buildout/bin/supervisord_rc my_preferred_service_name
$ sudo update-rc.d my_preferred_service_name defaults

Monday, May 25, 2009

django-cachepurge 0.1a

I have released django-cachepurge 0.1a. It is available as an egg for an easy installation with buildout or virtualenv+easy_install, for example.

This package allows Django to purge an HTTP cache when a model instance is changed or deleted. It does this by sending asynchronous "PURGE" requests to one or more upstream HTTP caches (such as Squid or Varnish). It is inspired by Plone's CacheFu components (more specifically CMFSquidTool).

Unfortunately Django does not have a "post_commit" signal (it would be the best place to do such a job), so purge requests are sent once the response has been computed: if an exception occurs while building the response, the URLs are not purged. This is done by the middleware.

Pre-requisite: the cache must be configured to accept and handle "PURGE" requests from the server where the django application is hosted.
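A quick way to check this prerequisite is to send a PURGE request by hand and look at the status code; a minimal sketch (host, port and URL are examples to adapt to your setup):

import httplib

# Issue a PURGE for one URL; a cache configured for it answers 200,
# a cache that is not usually answers with an error status instead.
conn = httplib.HTTPConnection('127.0.0.1', 3128)
conn.request('PURGE', '/some/page/')
response = conn.getresponse()
print response.status, response.reason
conn.close()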

Configuration on django side:

  1. The application must be the first app declared in settings.INSTALLED_APPS. The reason is that it listens to the class_prepared signal to connect post_save and post_delete handlers on eligible models (more on that below). If you put other apps before django-cachepurge it may miss their models. Note that the package name uses an underscore.
    INSTALLED_APPS = (
        'django_cachepurge',
        ...
    )

  2. add "django_cachepurge.middleware.CachePurge" in settings.MIDDLEWARE_CLASSES
  3. set settings.CACHE_URLS to the cache root for Django. CACHE_URLS can be a single string or an iterable of strings. For example:
    CACHE_URLS = 'http://127.0.0.1:3128'

How are URLs found?

If the model has a get_absolute_url method, this URL will be purged. Additionally you can define "get_purged_urls": it should return a list of URLs. This is useful, for example, for "through" models used in an M2M relation, to invalidate the URLs of the linked content. If the model has none of these methods, nothing happens (the signals are not connected).
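For illustration, here is what an eligible model could look like; a hypothetical sketch (the models and URLs below are made up, they are not part of the package):

from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=100)

    def get_absolute_url(self):
        # This URL is purged when an article is saved or deleted.
        return '/articles/%d/' % self.pk

class Tag(models.Model):
    name = models.CharField(max_length=50)

class ArticleTag(models.Model):
    # Hypothetical "through" model of an M2M relation between
    # Article and Tag.
    article = models.ForeignKey(Article)
    tag = models.ForeignKey(Tag)

    def get_purged_urls(self):
        # Also invalidate the page of the linked article when
        # the relation changes.
        return [self.article.get_absolute_url()]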

Pypi: http://pypi.python.org/pypi/django-cachepurge/
Launchpad: http://launchpad.net/django-cachepurge/

Tuesday, February 10, 2009

Defining and accessing macros located in browser:page template

The case: I want to define a simple browser page (let's name it "mypage") for a page template where I define some METAL macros (in my case it is a template for an Archetypes field). My product does not provide a skin for portal_skins, and I don't want to add a layer and all the GenericSetup stuff just for a single template. I'm using Plone 3.1.

The problem: @@mypage/macros does not work the way legacy portal_skins page templates used to.

Solution: define a simple class like this:
from Products.Five import BrowserView

class MacrosView(BrowserView):

    @property
    def macros(self):
        return self.index.macros
The ZCML for "mypage":
<browser:page
    for="*"
    name="mypage"
    class=".macros.MacrosView"
    template="mypage.pt"
    allowed_attributes="macros"
    permission="zope.Public" />
There could be a better, less verbose solution (like providing a meta directive for ZCML, in order to avoid declaring "class" and "allowed_attributes"). We could also patch Five's BrowserView.

In my case I have been able to use mypage as a template for my archetypes widget:
MyWidget(macro="@@mypage",)
Compared to legacy page templates you will lose some builtins (like python: test()), but that kind of logic should be (easily) moved into a dedicated view class. This is noticeable when you are customizing an old template (like archetypes/widgets/file.pt ;-))
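For example, a condition that a legacy template computed with python: test(...) can move into a small view class; a hypothetical sketch (the class, method and field names are mine):

from Products.Five import BrowserView

class FileWidgetView(BrowserView):
    """Hypothetical helper view holding logic that the old template
    computed inline with python: expressions."""

    def has_content(self):
        # Roughly what something like
        #   python: test(here.getField('file').get_size(here), True, False)
        # used to compute in the template.
        field = self.context.getField('file')
        return bool(field and field.get_size(self.context))

The template then simply uses tal:condition="view/has_content".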

Dunno if it is the "right way of doing things", but at least it worked-for-me(tm).

Wednesday, January 21, 2009

Useful script in a plone developer toolbox

Sometimes something weird happens on the production site and you have to investigate the data from that site because you can't reproduce the problem on the development site. When it's really hard you have to copy the Data.fs and run a separate instance to work on it. What I'm putting here is a script that changes all users' passwords to their id and also changes the email property: this makes it easy to log in as anybody, and no mail can be sent to the actual users. All you have to do is create a "Script (Python)" in the ZMI at the portal root, paste this code and click "Test". It's not a revolution, it's not-so-good-practice(tm), it's just a convenience ;-)
mtool = context.portal_membership.aq_inner
pu = context.plone_utils.aq_inner
acl = context.acl_users.aq_inner
count = 0

for uid in acl.getUserIds():
    count += 1
    acl.userSetPassword(uid, uid)
    member = mtool.getMemberById(uid)
    pu.setMemberProperties(member, email='me.the.developper@mydomain.tld')
    print uid

print
print count, "users"
return printed

Friday, January 2, 2009

Profiling made easy

Recently I had to profile some pages on a Plone 2.5 site (Zope 2.9). I collected data on the interesting pages with the help of the well-known ZopeProfiler 1.7.2, but I had to patch it to avoid an error:
--- ZopeProfiler.py~ 2007-06-26 10:43:25.000000000 +0200
+++ ZopeProfiler.py 2008-12-22 18:02:26.000000000 +0100
@@ -393,10 +393,10 @@
# Five broke 'getPhysicalPath' for its view classes -- work around
try: p= gP()
except:
- _log.error("calling 'getPhysicalPath' failed for %r", s,
- exc_info=sys.exc_info()
- )
- return
+ # _log.error("calling 'getPhysicalPath' failed for %r", s,
+ # exc_info=sys.exc_info()
+ # )
+ return ('?', _Empty, fn)
if type(p) is StringType: fi= p
else: fi= '/'.join(p)
return (fi,_Empty,fn)
A few years ago we had no option other than digging into the raw stats as they come from Stats.sort_stats().print_stats(). Since then a new tool has appeared: Gprof2dot. Its author also made something more than handy: xdot.py.

Now just add a little bash function:
$ function build_dot() { ./gprof2dot.py -f pstats -o $(basename $1 .pstats).dot $1; }
My workflow for profiling pages then became faster and easier:
  1. Profile a page, and save "some_page.pstats"
  2. run "build_dot some_page.pstats"
  3. run "./xdot.py some_page.dot"
  4. visit the graph
Here is the first overview of the graph. The mouse wheel zooms in and out, and holding left-click while moving the mouse pans the graph. It's quite easy to find hotspots quickly; sometimes they appear very obviously.

In one such hotspot I can read that 69% of the total time is spent in schema copies. In this particular case I know there is just one object with a "Schema" method, so it would probably be a good idea to review the code here to reduce the number of schema copies, or to think about adding a cache decorator if possible (like plone.memoize). The graph does not tell you what to do, though ;-)

Another interesting hotspot (in Plone 2.5): for some pages up to 15% of the time is spent in... getAllowedTypes (just 1 call - nearly 11% in pythonproducts.py __bobo_traverse__).
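As an aside, nothing in steps 2-4 is specific to Zope: any code profiled with cProfile produces a .pstats file that gprof2dot understands. A generic sketch:

import cProfile
import pstats

def slow_function():
    # Placeholder for the code you actually want to profile.
    return sum(i * i for i in range(100000))

# Dump raw stats to a file usable with "gprof2dot.py -f pstats".
cProfile.run('slow_function()', 'some_page.pstats')

# The old way, for comparison: digging into the textual output.
pstats.Stats('some_page.pstats').sort_stats('cumulative').print_stats(20)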

GenericSetup and the problem of depending on steps with circular dependencies

As of Plone 3.1.6 there is a problem with import step dependencies: if you register a custom import step through ZCML, and this step depends on "portlets", "content" or "plone-final", then your import step will be inserted before its dependencies. This is because local steps (i.e. those defined in an import_steps.xml file) are listed after the ZCML ones, and in its final loop the GS ordering method inserts the remaining steps as they come.

The big problem is when you must execute "mysite-final" after "portlets", for example.

There is a related ticket on plone.org; I have added a comment with a patch (and tests) for GS to deal better with this kind of dependency. It may be useful to someone now. The ticket may not be the best place to put it, but sadly I really don't have the time to discuss it on the right mailing list.
Here is the idea: basically the final loop is modified to first insert any step involved in a circular chain, and then it tries to insert the remaining ones with dependency resolution. Thus "mysite-final" will always be inserted after "portlets".