Friday, May 29, 2009

Installing Django, Solr, Varnish and Supervisord with Buildout

Here I'll detail my buildout configuration for installing Django, Solr (HTTP search server), Varnish (HTTP cache), and supervisord to control Solr and Varnish. I'll also show how to get a Debian init script for supervisord (the instructions are valid for Ubuntu too). I'll go through the parts of base.cfg for each service and try to explain what they do and why.

This may not be the best way to do it, but at least it works for me(tm), so I think it deserves to be shared.

My buildout doesn't handle (yet?!) the Apache configuration, so I will not cover it. For the curious, here is the intended HTTP chain:
  • Apache listens on port 80 and forwards requests to Varnish on port 3128. Unlike a typical Zope setup I don't need rewrite rules here; simple proxying is enough
  • Varnish reaches the backend (Django) on port 8000
  • Apache listens on port 8000 and serves Django with WSGI. Hopefully it serves only localhost.
  • Django may query Solr on port 8983

Buildout files organisation

In my buildout directory I have:
  • base.cfg: this file contains the core configuration. Specific settings (for development, production...) are made in files that extend base.cfg.
  • templates/: this directory contains the file templates used in my buildout; for example, the template of the supervisord init script lives here
  • varnish-conf/, solr-conf/: I keep the configuration for these services under version control, since the configurations generated by the recipes needed adjustments
Here is the "buildout" part in base.cfg:
[buildout]
newest = false
newest: by default I don't want to check if eggs can be updated
versions = versions
I want an exact version for a given egg; it is declared in the "versions" section. For example one can set "my.app = 1.0.3" (if you are developing "my.app" you can unset this in dev.cfg by declaring "my.app = ")
parts =
svn-products
django
solr-files
solr
solr-conf
varnish-build
varnish
supervisor
supervisord_init_script
Parts are installed in order. I'll detail them in turn.
find-links =
http://dist.repoze.org/
A distribution of PIL can be found here (it is poorly referenced on PyPI). Also, if you cannot upload "my.app" to PyPI (customer project, anyone?) and you don't have an egg server, you can put egg tarballs of "my.app" on a local web server and add the link to "find-links".
eggs =
PIL
lxml
psycopg2
django-extensions
django-cachepurge
Werkzeug
my.app

[versions]
djangorecipe = 0.17.4
django-extensions = 0.4
Werkzeug = 0.5

Django

Parts related to Django are "svn-products" and "django". "svn-products" lets me fetch solango (not yet released as an egg :-( ).
[svn-products]
recipe = iw.recipe.subversion
urls =
http://django-solr-search.googlecode.com/svn/trunk/solango solango
The django part. Note that I'm using Django 1.0.2, since the final 1.1 is not released yet. This is a matter of choice. Django 1.0.2 is available as an egg, but the recipe does not use it yet and downloads Django itself.
[django]
recipe = djangorecipe
version = 1.0.2
control-script = django
wsgi = true
projectegg = my.app
eggs = ${buildout:eggs}
extra-paths = ${svn-products:location}

Solr

The Solr installation is made of 4 parts:
  • solr-files: download and unpack solr distribution:
    [solr-files]
    recipe = hexagonit.recipe.download
    url = ftp://mir1.ovh.net/ftp.apache.org/dist/lucene/solr/1.3.0/apache-solr-1.3.0.tgz
    md5sum = 23774b077598c6440d69016fed5cc810
    strip-top-level-dir = true
  • solr: creates a runnable instance of Solr
    [solr]
    recipe = collective.recipe.solrinstance
    solr-location = ${buildout:parts-directory}/solr-files
    host = localhost
    port = 8983

    unique-key = uniqueID
    default-search-field = text

    index =
    name:uniqueID type:string indexed:true stored:true required:true
    name:text type:string indexed:true stored:true required:false omitnorms:false multivalued:true
  • solr-conf: I have added this to overwrite some config files in solr instance directory
    [solr-conf]
    recipe = iw.recipe.cmd
    on_install = true
    on_update = true
    cmds =
    cp -v ${buildout:directory}/solr-conf/jetty.xml ${solr:jetty-destination}
    cp -v ${buildout:directory}/solr-conf/schema.xml ${solr:schema-destination}
    cp -v ${buildout:directory}/solr-conf/stopwords_fr.txt ${solr:schema-destination}
    Why? Because:
    • In jetty.xml I made Solr listen only on localhost, which is not the default. If you choose to customize jetty.xml you must change absolute paths to relative ones. For example, for "RequestLog" the path must be changed to: "../../var/solr/log/jetty-yyyy_mm_dd.request.log"
    • schema.xml is a bit different. At first I let the recipe generate it, but solango can output the field definitions from your application, so there is no reason to maintain them in buildout (in the "solr" part). The command is:
    bin/django solr --fields --path=/tmp
    Then update schema.xml with the output.

  • solr-rebuild: "command" for reindexing django content (clear & rebuild)
    [solr-rebuild]
    recipe = iw.recipe.cmd
    on_install = true
    on_update = true

    # Solr is started by supervisord rather than by solr-instance, so
    # solr-instance has no pid file and thinks that Solr is down. We must
    # therefore start Solr with solr-instance to be able to run "solr-instance purge"
    cmds =
    ${buildout:bin-directory}/supervisorctl stop solr
    cp -v ${buildout:directory}/solr-conf/schema.xml ${solr:schema-destination}
    ${buildout:bin-directory}/solr-instance start
    COUNT=15; echo "Waiting $COUNT s"; sleep $COUNT
    ${buildout:bin-directory}/solr-instance purge
    time ${buildout:bin-directory}/${django:control-script} solr --reindex --batch-size 100
    ${buildout:bin-directory}/solr-instance stop
    ${buildout:bin-directory}/supervisorctl start solr
    Actually I could have made a shell script template with collective.recipe.template, and I'll probably switch to that solution; I wrote this quickly, before I knew what the template recipe could do. Right now, to rebuild the Solr index I have to type:
    $ bin/buildout install solr-rebuild
    Note that the solr-rebuild part is not listed in buildout:parts, because I don't want to run it by default.
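
The fixed 15 second sleep in solr-rebuild could be replaced by a small wait loop. Here is a minimal sketch in Python 3, assuming that an open TCP port is a good enough sign that Solr has started (it may still need a moment before it answers queries). The function name wait_for_port and its parameters are mine, not part of any recipe:

```python
import socket
import time


def wait_for_port(host, port, timeout=15.0, interval=0.5):
    """Return True as soon as host:port accepts a TCP connection,
    or False if `timeout` seconds pass without a successful connect."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the settings of the "solr" part above, the call would be wait_for_port("localhost", 8983), run between "solr-instance start" and "solr-instance purge".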

Varnish

Nothing really advanced here. I have just customized varnish configuration to change a few things, and to add a ping url (important for supervisord).
[varnish-build]
recipe = zc.recipe.cmmi
url = http://downloads.sourceforge.net/varnish/varnish-2.0.4.tar.gz

[varnish]
recipe = plone.recipe.varnish
daemon = ${varnish-build:location}/sbin/varnishd
bind = 127.0.0.1:3128
config = ${buildout:directory}/varnish-conf/varnish.vcl
telnet = localhost:8888
cache-size = 1G

# foreground is needed for supervisor to control varnish correctly
mode = foreground
How to add a ping URL? In varnish.vcl, at the beginning of vcl_recv:
 # This url will always reply 200 whenever varnish is running
if (req.request == "GET" && req.url ~ "/varnish-ping") {
error 200 "OK";
}
For this I must admit I made a (very) quick search on the net; if anyone has a better solution please let me know!

Supervisor

[supervisor]
recipe = collective.recipe.supervisor
port = localhost:9001
user = admin
password = admin
plugins =
superlance

# solr security settings: see
# http://docs.codehaus.org/display/JETTY/Connectors+slow+to+startup
programs =
10 varnish (startsecs=10) ${buildout:directory}/bin/varnish true
20 solr (startsecs=10) java [-Djava.security.egd=file:/dev/urandom -jar start.jar] ${buildout:parts-directory}/solr true

eventlisteners =
SolrHttpOk TICK_60 ${buildout:bin-directory}/httpok [-p solr -t 20 http://localhost:8983/solr/]
VarnishHttpOk TICK_60 ${buildout:bin-directory}/httpok [-p varnish -t 20 http://localhost:3128/varnish-ping]
For programs I set "startsecs" to 10 seconds. This tells supervisor to wait 10 seconds before considering that the program is properly running. This is important if your services take a bit of time before they serve requests properly: if an event listener runs and finds a failure, it may ask supervisor to restart the service again (i.e. before the service has had a chance to complete its startup).

Solr is not started with "bin/solr-instance fg", mainly because I needed to pass an additional parameter (without it Solr's startup time was very long, from 1 to 5 min...).

The event listeners are configured to check Varnish and Solr every minute. If a service fails to answer, the listener asks supervisor to restart it.
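
To make the idea behind these checks concrete, here is a rough Python 3 sketch of what such a probe boils down to. It is not superlance's actual httpok code: is_healthy and services_to_restart are hypothetical names, and the real listener talks to supervisor through its event protocol instead of returning a list.

```python
import urllib.request


def is_healthy(url, timeout=20, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


def services_to_restart(checks, probe=is_healthy):
    """Given {program_name: url}, return the sorted names that failed the probe."""
    return sorted(name for name, url in checks.items() if not probe(url))
```

With the two listeners configured above, checks would be {"solr": "http://localhost:8983/solr/", "varnish": "http://localhost:3128/varnish-ping"}. This is also why the ping URL matters: without it there is no cheap Varnish URL that is guaranteed to return 200.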

Supervisor Init script for Debian

[supervisord_init_script]
recipe = collective.recipe.template
input = templates/supervisord_init.in
output = ${buildout:bin-directory}/supervisord_rc
To make "templates/supervisord_init.in" I copied /etc/init.d/skeleton and edited it. Important: do "chmod +x templates/supervisord_init.in"; the permission will be carried over to the generated file. Here is the diff:

--- /etc/init.d/skeleton    2009-03-31 11:01:55.000000000 +0200
+++ templates/supervisord_init.in 2009-05-26 16:45:24.000000000 +0200
@@ -1,31 +1,31 @@
#! /bin/sh
### BEGIN INIT INFO
-# Provides: skeleton
+# Provides: supervisord
# Required-Start: $remote_fs
# Required-Stop: $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
-# Short-Description: Example initscript
+# Short-Description: initscript for supervisord at ${buildout:bin-directory}
# Description: This file should be used to construct scripts to be
# placed in /etc/init.d.
### END INIT INFO

-# Author: Foo Bar
+# Author: Bertrand Mathieu
#
-# Please remove the "Author" lines above and replace them
-# with your own name if you copy and modify this script.
-
# Do NOT "set -e"

# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
-DESC="Description of the service"
-NAME=daemonexecutablename
-DAEMON=/usr/sbin/$NAME
-DAEMON_ARGS="--options args"
-PIDFILE=/var/run/$NAME.pid
+DESC="Start/Stop supervisord at ${buildout:bin-directory}"
+NAME=supervisord
+DAEMON=${buildout:bin-directory}/$NAME
+DAEMON_ARGS=""
+PIDFILE=${buildout:directory}/var/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

+# file owner will be used to run daemon
+OWNER=$(stat -c %U $DAEMON)
+
# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0

@@ -48,9 +48,9 @@
# 0 if daemon has been started
# 1 if daemon was already running
# 2 if daemon could not be started
- start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --test > /dev/null \
+ start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $OWNER --test > /dev/null \
|| return 1
- start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON -- \
+ start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --chuid $OWNER -- \
$DAEMON_ARGS \
|| return 2
# Add code here, if necessary, that waits for the process to be ready
@@ -68,7 +68,7 @@
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
- start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --name $NAME
+ start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --chuid $OWNER --name $NAME
RETVAL="$?"
[ "$RETVAL" = 2 ] && return 2
# Wait for children to finish too if this is a daemon that forks
@@ -77,7 +77,7 @@
# that waits for the process to drop all resources that could be
# needed by services started subsequently. A last resort is to
# sleep for some time.
- start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
+ start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --chuid $OWNER --exec $DAEMON
[ "$?" = 2 ] && return 2
# Many daemons don't delete their pidfiles when they exit.
rm -f $PIDFILE
@@ -93,7 +93,7 @@
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
- start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
+ start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --chuid $OWNER --name $NAME
return 0
}
Notes:
  • the daemon is run by the owner of bin/supervisord. Most of the time this is the user who ran buildout (hopefully it is not root!)
  • I get the owner with "OWNER=$(stat -c %U $DAEMON)". The $(...) substitution is plain POSIX sh, but "stat -c" is specific to GNU coreutils, so this line would need changing on other systems
  • thus bin/supervisord_rc (start | stop) can be run by this user, without the need for "sudo". Without this, "solr-rebuild" could not work.
To install it in init.d:
$ cd /etc/init.d
$ sudo ln -s /path/to/buildout/bin/supervisord_rc my_preferred_service_name
$ sudo update-rc.d my_preferred_service_name defaults

Monday, May 25, 2009

django-cachepurge 0.1a

I have released django-cachepurge 0.1a. It is available as an egg for an easy installation with buildout or virtualenv+easy_install, for example.

This package allows Django to purge the HTTP cache when a model instance is changed or deleted. It does this by sending asynchronous "PURGE" requests to one or more upstream HTTP caches (such as Squid or Varnish). It is inspired by Plone CacheFu components (more specifically: CMFSquidTool).

Unfortunately Django does not have a "post_commit" signal (it would be the best place to do such a job), so purge requests are sent once the response has been computed: if an exception occurs while building the response, the URLs are not purged. This is done by the middleware.

Pre-requisite: the cache must be configured to accept and handle "PURGE" requests from the server where the django application is hosted.

Configuration on django side:

  1. The application must be the first app declared in settings.INSTALLED_APPS. The reason is that it listens to the class_prepared signal to connect post_save and post_delete handlers on eligible models (more on that below). If you put other apps before django-cachepurge, it may miss their models. Note that the package name uses an underscore.
    INSTALLED_APPS = (
    'django_cachepurge',
    ...
    )

  2. Add "django_cachepurge.middleware.CachePurge" to settings.MIDDLEWARE_CLASSES
  3. Set settings.CACHE_URLS to the cache root URL for the Django site. CACHE_URLS can be a single string or an iterable of strings. For example:
    CACHE_URLS = 'http://127.0.0.1:3128'
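
To illustrate how a "string or iterable" setting and a PURGE request fit together, here is a small Python 3 sketch. It is not the code of django-cachepurge (which targets the Python 2 of its day and sends requests asynchronously); normalize_cache_urls, purge_targets and send_purge are hypothetical helpers written for this example only.

```python
import http.client
from urllib.parse import urlsplit


def normalize_cache_urls(cache_urls):
    """Accept a single string or an iterable of strings; always return a list."""
    if isinstance(cache_urls, str):
        return [cache_urls]
    return list(cache_urls)


def purge_targets(cache_urls, paths):
    """Combine every cache root with every site path to purge."""
    return [
        root.rstrip("/") + "/" + path.lstrip("/")
        for root in normalize_cache_urls(cache_urls)
        for path in paths
    ]


def send_purge(url, timeout=5):
    """Send one synchronous PURGE request and return the HTTP status code."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("PURGE", target)
        return conn.getresponse().status
    finally:
        conn.close()
```

For example, purge_targets('http://127.0.0.1:3128', ['/articles/42/']) gives ['http://127.0.0.1:3128/articles/42/'], and each of these URLs would then be passed to send_purge. The cache only honours the request if it is configured to accept PURGE from that host, as noted in the pre-requisite.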

How are URLs found?

If the model has a get_absolute_url method, this URL will be purged. Additionally you can define "get_purged_urls": it should return a list of URLs. This is useful, for example, for "through" models used in M2M relations, to invalidate the URLs of linked contents. If the model has neither of these methods, nothing happens (the signals are not connected).
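
The lookup rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not the package's source; urls_to_purge is a hypothetical name, and the two classes merely stand in for Django models.

```python
def urls_to_purge(instance):
    """Collect the URLs to purge for one object, following the rule above."""
    urls = []
    get_absolute_url = getattr(instance, "get_absolute_url", None)
    if callable(get_absolute_url):
        urls.append(get_absolute_url())
    get_purged_urls = getattr(instance, "get_purged_urls", None)
    if callable(get_purged_urls):
        urls.extend(get_purged_urls())
    return urls


class Article:
    """Stand-in for a model that only has a canonical URL."""
    def get_absolute_url(self):
        return "/articles/42/"


class Membership:
    """Stand-in for a "through" model: it has no page of its own,
    but changing it affects the pages of the objects it links."""
    def get_purged_urls(self):
        return ["/groups/7/", "/people/3/"]
```

An object with neither method yields an empty list, which matches the "nothing happens" case.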

Pypi: http://pypi.python.org/pypi/django-cachepurge/
Launchpad: http://launchpad.net/django-cachepurge/