Introducing django-bakery

    A set of helpers for baking out your Django site as flat files

    When Web traffic spikes and your site starts to sag, your first impulse might be to architecture up, to add more servers, shard the database and cache, cache, cache. Provided you have the skill, time and money, that will get the job done.

    Lacking any of those three ingredients, the only guaranteed way to avoid a database crash is to not have a database. That sounds flippant, but it’s true. When faced with high traffic demands and little time or funding, the Data Desk does exactly that. We save every page generated by a database-backed site as a flat file and then host them all using a static file service like Amazon S3.

    We call this process “baking.” It’s our path to cheaper, more stable hosting for simple sites. We use it for publishing election results, timelines, documents, interactive tables, special projects and even this blog.

    The system comes with some major advantages, like:

    1. No database crashes
    2. Zero server configuration and upkeep
    3. No need to optimize your app code
    4. You don’t pay to host CPUs, only bandwidth
    5. An offline administration panel is more secure
    6. Less stress (This one can change your life)

    There are drawbacks. For one, you have to build the bakery into your code base. More important, a flat site can only be so complex. No online database means your site is all read and no write, which means no user-generated content and no complex searches. Sites we host that could not be baked include Mapping L.A. and NHTSA Vehicle Complaints, each of which allows users to interact with a large and shifting dataset.

    So what’s the trick?

    To streamline the process, we developed an open-source Django library called django-bakery. It makes baking out your site easier by integrating the steps into Django’s standard project layout.

    To try it out, the first thing you need to do is install the library from PyPI, like so:

    $ pip install django-bakery

    Then edit your settings file and add the app to its INSTALLED_APPS.
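
    A settings entry along these lines registers the app. This is a sketch: the app label `bakery` is taken from the import paths used later in this post, and the other entry is only a placeholder for whatever your project already lists.

```python
# Settings fragment (sketch).
INSTALLED_APPS = (
    'django.contrib.contenttypes',  # placeholder for the apps you already have
    'bakery',  # django-bakery, using the same name as the imports in this post
)
```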


    Then add a BUILD_DIR directory path where the flattened site will be baked.

    import os
    ROOT_PATH = os.path.dirname(__file__)
    BUILD_DIR = os.path.join(ROOT_PATH, 'build')

    The crucial step is to refactor your views to inherit from our class-based views. They are designed to automatically flatten themselves. Here is a list view and a detail view using our system.

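    from yourapp.models import DummyModel
    from bakery.views import BuildableDetailView, BuildableListView

    class DummyListView(BuildableListView):
        """A list of all tables."""
        # Assumption: the default manager is used to select every object.
        queryset = DummyModel.objects.all()

    class DummyDetailView(BuildableDetailView):
        """All about one table."""
        queryset = DummyModel.objects.all()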
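
    To make the idea of a page flattening itself more concrete, here is a minimal sketch that uses only the standard library. It is not django-bakery's implementation. The `build_page` helper, its arguments and the `index.html` naming convention are assumptions made for illustration. It takes a URL path plus the HTML already rendered for it and writes the result under a build directory, where a static host could serve it.

```python
import os


def build_page(build_dir, url_path, html):
    """Write html to <build_dir>/<url_path>/index.html and return the file path."""
    # Split '/tables/1/' into ['tables', '1'], dropping the empty pieces.
    parts = [part for part in url_path.split('/') if part]
    target_dir = os.path.join(build_dir, *parts)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, 'index.html')
    with open(target, 'w', encoding='utf-8') as f:
        f.write(html)
    return target
```

    Calling `build_page('build', '/tables/1/', '<h1>Table 1</h1>')` would leave the page at `build/tables/1/index.html`, so the original URL keeps working once the directory is uploaded.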

    After you’ve converted your views, add them to a list in your settings file where all buildable views will be stored.
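
    In django-bakery that list is the `BAKERY_VIEWS` setting. The sketch below assumes the two views shown above live in `yourapp/views.py`. Check the setting name against the documentation for the version you install.

```python
# Settings fragment (sketch): dotted paths to every view that should be baked.
BAKERY_VIEWS = (
    'yourapp.views.DummyListView',
    'yourapp.views.DummyDetailView',
)
```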


    Then run the management command that will bake them out.

    $ python manage.py build

    That should create your build directory and flatten all the designated views into it. You can review its work by firing up the buildserver, which will locally host your flat files in the same way Django’s runserver hosts your database-driven pages.

    $ python manage.py buildserver
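
    If it helps to picture what a local static server does, the sketch below serves a directory with Python's standard library alone. It illustrates the idea and is not the buildserver command itself. The `make_static_server` helper and the default port are hypothetical choices.

```python
import functools
import http.server


def make_static_server(directory, port=8000):
    """Return an HTTP server that serves files from directory on localhost."""
    # Bind the handler to the chosen directory instead of the current one.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(('127.0.0.1', port), handler)
```

    Calling `make_static_server('build').serve_forever()` would then host the baked files until the process is interrupted.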

    To publish the site on Amazon S3, all you need to do first is create a bucket. You can set up an account through Amazon Web Services, and Amazon’s documentation covers the basics if you need instructions. Now set your bucket name in the settings file:

    AWS_BUCKET_NAME = 'my-bucket'

    Next, install s3cmd, a utility we’ll use to move files back and forth between your desktop and S3. In Ubuntu, that’s as simple as:

    $ sudo apt-get install s3cmd

    If you’re using Mac or Windows, you’ll need to download s3cmd yourself and follow the installation instructions that come with it.

    Once it’s installed, we need to configure s3cmd with your Amazon login credentials. Go to Amazon’s security credentials page and get your access key and secret access key. Then, from your terminal, run:

    $ s3cmd --configure

    Finally, now that everything is set up, publishing your files to S3 is as simple as:

    $ python manage.py publish
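
    Syncing a build directory to a bucket with s3cmd comes down to a single command. The sketch below only assembles that command as an argument list so it can be inspected before anything runs. `s3_sync_command` is a hypothetical helper, and whether the publish command uses exactly these flags is an assumption.

```python
def s3_sync_command(build_dir, bucket_name, delete_removed=True):
    """Return the argument list for an s3cmd sync of build_dir into the bucket."""
    command = ['s3cmd', 'sync', '--acl-public']
    if delete_removed:
        # Remove remote files that no longer exist in the local build.
        command.append('--delete-removed')
    # The trailing slash syncs the directory's contents, not the directory itself.
    command.append(build_dir.rstrip('/') + '/')
    command.append('s3://%s/' % bucket_name)
    return command
```

    Passing the returned list to `subprocess.check_call` would run the sync using the credentials saved by `s3cmd --configure`.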

    The next level

    If your site publishes a large database, the build-and-publish routine can take a long time to run. Sometimes that’s acceptable, but if you’re periodically making small updates to the site it can be frustrating to wait for the entire database to rebuild every time there’s a minor edit.

    We tackle this problem by hooking targeted build routines to our Django models. When an object is edited, the model is able to rebuild only those pages that object is connected to. We accomplish this with a build method you can inherit. All that’s necessary is that you define a list of the detail views connected to an object.

    from django.db import models
    from bakery.models import BuildableModel
    class DummyModel(BuildableModel):
        detail_views = ('yourapp.views.DummyDetailView',)
        title = models.CharField(max_length=100)
        description = models.TextField()
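
    The `detail_views` entries are dotted import paths written as strings. As an illustration of how such a string can be turned back into a class, here is a standard-library sketch. `load_class` is a hypothetical helper, not part of django-bakery, which may resolve the paths differently.

```python
import importlib


def load_class(dotted_path):
    """Import and return the object named by a 'package.module.Name' string."""
    module_path, _, name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError('Expected a dotted path, got %r' % dotted_path)
    module = importlib.import_module(module_path)
    return getattr(module, name)
```

    Keeping the views as strings means the model never has to import the views module at load time, which sidesteps circular imports between models and views.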

    Now, when the build method is called on an object, only that object’s detail pages will be rebuilt. If other pages ought to be updated as well, particularly if they come from views that don’t take the object as an input, you should include those in the pre-defined _build_related model method called at the end of build.

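    from django.db import models
    from bakery.models import BuildableModel

    class DummyModel(BuildableModel):
        detail_views = ('yourapp.views.DummyDetailView',)
        title = models.CharField(max_length=100)
        description = models.TextField()

        def _build_related(self):
            """Rebuild the sitemap and RSS feed as part of the build routine."""
            from yourapp import views
            # Assumption: each extra view is rebuilt by calling its own build
            # method. The list view from earlier stands in here; sitemap and
            # feed views would be handled the same way.
            views.DummyListView().build_queryset()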

    With this system in place, an update posted to the database by an entrant using the Django admin can set into motion a small build that is then synced with your live site on Amazon S3. We use that system to host applications with in-house Django administration panels that, for the entrant, walk and talk like a live database, but then automatically figure out how to serve themselves on the Web as flat files. That’s how a site like is managed.

    Finally, to speed the process a bit more, we hand off the build from the user’s save request in the admin to a job server that does the work in the background. This prevents a push-button save in the admin from having to wait for the entire build to complete before returning a response. Here is the save override on the Timeline model that assesses whether the publication status of an object has changed, and then passes off build instructions to a Celery job server.

    def save(self, *args, **kwargs):
        """
        A custom save that bakes the page and republishes it when necessary.
        """
        logger.debug("Saving %s" % self)
        # If build=False has been passed, we skip everything.
        if not kwargs.pop('build', True):
            super(Timeline, self).save(*args, **kwargs)
        else:
            # First figure out what we're going to have to do after we save.
            # If the timeline has not yet been created...
            # (Assumption: a missing primary key marks an unsaved object.)
            if not self.pk:
                if self.is_published:
                    action = 'publish'
                else:
                    action = None
            else:
                current = Timeline.objects.get(pk=self.pk)
                # If it's been unpublished...
                if not self.is_published and current.is_published:
                    action = 'unpublish'
                # If it's being published...
                elif self.is_published:
                    action = 'publish'
                # If it's remaining unpublished...
                else:
                    action = None
            super(Timeline, self).save(*args, **kwargs)
            logger.debug("Post-save action: %s" % action)
            # Do whatever needs to be done
            # (Assumption: the tasks module shown below is imported as tasks.)
            if action:
                if action == 'publish':
                    tasks.publish.delay(self)
                elif action == 'unpublish':
                    tasks.unpublish.delay(self)
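
    The decision at the heart of that save method can be pulled out into a small pure function, which makes the branches easier to read and to test without a database. This is a sketch. `decide_action` and its argument names are hypothetical and do not appear in the Timeline model.

```python
def decide_action(is_new, is_published, was_published=False):
    """Return 'publish', 'unpublish' or None for an object about to be saved."""
    if is_new:
        # A new object only needs baking if it starts out published.
        return 'publish' if is_published else None
    if was_published and not is_published:
        # It was live and has just been pulled, so its pages must be removed.
        return 'unpublish'
    if is_published:
        # Published objects are re-baked on every save to pick up edits.
        return 'publish'
    # Unpublished before and after: nothing to do.
    return None
```

    Note that a published object is republished on every save, whether or not it was already live, because any edit may have changed its pages.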

    The tasks don’t have to be complicated. Ours are as simple as:

    import sys
    import logging
    from django.conf import settings
    # Older Celery import path; newer releases provide celery.shared_task.
    from celery.decorators import task
    from django.core import management

    logger = logging.getLogger('timelines.tasks')

    @task()
    def publish(obj):
        """Bake all pages related to a timeline, and then sync with S3."""
        try:
            # Here the object is built
            obj.build()
            # And if the settings allow publication from this environment...
            if settings.PUBLISH:
                # ... the publish command is called to sync with S3.
                management.call_command("publish")
        except Exception:
            logger.error("Task Error: publish", exc_info=sys.exc_info(),
                extra={
                    'status_code': 500,
                    'request': None
                })

    @task()
    def unpublish(obj):
        """Unbake all pages related to a timeline, and then sync to S3."""
        try:
            # Assumes the object provides an unbuild() counterpart to build().
            obj.unbuild()
            if settings.PUBLISH:
                management.call_command("publish")
        except Exception:
            logger.error("Task Error: unpublish", exc_info=sys.exc_info(),
                extra={
                    'status_code': 500,
                    'request': None
                })
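
    The logging calls in those tasks attach the active exception and a couple of extra fields to the log record instead of letting the error escape the worker. A standalone sketch of that pattern follows. The `run_logged` wrapper and the `example.tasks` logger name are hypothetical.

```python
import logging

logger = logging.getLogger('example.tasks')


def run_logged(name, func, *args):
    """Call func(*args); log any exception with its traceback instead of raising."""
    try:
        return func(*args)
    except Exception:
        logger.error(
            "Task Error: %s", name,
            exc_info=True,  # attach the current traceback to the record
            extra={'status_code': 500, 'request': None},
        )
        return None
```

    The keys passed through `extra` become attributes on the log record, so a handler that reports errors by email or to a tracker can read them.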

    And that’s it. These tools have already proven useful for us, but have only been sketched out. All of the code is free and open on GitHub and any contributions or advice are welcome.

    Much respect due

    This application was made in close collaboration with Ken Schwencke, my partner here at the Data Desk. Without his imagination, criticism and contributions, most of our work would be impossible, including this library.

    Also, I presented the ideas behind django-bakery last month to a group of our peers at the 2012 conference of the National Institute for Computer-Assisted Reporting. The NICAR community is a constant source of challenge and inspiration. Many of our ideas, here and elsewhere, have been adapted from things the community has taught us.

