DevOps Django - Part 3 - The Heroku Way
In the previous installment of this series, I discussed (in depth) the problems with deploying Django as a devops guy. After struggling with deployment for ~2 years, and finding very little relief in modern devops tools (puppet, monit, nagios, etc.), I started looking for new solutions.
Several months ago I was reading Hacker News and noticed that Heroku had recently added Python support to their platform-as-a-service stack. If you're not familiar with Heroku, they're a very popular polyglot hosting platform. They've been around since 2007 providing Ruby hosting, but over the past year they've added support for multiple languages, and seem to be kicking ass and growing like mad.
I've continuously heard their name mentioned by other programmers, but this was the first I'd heard of them having any Python support, so I bookmarked their Django tutorial and told myself I'd give it a spin sometime.
Coincidentally, a few weeks later I was asked to give an impromptu lightning talk during a PyLadies hack-a-thon, so I decided to give Heroku a spin, and do my lightning talk on that. In a weird twist of fate, the few hours I spent learning Heroku turned out to be some of the best invested hours of my life.
During the hack-a-thon I built and deployed a simple Django site (with celery support) in just under an hour. If I were to build the same site and attempt to deploy it on EC2, I'd have easily spent a week getting things deployed using puppet, monit, nagios, etc.
After the hack-a-thon ended, I decided to play around with Heroku a bit more, and see what the platform really had to offer other than what I used during my short sprint.
A Second Look
Later that week, I had a few hours to kill, so I revisited Heroku's website, and read through all their help resources and tutorials. I also took an in-depth look at their addons, trying to decide what their services could be used for: personal projects, business ideas, work projects?
The first thing I was blown away to discover is that Heroku has an enormous number of addons. They literally have addons for almost any piece of infrastructure you could ever want: PostgreSQL, Redis, cron, memcached, RabbitMQ, Solr, etc. Upon seeing this, I started to get excited.
Second, I took the opportunity to play around with their CLI tool, which is extremely easy to install and use, and was blown away again. Heroku's CLI tool gives you complete control over your Heroku applications. You're able to create new Heroku applications out of Git repositories, instantly provision addons for your application (PostgreSQL, Redis, etc.), view streaming logfiles, instantly scale up (and down) your nodes, etc. Furthermore, you're able to run shell commands on your application straight from your local terminal. Need to access the Django shell? No problem, you can simply execute ``heroku run python manage.py shell`` from your terminal.
By this point, I was really itching to use Heroku for something serious.
While I was able to build and deploy a simple Django site on Heroku in under an hour during the hack-a-thon, I knew that I needed to port a much larger, more complex site over to Heroku to really see if it could meet my professional needs.
So, I checked out a fresh copy of the teleconferencing service that I develop at work, and got to it.
How I Ported My Work Application to Heroku
If you'd like to see details about what technologies the teleconferencing service uses, check out the previous part of this series. As the teleconferencing service is large, complex, and infrastructure heavy--I figured that if I could port it to Heroku then I'd be able to use Heroku for almost anything.
The first thing I did was create a new Heroku application using their CLI tool: ``heroku create --stack cedar``, then deploy my application to Heroku using Git: ``git push heroku master``.
Secondly, I installed a few addons so I could use my required infrastructure components (RabbitMQ, memcached, PostgreSQL, and cron):
- ``heroku addons:add rabbitmq``
- ``heroku addons:add memcache``
- ``heroku addons:add shared-database``
- ``heroku addons:add scheduler``
NOTE: By leaving off the size of the addons at the end, you install the cheapest (smallest) plan for each service. Adding these addons exactly as shown above adds $0 per month to your bill.
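If you'd rather script this step than type each command, here's a minimal sketch in Python. It only wraps the four ``heroku addons:add`` commands listed above. The function names (``addon_commands``, ``provision``) are made up for illustration, and actually running it assumes the ``heroku`` CLI is installed and logged in.

```python
import subprocess

# Addon names exactly as used in the commands above.
ADDONS = ["rabbitmq", "memcache", "shared-database", "scheduler"]


def addon_commands(addons):
    """Return one `heroku addons:add <name>` argument list per addon."""
    return [["heroku", "addons:add", name] for name in addons]


def provision(addons, run=subprocess.check_call):
    """Run each command in order; `run` is injectable so it can be dry-run."""
    for command in addon_commands(addons):
        run(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    provision(ADDONS, run=lambda command: print(" ".join(command)))
```

Passing a different ``run`` callable (as the dry run does) lets you check what would be executed before touching your real application.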
Next, I modified my production settings file (``project/settings/prod.py``, in my case), to work with Heroku's hosted RabbitMQ, memcached, and PostgreSQL services. In the end, my settings file looked something like this:
As a quick note, every time you push code to Heroku, they automatically read your top-level ``requirements.txt`` file, and install any packages you've defined. This makes handling site dependencies completely transparent to you (the developer). In order to get my site working on Heroku, the only change I had to make to my requirements file was adding ``django-pylibmc-sasl``, as the Heroku memcached addon requires SASL authentication, which the commonly used ``python-memcached`` library doesn't provide.
Just for clarity, here's my ``requirements.txt`` file:
As you can see in the settings file, Django is simply using environment variables to interact with the various infrastructure components we added earlier. All of Heroku's addons work by exporting one (or more) environment variables that your application can use. This makes using Heroku addons extremely convenient. You can reuse most of your Django settings across all your projects, since you're storing the important stuff in environment variables. No coupling required.
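To make the environment-variable idea concrete, here's a minimal sketch of how a settings module could build its infrastructure settings from the environment. This is not my actual settings file. The variable names ``DATABASE_URL`` and ``RABBITMQ_URL`` are assumptions (run ``heroku config`` to see what your addons really export), the helper names are hypothetical, and the database engine string is the one older Django releases used.

```python
import os
from urllib.parse import urlparse


def database_from_url(url):
    """Turn postgres://user:pass@host:port/name into a Django DATABASES entry."""
    parts = urlparse(url)
    return {
        # Assumption: engine name used by older Django releases.
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


def settings_from_env(env=os.environ):
    """Collect infrastructure settings from environment variables.

    The variable names below are assumptions, not guaranteed addon output.
    """
    return {
        "SECRET_KEY": env["SECRET_KEY"],
        "DATABASES": {"default": database_from_url(env["DATABASE_URL"])},
        "BROKER_URL": env["RABBITMQ_URL"],  # celery broker connection string
    }
```

In a real settings module you'd assign these at module level (``DATABASES = ...``, ``BROKER_URL = ...``) rather than returning a dictionary. The function form just makes the sketch easy to try with a fake environment.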
Additionally, the Heroku CLI tool also allows you to set, edit, and remove your own environment variables. I used this functionality to set my application's ``SECRET_KEY``, along with other arbitrary stuff (like my Amazon S3 credentials, etc.). To add these environment variables to my Heroku application, I simply ran: ``heroku config:add SECRET_KEY=xxx AWS_ACCESS_KEY_ID=xxx ...``.
The next thing I did was define a top-level file, ``Procfile``, which Heroku uses to specify the different types of 'dynos' you'll be running. Essentially, a Procfile just lists a series of executable commands that do stuff. Here's the Procfile I wrote:
Each line contains a reference name, followed by the command you want to run. In my case, I've got three separate 'dyno' types: a gunicorn instance (this is my actual Django web application), a celerybeat instance (for scheduling periodic tasks), and a celery worker instance (for processing asynchronous tasks).
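The name-then-command format is simple enough to sketch in a few lines of Python. The Procfile text below is illustrative only: the process names (web, scheduler, worker) match the ones used in this article, but the commands are assumptions, not my exact file, and ``parse_procfile`` is a hypothetical helper rather than anything Heroku ships.

```python
# Illustrative Procfile contents; the commands are assumptions.
PROCFILE = """\
web: python manage.py run_gunicorn -b 0.0.0.0:$PORT
scheduler: python manage.py celerybeat
worker: python manage.py celeryd
"""


def parse_procfile(text):
    """Map each process name to its command, splitting on the first colon."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, command = line.partition(":")
        processes[name.strip()] = command.strip()
    return processes


if __name__ == "__main__":
    for name, command in parse_procfile(PROCFILE).items():
        print(name, "->", command)
```

Splitting on the first colon only matters because commands can contain colons themselves, as the ``0.0.0.0:$PORT`` bind address in the web command does.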
The way Heroku runs and manages your application is via these dyno types. For instance, if I want Heroku to run three 'web' instances (the equivalent of running three separate web servers), I can run ``heroku scale web=3`` from the CLI, and Heroku will instantly ensure that three of my 'web' dynos are running--automatically load balancing incoming HTTP requests across the three.
NOTE: You can run ``heroku ps`` to see what processes (and how many of each) are running.
After defining my ``Procfile``, I just ran ``heroku scale web=1 scheduler=1 worker=1``, and Heroku instantly spun up my entire cloud infrastructure.
INSANE
NOTE: If you are running celerybeat, be sure to only run it a single time, and no more. If you have multiple celerybeat instances running, you'll have duplicate tasks in your queue. That's why I specifically created two separate celery dyno types, "scheduler" and "worker", so that I could safely scale my worker processes using: ``heroku scale worker=x``, while keeping only a single scheduler.
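If you script your scaling, it's easy to guard against accidentally running two schedulers. Here's a minimal sketch, assuming the dyno type that runs celerybeat is named "scheduler" as in this article. ``scale_command`` is a hypothetical helper that only builds the ``heroku scale`` arguments; it doesn't run anything.

```python
def scale_command(formation):
    """Build `heroku scale` arguments, refusing more than one scheduler."""
    if formation.get("scheduler", 0) > 1:
        raise ValueError("run at most one celerybeat (scheduler) dyno")
    # Sort by dyno type so the command is predictable.
    pairs = ["%s=%d" % (name, count) for name, count in sorted(formation.items())]
    return ["heroku", "scale"] + pairs


if __name__ == "__main__":
    print(" ".join(scale_command({"web": 1, "scheduler": 1, "worker": 3})))
```

The resulting list can be handed to ``subprocess.check_call`` once you're happy with it.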
A Quick Recap
It took me a total of ~2 hours to:
- Make a large website and API completely Heroku compatible.
- Fully provision a PostgreSQL server.
- Fully provision a memcached server.
- Fully provision a RabbitMQ server.
And don't forget--this is the amount of time it took me while still learning about Heroku, and reading through the documentation. I think it is fair to say that if there were more tutorials, articles, and information available, this process could be shortened substantially.
I'll repeat this again: that is INSANE.
Compared to the time, effort, and maintenance required to set up my teleconferencing application normally using a standard devops toolset, this was a piece of cake. I didn't have to provision a puppet server, manage a puppet repo, or anything.
Heroku makes deployment so easy, you don't even need a fabfile.
The Heroku Way
Through the process of learning, using, and eventually moving my company's entire infrastructure over to Heroku, I learned quite a bit about Heroku's ideals.
In my first article discussing the "deployment problem" in the Django community, I said that I have a dream for the Django community; a dream that all Django developers can build their websites with peace of mind, knowing a production deployment is no more than 10 minutes away.
Heroku is quickly making my dream a reality.
The "Heroku Way", as I refer to it now, consists of a few simple principles:
- Build your Django sites using standard open source tools.
- Use Git to manage your deployment revisions.
- Instantly provision, resize, or remove any infrastructure components you need.
- Keep your private data decoupled from your codebase via environment variables.
- Instantly scale your infrastructure up and down as you please.
- Pay by usage.
After having tried numerous deployment methods myself, and after using Heroku, I strongly believe that Heroku's approach to solving the deployment problem is the right way. As a developer, using Heroku is a big win:
- You can deploy both small and large sites the same way (and scale them as large as you'd like with no added complexity).
- It's cost effective (Heroku's prices aren't much higher than those of Amazon, which Heroku is built on top of). I'll go into more depth on this in my next article.
- You aren't locked into anything: the same codebase you use to develop your site locally can be used on Heroku in a matter of minutes.
Deploying code to Heroku feels right. As a developer, you shouldn't have to spend 90% of your time writing puppet rules and working with vendor APIs to provision servers. You shouldn't need two weeks to scale your site when you get an influx of users.
Instead, you should be coding--which is precisely what Heroku is designed to help you do.
In the next part of this series, I'll be detailing my decision to move from Rackspace to Heroku. Making an infrastructure move is a big decision. In the next article I'll discuss my reasoning with you.
PS: If you read this far--I'd like to ask you a question. Would you be interested in a book about Django deployments? If so, what sort of topics would you expect to be covered?
UPDATE: I've finished the next part of the series! You can read it here.