Tuesday, 9 April 2013

Heroku Dyno-wha??



I've been using Heroku for a few months to host a Rails app that serves as the backend to an iOS app I've been working on. Heroku has been awesome because it's incredibly easy to deploy your web app to a production environment.

How does all of this work behind the scenes, however? Let's start with what Heroku calls the 'basic unit of composition'.

Dynos


Each dyno is essentially a virtualized Unix container that you execute commands against. Dynos start out with a default environment (Heroku's Cedar stack). When your app gets installed onto one of these (as a slug, more on this later!), commands are executed against it based on your app and its dependencies.

Web vs. Worker dynos

Web dynos serve web requests. If your web requests trigger slow work like fetching data from remote APIs or uploading data to S3, that work can tie up your web dyno. This is where worker dynos come in. You can delegate these tasks to a job queue, and worker dynos will pick them up from the queue.

So, how do you scale these two types of dynos?
  • Use more web dynos to support more concurrent users.
  • Use more worker dynos when your job queue starts getting backed up.
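To make the second rule a bit more concrete, here is a rough back-of-the-envelope sketch in Ruby. The method name, its parameters, and the idea of a "drain time" target are all my own assumptions for illustration. Heroku doesn't provide anything like this, and real numbers would have to come from measuring your own queue.

```ruby
# Hypothetical sizing heuristic (not a Heroku or Resque API):
# how many worker dynos would clear the current backlog within a target time?
def workers_needed(backlog:, jobs_per_worker_per_minute:, drain_minutes:)
  # Jobs a single worker can finish within the target window.
  capacity_per_worker = jobs_per_worker_per_minute * drain_minutes
  # Round up, and always keep at least one worker running.
  [(backlog.to_f / capacity_per_worker).ceil, 1].max
end

puts workers_needed(backlog: 600, jobs_per_worker_per_minute: 20, drain_minutes: 10)
```

With these made-up numbers, each worker can finish 200 jobs in ten minutes, so a backlog of 600 jobs suggests three workers.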

All manual activity, like console sessions or rake tasks, triggers a 'one-off' dyno where it runs in isolation. This includes all of your 'heroku run' this-or-that commands.

The Job Queue

This image is a good visualization of how it works.

[Image: job queue diagram]



Backgrounding tasks or processes is a general pattern rather than a Heroku feature, and Heroku doesn't define how to implement it. With Ruby on Rails, I've used Resque successfully.

If the user request that triggers the job needs the result, you need a strategy for getting it to them once the job finishes. Having the client poll to see whether the job has finished is generally acceptable.
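Here is a minimal sketch of the enqueue, work, and poll cycle described above, using only Ruby's standard library. To be clear, this is not Resque's API: TinyJobQueue and its methods are hypothetical names I made up, the "worker" is just a thread in the same process, and a real setup would use a Redis-backed queue with separate worker dynos.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a real job queue such as Resque.
class TinyJobQueue
  def initialize
    @jobs = Queue.new   # thread-safe FIFO from Ruby's core library
    @results = {}       # job id => result, filled in by the worker
    @lock = Mutex.new
  end

  # "Web dyno" side: hand the work off and return a job id right away.
  def enqueue(&work)
    id = SecureRandom.hex(4)
    @jobs << [id, work]
    id
  end

  # "Worker dyno" side: process jobs until the queue is closed and empty.
  def work_loop
    while (job = @jobs.pop)   # pop returns nil once the queue is closed and drained
      id, work = job
      result = work.call
      @lock.synchronize { @results[id] = result }
    end
  end

  def close
    @jobs.close
  end

  # What a polling request would hit: :pending until a result is stored.
  def status(id)
    @lock.synchronize do
      @results.key?(id) ? [:done, @results[id]] : [:pending, nil]
    end
  end
end

queue = TinyJobQueue.new
worker = Thread.new { queue.work_loop }

# The web request enqueues a slow task (simulated with sleep) and returns.
job_id = queue.enqueue { sleep 0.05; "uploaded" }

# The client polls until the job reports that it is done.
state, value = queue.status(job_id)
until state == :done
  sleep 0.01
  state, value = queue.status(job_id)
end

queue.close
worker.join
```

The important part is that enqueue returns immediately with an id, so the web side is never blocked on the slow work, and the client only ever asks "is job X done yet?".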

Dyno Management

What manages these different dynos and makes sure they're in sync? Heroku uses its dyno manifold to do this. When you deploy new code, all of your app's dynos are restarted. The dyno manifold also monitors your dynos for errors or issues and restarts or moves them accordingly. I think the way the dyno manifold is implemented is one of Heroku's secrets, as I haven't been able to find documentation anywhere. They do say that it coordinates your dynos, manages the programs that operate your app, and generally lets you remain hands-off about how it works.

Slug Compilation

When you git push to Heroku, the code is received by the slug compiler, which transforms your repository into a 'slug'. A slug is a pre-compressed, pre-packaged copy of your application, optimized for distribution by the dyno manifold. When you scale your application by increasing web or worker dynos, the slug is distributed to and expanded on each new dyno as well.

Dyno Idling

One thing you'll care about immediately after beginning to use Heroku is the dyno idling policy. If your app has only a single web dyno running (this is the default and free option), it will idle out, irrespective of the number of worker dynos. This means that if you have no web requests for an hour, your app is effectively put to 'sleep' (idled).

A subsequent request to an idled app signals the dyno manifold to unidle, or 'wake up', your dyno. This can result in a delay of up to 15 seconds, sometimes longer. Pretty annoying, and an incentive to increase your number of web dynos to ensure one is always there to receive a request.
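The idling rule, as I understand it from the description above, boils down to a small predicate. This is only my own sketch of the policy in Ruby: the method and constant names are hypothetical and have nothing to do with how Heroku actually implements it.

```ruby
# One hour without web requests, per the policy described above.
IDLE_AFTER_SECONDS = 60 * 60

# Hypothetical helper (not a Heroku API): would this app be idled?
# Worker dynos are deliberately not a parameter, since they don't affect idling.
def idles?(web_dynos:, seconds_since_last_request:)
  web_dynos == 1 && seconds_since_last_request >= IDLE_AFTER_SECONDS
end

puts idles?(web_dynos: 1, seconds_since_last_request: 2 * 60 * 60)
puts idles?(web_dynos: 2, seconds_since_last_request: 2 * 60 * 60)
```

The first call describes a single web dyno that has been quiet for two hours, so it would be idled. The second has two web dynos, so it stays awake no matter how quiet it is.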

Check out the Heroku documentation for dynos as well. This is where I got most of my information!