If you have ever had roommates, odds are that you've had one you didn't like. Even if you got lucky and liked all of your roommates, there were probably moments when homicide wasn't an option you had totally ruled out as a solution to the dirty-dishes problem. And we have all heard stories about best friends who decided to room together and now don't speak anymore.
Applications Are People Too
Well, not really; but like people, they don't always enjoy cohabiting. Just like people, software applications have needs, and when one application tramples on another's needs, they tend to fight like college roommates.
Software applications' needs, which we call dependencies, can vary greatly. They depend on access to system resources like CPU, memory, block I/O, and networking. They depend on the presence of specific versions of other software, like Java 7 or MySQL 5.5. And God help you if you have two applications that both need the same version of the same package but require different, incompatible compile-time options for that package.
Package dependencies can become difficult to manage, and their interactions complicate debugging if we try to deploy multiple applications to a single machine. We also have to deal with resource contention: if one of our web applications gets a lot of traffic and saturates our host's network connection, our co-hosted applications' performance will suffer, even though they are under normal load.
Despite all of that, it is often impractical to run every application on its own machine because of cost. So, how do we escape this hell?
The Canonical Approach
In the spring of 1546, Christian scholars were gathered by the Catholic Church in Trent, Italy, and charged with identifying which prophetic texts were truly inspired by God. The texts they selected were canonized and became what we know as the modern Bible. With the exception of changes due to retranslation, and some denominations that have added books, the Bible you'd find at a church altar has the same contents as a Bible from 450 years ago.
This is probably the most common solution to our dependency problem, and many software shops use this same approach to manage their dependencies and interactions. Certain versions of software packages, tools, and languages are blessed by the Council of Senior Engineers, and only those are used to build applications. While this can be effective, it only simplifies the problem; it doesn't solve it.
The largest shortcoming is that when something new comes out that you want to use, if its dependencies are not found in your canon, you can't use it. If you actually need this new thing, then you have to tackle updating everything else to work with the new dependencies you are introducing. This can quickly spiral out of control, because the dependencies are software themselves and may require newer versions of other packages that also conflict with your environment's sacred configuration.
This approach has not solved our problem; we still have to manage the interactions between dependencies. The interactions still exist, but they are well understood and, hopefully, stable, so that, unless we are upgrading something, they can be largely ignored.
So We Should Use VMs Then?
Yes and no. Virtual machines do solve this problem, but they add the non-trivial overhead of hardware emulation. There is a better way.
What About Process VMs? They Don't Have To Emulate Hardware.
While that is true, the problem with process VMs is in their name. We are still a process running on a host platform, and therefore we have the same problems as any other process: we are competing for resources and can have conflicting dependencies.
We might be tempted to solve this problem by deploying multiple applications to a single instance of our process VM that uses up all of the system's resources, but then we'd realize that this has undone all of our hard work. We are, again, sharing system resources between these applications.
Never Fear, Containers Are Here!
Software containers (e.g., LXC, FreeBSD jails, Solaris zones, and the like) solve this problem nicely. They prevent resource contention by limiting each application's access to system-level resources like CPU, memory, block I/O, and networking, and they manage dependency conflicts by isolating each application's view of the execution environment, from process trees to mounted file systems.
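To make the resource-limiting part concrete, here is a sketch of an LXC container configuration fragment. The specific limit values, bridge name, and keys shown are illustrative, not a recommendation; they are the kinds of knobs LXC exposes for fencing off CPU, memory, block I/O, and networking per container.

```
# Hypothetical LXC config fragment; values are illustrative.
# Cap memory so this container cannot starve its neighbors.
lxc.cgroup.memory.limit_in_bytes = 512M
# Give this container a proportional share of CPU time (default weight is 1024).
lxc.cgroup.cpu.shares = 512
# Weight this container's block I/O relative to others (range 10-1000).
lxc.cgroup.blkio.weight = 500
# Give the container its own virtual network interface on a host bridge.
lxc.network.type = veth
lxc.network.link = br0
```

Each setting maps onto a kernel cgroup or namespace feature, which is why containers can enforce these limits without any hardware emulation.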
Combining this resource isolation with copy-on-write file systems like ZFS and Btrfs gives us excellent disk and memory performance characteristics. Multiple containers can share the same file until one of them changes it, which makes containers more space efficient and makes caching far simpler.
We could have had all of these things when we were running in a VM, so why are containers better? First, we are running as a native process in the host OS: no hardware-emulation overhead and no hypervisor to manage. Second, containers are much more portable than VMs because of their size. VM images are often a few gigabytes because they contain all the information needed to simulate an entire piece of hardware; a container shipping the same piece of software will be much smaller.
But Mom... Containers Are Hard!
Docker commoditizes the use of containers as a software deployment mechanism. It abstracts away much of the underlying craziness and gives us other awesome things like container versioning, inheritance, and a way to share containers with others.
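A minimal Dockerfile sketch shows what inheritance and versioning look like in practice. The base image, package, and application name here are hypothetical stand-ins:

```dockerfile
# Inheritance: build on top of an existing, versioned base image.
FROM ubuntu:14.04

# Each instruction adds a new, cacheable layer on top of the parent image.
RUN apt-get update && apt-get install -y openjdk-7-jre-headless

# Add our (hypothetical) application; it sees only this container's file system.
COPY app.jar /opt/app/app.jar

# The process to run when the container starts.
CMD ["java", "-jar", "/opt/app/app.jar"]
```

Versioning and sharing then fall out of the image tags: something like `docker build -t myorg/myapp:1.0 .` produces a named, versioned image that can be pushed to a registry for others to pull.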
What Dependencies Are We Left With?
Because containers are kernel-level primitives, you are only dependent on a kernel that supports the container technology you choose to use. For example, using Docker with LXC means any Linux distro with a kernel version of 3.8 or later.
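Checking that dependency is a one-liner away. This sketch compares the running kernel against the 3.8 minimum mentioned above; the version parsing is simplistic but works for typical `uname -r` output:

```shell
# Verify the kernel is new enough for LXC-backed Docker (3.8+).
required_major=3
required_minor=8

kernel=$(uname -r)                          # e.g. "3.13.0-24-generic"
major=$(echo "$kernel" | cut -d. -f1)
minor=$(echo "$kernel" | cut -d. -f2 | cut -d- -f1)

if [ "$major" -gt "$required_major" ] || { [ "$major" -eq "$required_major" ] && [ "$minor" -ge "$required_minor" ]; }; then
    echo "kernel $kernel: ok"
else
    echo "kernel $kernel: too old for LXC-backed Docker"
fi
```

That is the entire dependency surface: no language runtime, no package canon, just a kernel with the right primitives.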
A Brief History of Containers
Before I wrap this up, a quick history lesson.
We’ve been deploying software to containers since we started deploying software, and we’ve had problems with software interacting as soon as we decided to put more than one piece of it in a container. But the container itself has changed over the years.
The first container was the bare metal: true isolation, since the first computers could only execute a single program at a time. You would show up at your scheduled time to run your program and hope that it finished in the time you had reserved.
This show-up-on-time requirement frustrated us, so we invented the operating system so that we didn’t have to be quite so punctual. Operating systems also came with the added bonus of letting us interleave the execution of programs to make the best use of our physical resources.
The operating system served us well for a while but as software became increasingly complex we started to feel the pain of having a single machine be host to multiple programs. We were in a rush to get to production, so we invented the hypervisor to simulate several virtual machines on one real machine.
This also worked well for a while but eventually we realized that simulating hardware was an awfully high price to pay for not having to do program isolation correctly. So we went back to the trusty warhorse that is the OS and gave it the proper primitives to isolate programs from each other.
Multi-Tenant Containers Are An Anti-Pattern
Putting multiple applications in the same container is an anti-pattern, regardless of what the container is, but it’s ridiculous to suggest that we go back to a piece of hardware only having a single program on it. Ignoring the ridiculous cost of doing that, most of our applications are not a single program but a complicated set of interconnected programs.
Since we can’t go back to simpler times we should accept kernel level application isolation as a best practice for software application deployment.
In short, be a good guy, not a scumbag, and don't give your software roommates, because roommates suck!