In this blog post I want to share some ideas on container orchestration and schedulers. If you haven't been living in a submarine for the last several years, you probably know about Kubernetes. It's by far the most popular scheduler and container orchestration tool available for free.
Kubernetes was initially developed by Google, based on its internal cluster manager, Borg. But while Google develops Kubernetes, it has never used it in production: Google continues to develop and run Borg. Kubernetes shares the same ideas, and probably some code, with Borg, but it is not what Google runs in production.
As I mentioned above, Kubernetes is the most popular free cluster manager and container orchestrator out there, but it's not the only one and, IMHO, not the best one. Let's get acquainted with two other alternatives and discuss our experience with them.
So which are the other two I'm referring to? Apache Mesos and HashiCorp Nomad.
Both Mesos and Nomad have most of the functionality of Kubernetes, plus some essential advantages. Kubernetes is a container orchestration system; it can work only in a Dockerized infrastructure, while Mesos and Nomad can act as standalone application schedulers as well.
The developers of Mesos call it a "distributed systems kernel". Sounds good, but what does that mean? Imagine an OS kernel running not on a single machine but on top of hundreds or even thousands of computers: a "Linux server" that can utilize not just one box, but a whole distributed cluster of servers.
Mesos runs in master-worker mode. You can run several Mesos master instances, with one active and several hot standbys. It provides a single API to share cluster resources and run applications and containers in a single environment.
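As a sketch of such a highly available setup (hostnames, paths and the ZooKeeper address below are placeholders), the master and worker daemons might be launched like this:

```shell
# On each of three master machines: register in ZooKeeper and
# require a quorum of 2 out of 3 masters for leader election.
mesos-master \
  --zk=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --quorum=2 \
  --work_dir=/var/lib/mesos

# On every worker machine: point the agent at the same ZooKeeper
# path, so it always follows the currently elected leader.
mesos-agent \
  --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --work_dir=/var/lib/mesos
```

If the active master dies, one of the standbys wins the ZooKeeper election and the agents reconnect to it automatically.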
Mesos uses the concepts of:
- Frameworks: applications running on Mesos
- Meta frameworks: frameworks that run other frameworks as applications
An example of a meta framework is Marathon. Marathon is a manager for long-running processes, much like systemd on Linux. You can use Marathon as an init and keep-alive system for a number of applications, as well as for Docker orchestration on top of Mesos. Marathon is also a distributed system, so you can run multiple active/standby Marathon instances to achieve high availability. Both Mesos and Marathon rely on a ZooKeeper cluster as a distributed configuration and lock manager, so if you decide to run Mesos, you will have to install and maintain an Apache ZooKeeper cluster as well (which is actually fairly simple).
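For illustration, a minimal Marathon application definition (all names and values here are made up) that keeps two copies of an Nginx container alive might look like this:

```json
{
  "id": "/web/nginx",
  "instances": 2,
  "cpus": 0.5,
  "mem": 128,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "nginx:latest",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 80, "hostPort": 0 }]
    }
  }
}
```

You POST this JSON to Marathon's `/v2/apps` endpoint; if a task dies, Marathon reschedules it somewhere else on the cluster, which is exactly the keep-alive role described above.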
Mesos is a source-code distribution, so you will have to compile it yourself. If you prefer packaged versions, you can use the third-party distributions of Mesos by Mesosphere.
Mesos is one of the most popular schedulers among the world's biggest players. There are production installations of Mesos running on top of more than 20,000 nodes, and its list of users is very impressive.
This figure shows the typical architecture of Mesos + Marathon.
Nomad is very similar to Mesos in its use cases and shows similar behavior. It has a very ascetic, simple design: unlike Mesos, it is distributed as a single binary for both the master and the worker processes, so compared to Mesos, Nomad's installation and maintenance are much simpler and easier. According to HashiCorp's resources, Nomad has been tested on a cluster of more than 5,000 servers, and since it was not at its peak during those tests, its developers state that it can run on clusters of more than 10,000 nodes (by the way, 5,000 servers is the documented peak for Kubernetes). Nomad combines in a single binary not only the master and worker services, but also the functionality of Marathon, plus a built-in distributed cron system.
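To give an idea of how that Marathon-like role looks in practice, here is a sketch of a Nomad job file (all names and sizes are illustrative):

```hcl
# A minimal Nomad job description.
job "web" {
  datacenters = ["dc1"]
  type        = "service"   # long-running, Marathon-style workload

  group "frontend" {
    count = 2               # keep two instances alive

    task "nginx" {
      driver = "docker"
      config {
        image = "nginx:latest"
      }
      resources {
        cpu    = 500        # MHz
        memory = 128        # MB
      }
    }
  }
}
```

Switching `type` to `"batch"` and adding a `periodic` block with a cron expression turns the same job into a distributed cron entry, which is the built-in cron functionality mentioned above.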
Nomad does not use ZooKeeper. It relies on another distributed configuration manager, Consul, also by HashiCorp. If you want a highly available Nomad cluster, running a Consul cluster is not mandatory but strongly recommended. Consul takes care of distributing configuration, master failover, etc. Like Nomad, Consul uses a single binary for both server and agent instances. The most typical setup is to use dedicated machines for the Consul servers (or share the Nomad masters' servers) and to install Consul agents on each worker node of the Nomad cluster. When Consul is running on a Nomad node, no additional configuration of the Nomad node is needed: it automatically detects Consul on localhost and connects to it.
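As a sketch, a Consul server's configuration (the addresses and paths below are placeholders) can be as small as this:

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "datacenter": "dc1",
  "data_dir": "/var/lib/consul",
  "retry_join": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
}
```

Worker nodes run the same binary as plain agents (without `"server": true`), joining the same servers; Nomad then finds the local agent on Consul's default HTTP port 8500 without any extra configuration.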
Here is a typical Nomad architecture:
And for comparison, this is what a typical Kubernetes architecture looks like:
Both Mesos and Nomad are battle-tested systems and have a number of advantages over Kubernetes:
- Both systems are designed not only for Docker orchestration; they are also effective schedulers for a wide variety of distributed and standalone systems like Spark, Storm, Heron, etc. Mesos and Nomad are even recommended for building highly available Nginx clusters.
- Both systems have a nice-looking and informative built-in Web UI.
- Compared to Kubernetes, both systems are much easier to install, configure and maintain.
- Both Mesos and Nomad have better scalability compared to Kubernetes.
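To make the first point concrete: scheduling a plain, non-containerized process in Nomad only requires switching the task driver. A hypothetical example (the binary path and arguments are made up):

```hcl
# Running a plain process (no Docker) with Nomad's exec driver.
job "worker" {
  datacenters = ["dc1"]

  group "standalone" {
    task "worker" {
      driver = "exec"   # isolated fork/exec instead of a container
      config {
        command = "/usr/local/bin/worker"
        args    = ["--threads", "4"]
      }
      resources {
        cpu    = 1000
        memory = 256
      }
    }
  }
}
```

The scheduler treats this exactly like a Docker task: it picks a node with free resources, supervises the process and restarts it on failure.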
In my opinion, both Mesos and Nomad are better solutions for DevOps than Kubernetes. Each of the two has some advantages and disadvantages over the other, but those are largely a matter of personal choice and preference, and I cannot say which one is better. Maybe Mesos wins at scale, but Nomad is simpler to maintain.
In the case of OddEye, we have successfully run our production infrastructure on both Nomad and Mesos, but eventually we chose Mesos. We bypassed the small complication of Mesos's source-only distribution by building Debian packages from the Mesos source and installing them via the standard apt-get utility.
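As a rough sketch of that workflow (version numbers and paths are illustrative, and we show the third-party fpm tool here; any .deb packaging tool would do), the build-and-package steps look roughly like this:

```shell
# Build Mesos from source (the standard autotools flow from the Mesos docs)
./bootstrap
mkdir build && cd build
../configure
make -j"$(nproc)"
make install DESTDIR=/tmp/mesos-pkg

# Package the installed tree into a .deb with fpm
fpm -s dir -t deb -n mesos -v 1.4.0 -C /tmp/mesos-pkg .

# Install the resulting package on cluster nodes with the standard tooling
sudo dpkg -i mesos_1.4.0_amd64.deb
```

Once the package is in an internal apt repository, rolling it out across the cluster is a plain apt-get install like any other package.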