Multi-node Hadoop cluster on Docker
Lajos Papp, 19 June 2014

In the previous post you saw how easy it is to create a single-node Hadoop cluster on your devbox.

Now let's raise the bar and create a multi-node Hadoop cluster on Docker. Before we start, make sure you have the latest Ambari image:

docker pull sequenceiq/ambari:1.6.0


Once you have the latest image, you can start running Docker containers. But instead of typing long commands like docker run [options] image [command], we have created a couple of shell functions to help you with Docker commands.
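To give an idea of what such a wrapper looks like, here is a hypothetical sketch of one (the function name and docker options are illustrative and may differ from the real .amb script; the command is echoed rather than executed so the sketch runs without Docker installed):

```shell
# Hypothetical docker-run wrapper; names and options are assumptions,
# not the actual .amb implementation. Echoed so it runs without Docker.
amb-run() {
  local name=$1
  shift
  # Compose the long docker command from a short, memorable call.
  echo docker run -d --name "$name" -h "$name.mycorp.kom" sequenceiq/ambari:1.6.0 "$@"
}

amb-run amb0 --tag ambari-server=true
```

Instead of typing the full docker run [options] image [command] line each time, a single short function call composes it for you.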

Using these functions, the impatient can provision a 3-node Hadoop cluster with this one-liner:

curl -Lo .amb && . .amb && amb-deploy-cluster

Note that you can always alter the default parameters, such as the blueprint, cluster size, etc.; check the shell function's head for the full parameter list.

It does the following steps:

  • runs ambari-server start in a daemon (background) Docker container (which also runs ambari-agent start)
  • runs n-1 daemon containers with ambari-agent start, connecting to the server
  • runs AmbariShell with an attached terminal (to watch the provisioning progress)
    • AmbariShell posts the built-in multi-node blueprint to the /api/v1/blueprints REST API
    • AmbariShell auto-assigns hosts to the host_groups defined in the blueprint
    • creates the cluster by posting to the /api/v1/clusters REST API
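The steps above can be sketched as a shell function (a simplified, hypothetical outline, not the actual amb-deploy-cluster implementation; the docker commands are echoed rather than executed, and the exact options are illustrative):

```shell
# Hypothetical outline of the deploy steps; echoed, not executed.
amb-deploy-cluster-sketch() {
  local nodes=${1:-3}
  # 1. Server container (also starts an ambari-agent).
  echo docker run -d --name amb0 sequenceiq/ambari:1.6.0 --tag ambari-server=true
  # 2. n-1 agent containers connecting to the server.
  local i
  for i in $(seq 1 $((nodes - 1))); do
    echo docker run -d --name "amb$i" --link amb0:ambari-server sequenceiq/ambari:1.6.0
  done
  # 3. AmbariShell with an attached terminal posts the blueprint and
  #    creates the cluster through the REST API.
  echo docker run -it --rm sequenceiq/ambari:1.6.0 ambari-shell
}
```

Calling it with a node count, e.g. amb-deploy-cluster-sketch 3, prints one docker command per container plus one for AmbariShell.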

Custom blueprint

If you have your own blueprint, put it into a gist and you can use it from AmbariShell. First start AmbariShell:

amb-start-cluster 2

AmbariShell will wait for:

  • the Ambari REST API to become available

Below you will see a happy path to create a multi-node Hadoop cluster using AmbariShell:
host list
blueprint add --url
cluster build --blueprint custom-blueprint
cluster assign --hostGroup host_group_1 --host amb0.mycorp.kom
cluster assign --hostGroup host_group_2 --host amb1.mycorp.kom
cluster create
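Under the hood, these AmbariShell commands map to plain Ambari REST calls. A rough sketch of the equivalents (the localhost address, admin:admin credentials, cluster name, and JSON file names are assumptions; the curl commands are echoed rather than executed so the sketch stands alone):

```shell
# Illustrative REST equivalents of the AmbariShell commands above.
# Server address, credentials, and file names are assumptions.
AMBARI=http://localhost:8080/api/v1

# blueprint add: register the blueprint under a name
echo curl -u admin:admin -H "X-Requested-By: ambari" \
  -X POST "$AMBARI/blueprints/custom-blueprint" -d @blueprint.json

# cluster create: instantiate a cluster from the blueprint with a
# host-to-host_group mapping
echo curl -u admin:admin -H "X-Requested-By: ambari" \
  -X POST "$AMBARI/clusters/multi-node" -d @cluster-template.json
```

AmbariShell saves you from hand-writing these JSON payloads by assembling the host mapping from the cluster assign commands.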

In AmbariShell the hint command will always guide you on the happy path, and remember that devops are lazy: instead of typing, press <TAB> for autocomplete and suggestions.

Autocomplete will help you to:

  • complete the command in the given context (e.g. without any blueprint, cluster commands are not available)
  • add required parameters
  • add optional parameters: press <TAB> after a double dash (--<TAB>)
  • complete parameter arguments, such as blueprint names and hostnames


Ever since we started using Docker, we have been developing against a multi-node Hadoop cluster, as running a 3-4 node cluster on a laptop actually has less overhead than working with a Sandbox VM.

We are Dockerizing the Hadoop ecosystem and simplifying the provisioning process; watch this space or follow us on LinkedIn for the latest news about Cloudbreak, our open source, cloud-agnostic Hadoop as a Service API built on Docker.

Hope this helps and simplifies your development process. Let us know how it goes for you, or if you need any help with Hadoop on Docker.

