At SequenceIQ we use Apache Ambari for provisioning, managing, and monitoring Apache Hadoop clusters in different environments. Ambari has more to offer than that, though, especially for those of us who automate, frequently build on-demand Hadoop clusters in cloud environments, and submit different applications to them. These Hadoop clusters carry different components, configurations, and services (think of dev->test->UAT->PROD cluster lifecycles, different settings, SLAs, etc.).
Configuring applications that use dynamically built YARN clusters can be challenging. This is due to the large number of configuration properties, some of which need to be kept in sync on the YARN client application side. Think of yarn.resourcemanager.address, fs.defaultFS, and yarn.resourcemanager.scheduler.address, to name a few. Each time these cluster-specific entries change, client applications need to be reconfigured. Those who have ever played with clusters where the default properties are overridden know what this means…
At SequenceIQ we use Ambari for building on-demand YARN clusters (see the related blog post). In our case Ambari not only maintains the configuration of the cluster it manages but also provides access to that configuration through a set of REST resources.
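As a rough illustration of those REST resources, the sketch below builds the two URLs a client would typically read cluster configuration from: one listing the currently desired configuration types and their tags, and one fetching a single configuration by type and tag. The paths follow Ambari's v1 REST API as we understand it; the class name is hypothetical, and the host, cluster name, and tag are placeholders.

```java
import java.net.URI;

// Sketch only: this helper class is hypothetical and not part of Ambari.
final class AmbariConfigUrls {
    private final String baseUrl; // placeholder, e.g. "http://ambari.example.com:8080"
    private final String cluster; // name of the cluster as registered in Ambari

    public AmbariConfigUrls(String baseUrl, String cluster) {
        this.baseUrl = baseUrl;
        this.cluster = cluster;
    }

    // Resource listing the desired config types (core-site, yarn-site, ...) and their tags.
    public URI desiredConfigs() {
        return URI.create(baseUrl + "/api/v1/clusters/" + cluster
                + "?fields=Clusters/desired_configs");
    }

    // Resource holding the properties of one config type at a given tag.
    public URI configuration(String type, String tag) {
        return URI.create(baseUrl + "/api/v1/clusters/" + cluster
                + "/configurations?type=" + type + "&tag=" + tag);
    }

    public static void main(String[] args) {
        AmbariConfigUrls urls = new AmbariConfigUrls("http://ambari.example.com:8080", "demo");
        System.out.println(urls.desiredConfigs());
        System.out.println(urls.configuration("yarn-site", "version1"));
    }
}
```

Both resources are read with a plain authenticated GET, which is all a client needs in order to look up properties such as yarn.resourcemanager.address.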
To overcome the configuration maintenance problem in YARN client applications, we implemented an Ambari REST client that, when embedded in a client application, can dynamically retrieve the configuration from an Ambari instance. The only thing an application then needs in order to have the proper configuration is access to the Ambari instance.
Here is a short example of how to make use of the Ambari client in an arbitrary application:
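A minimal sketch of the idea, assuming the client exposes getServiceConfigMap() as a map from configuration type to its properties. The interface, the stub, and the host names below are hypothetical; they stand in for the real client and a live Ambari instance, so check the client's source for its actual signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Ambari client. Only the method name
// getServiceConfigMap() comes from the text; its return shape is assumed.
interface ClusterConfigSource {
    // Assumed shape: config type (e.g. "yarn-site") -> property name -> value.
    Map<String, Map<String, String>> getServiceConfigMap();
}

final class AmbariConfigSketch {

    // Flattens the per-type properties into a single map that an
    // application can apply to its own configuration object.
    static Map<String, String> flatten(ClusterConfigSource source) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map<String, String> properties : source.getServiceConfigMap().values()) {
            flat.putAll(properties);
        }
        return flat;
    }

    public static void main(String[] args) {
        // Stub used instead of a live Ambari instance; all values are made up.
        ClusterConfigSource stub = () -> {
            Map<String, Map<String, String>> services = new LinkedHashMap<>();
            services.put("core-site",
                    Map.of("fs.defaultFS", "hdfs://namenode.example.com:8020"));
            services.put("yarn-site",
                    Map.of("yarn.resourcemanager.address", "rm.example.com:8050"));
            return services;
        };

        // In a real YARN client each entry would be copied onto a Hadoop
        // Configuration, e.g. configuration.set(key, value).
        flatten(stub).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Because the properties are fetched at startup rather than baked into the application, a change to the cluster only requires the application to know where Ambari lives.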
Note: Apart from the getServiceConfigMap() method, you’ll find a few other interesting and useful operations in the client.
You can get the Ambari client code from the SequenceIQ GitHub repository – clone it, build it and add it as a dependency to your project.
If you’d like to play with a real multi-node, Ambari-managed cluster, check out this older blog post; it will set you up with a Hadoop cluster in less than 2 minutes with a single click.