Cloudbreak welcomes Periscope
Richard Doktorics | 12 December 2014

Today we have pushed out a new release of Cloudbreak – our Docker container-based and cloud-agnostic Hadoop as a Service solution – containing a few major changes. While there are many significant changes (both functional and architectural), in this blog post we’d like to describe one of the most anticipated ones – the autoscaling of Hadoop clusters.

Just to quickly recap, Cloudbreak allows you to provision clusters – full stacks – in all major cloud providers using a unified API, UI or CLI/shell. Currently we support provisioning of clusters in AWS, Google Cloud, Azure and OpenStack (in private beta) – new cloud providers can be added quite easily (as everything runs in Docker) using our SDK.

Periscope allows you to configure SLA policies for your Hadoop cluster and scale it up or down on demand. You can set alarms and notifications for different metrics – such as pending containers, lost nodes or memory usage – and define SLA scaling policies based on these alarms.

Today’s release makes the integration between the two projects available (each also works independently) and allows subscribers to enable autoscaling for their already deployed or newly created Hadoop clusters.

We would like to guide you through the UI and help you to set up an autoscaling Hadoop cluster.

Using Periscope

Once you have created your Hadoop clusters with Cloudbreak, you will now have the option to configure autoscaling policies.

In order to configure autoscaling for your cluster, go to the autoscaling SLA policies tab and hit the enable button.

Alarms

Periscope allows you to configure two types of alarms.

Metric-based alarms are alarms based on different YARN metrics. A plugin mechanism will be available in case you’d like to plug in your own metrics. As a quick note, we have another project called Baywatch where we collect around 400 Hadoop metrics – and those will all be pluggable in Periscope.

  • alarm name – name of the alarm
  • description – description of the alarm
  • metrics – currently the default YARN metrics we support are: pending containers, pending applications, lost nodes, unhealthy nodes and global resources
  • period – the time that the metric has to be sustained in order for an alarm to be triggered
  • notification email (optional) – address where Periscope sends an email in case the alarm is triggered
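The YARN metrics these alarms watch are also exposed by the ResourceManager’s own REST API, so you can inspect the raw values yourself. On a live cluster you would fetch http://&lt;rm-host&gt;:8088/ws/v1/cluster/metrics with curl; the snippet below parses a trimmed sample response so the idea can be shown offline:

```shell
#!/bin/bash

# On a live cluster: response=$(curl -s "http://<rm-host>:8088/ws/v1/cluster/metrics")
# A trimmed sample response so the parsing can be shown without a cluster:
response='{"clusterMetrics":{"containersPending":12,"appsPending":3,"lostNodes":0,"unhealthyNodes":1}}'

# Extract the pending-containers count - the metric behind the "pending containers" alarm
pending=$(echo "$response" | sed 's/.*"containersPending":\([0-9]*\).*/\1/')
echo "pending containers: $pending"
```

When this value stays above the alarm’s threshold for the configured period, the alarm fires.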

Time-based alarms allow autoscaling of clusters based on a configured schedule. We have blogged about this new feature recently – with this new release of Cloudbreak this feature is available through the UI as well.

  • alarm name – name of the alarm
  • description – description of the alarm
  • time zone – the timezone for the cron expression
  • cron expression – the cron expression
  • notification email (optional) – address where Periscope sends an email in case the alarm is triggered
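As a concrete illustration, a cron expression for a time-based alarm that fires at 08:00 on weekdays could look like the following. (Whether Periscope expects the classic 5-field cron format or Quartz’s 6-field variant is worth checking in the docs; this sketch uses the common 5-field form.)

```shell
#!/bin/bash

# Fields: minute hour day-of-month month day-of-week
SCALE_UP_CRON="0 8 * * MON-FRI"   # every weekday at 08:00 in the alarm's time zone
echo "$SCALE_UP_CRON"
```

Pairing this alarm with a scale-up policy, and a second alarm at, say, 18:00 with a scale-down policy, gives a simple working-hours scaling schedule.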

Scaling policies

Once you have an alarm you can configure scaling policies based on it. Scaling policies define the actions you’d like Periscope to take when an alarm is triggered.

  • policy name – the name of the SLA scaling policy
  • scaling adjustment – the adjustment, given as a node count, a percentage of the current cluster size, or an exact target number of nodes
  • host group – the autoscaled Ambari hostgroup
  • alarm – the configured alarm
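To make the percentage option concrete, here is a back-of-the-envelope sketch (our own arithmetic, not Periscope code) of how a percentage adjustment translates into nodes:

```shell
#!/bin/bash

current_nodes=20
adjustment_percent=25   # a "+25%" scaling adjustment

# Nodes to add: 25% of 20 = 5
added=$(( current_nodes * adjustment_percent / 100 ))
new_size=$(( current_nodes + added ))
echo "new cluster size: $new_size"
```

With a node-count adjustment you would instead add or remove a fixed number of nodes, and with an exact adjustment the cluster is resized to the given target.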

Cluster scaling configurations

A cluster has a default configuration which Periscope scaling policies can’t override. This is to avoid over- or under-scaling a Hadoop cluster with policies, and also to define a cooldown period between two scaling actions.

  • cooldown time – the minimum time that must pass between two scaling actions
  • cluster size min. – the minimum size (in nodes) of a cluster
  • cluster size max. – the maximum size (in nodes) of a cluster
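The effect of the min/max bounds can be sketched like this (illustrative shell arithmetic, not the actual Periscope implementation): a policy that would push the cluster past either limit is clamped to it.

```shell
#!/bin/bash

cluster_size_min=3
cluster_size_max=10
desired=14   # what a scaling policy asked for

# Clamp the desired size into the [min, max] range
clamped=$desired
if [ "$clamped" -gt "$cluster_size_max" ]; then clamped=$cluster_size_max; fi
if [ "$clamped" -lt "$cluster_size_min" ]; then clamped=$cluster_size_min; fi
echo "actual target size: $clamped"
```

Here the policy asked for 14 nodes, but the cluster is only scaled to the configured maximum of 10.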

It’s that simple. Happy autoscaling.

In case you’d like to test autoscaling and generate some load on your cluster you can use these stock Hadoop examples and the scripts below:

#!/bin/bash

export HADOOP_LIBS=/usr/lib/hadoop-mapreduce
export JAR_JOBCLIENT=$HADOOP_LIBS/hadoop-mapreduce-client-jobclient-2.4.0.2.1.2.0-402-tests.jar

smalljobs(){
  echo "############################################"
  echo Running smalljobs tests..
  echo "############################################"

  CMD="hadoop jar $JAR_JOBCLIENT mrbench -baseDir /user/hrt_qa/smallJobsBenchmark -numRuns 2 -maps 10 -reduces 5 -inputLines 10 -inputType ascending"
  echo TEST 1: $CMD
  su hdfs -c "$CMD" 1> smalljobs-time.log 2> smalljobs.log
}

smalljobs

To generate load, save the script above as /test.sh on a cluster node and launch several copies of it with the following script:

#!/bin/bash

for i in {1..10}
do
  nohup /test.sh &
done

Make sure you check back soon to our blog or follow us on LinkedIn, Twitter or Facebook.
