Last week we have announced and open sourced Periscope – the industry’s first SLA policy based autoscaling API for Hadoop YARN clusters. In this post we’d like to come up with some examples, setting up alarms and attach scaling policies to your Hadoop cluster.
Periscope is built on existing (and coming/contributed by us) features provided by Apache Hadoop, YARN, Ambari, Docker containers and SequenceIQ’s Cloudbreak. Just FYI, Cloudbreak is our open source and cloud agnostic Hadoop as a Service API, built on Docker containers. While Periscope can attach scaling policies to
dynamic clusters – in this post we’d like to emphasize Periscope’s capabilities while working with — `dynamic – cloud based Hadoop deployments – such as Hadoop clusters deployed with Cloudbreak.
SLAs policies are configured based on
alarms, whereas an alarm is created based on
metrics – these entities are explained below.
An alarm watches a
metric over a specified time period, and used by one or more action or scaling policy based on the value of the metric relative to a given threshold over the time period. A few of the supported
metrics are listed below:
PENDING_CONTAINERS– pending YARN containers
PENDING_APPLICATIONS – pending/queued YARN applications
LOST_NODES – cluster nodes lost
UNHEALTHY_NODES – unhealthy cluster nodes
GLOBAL_RESOURCES – global resources
metrics are compared with pre-configured values using operators. The
comparison operators are:
In order to avoid reacting for sudden spikes in the system and apply policies only in case of a sustained system stress,
alarms have to be sustained over a
period of time. The
period specifies the time period in minutes during the alarm has to be sustained. Also a
threshold can be configured, which specifies the variance applied by the operator for the selected
alarm related REST operations you can check the API documentation. Alarms can issue
notifications as well – for example if a metric is reached for the configured time and threshold a notification event is raised – in the given example below this notification is an email.
1 2 3
SLA scaling policies
Scaling is the ability to increase or decrease the capacity of the Hadoop cluster or application. When scaling policies are used, the capacity is automatically increased or decreased according to the conditions defined.
Periscope will do the heavy lifting and based on the alarms and the scaling policy linked to them it executes the associated policy. By default a fully configured and running Cloudbreak cluster contains no SLA policies. An SLA scaling policy can contain multiple
As an alarm is triggered a
scalingAdjustment is applied, however to keep the cluster size within boundaries a
maxSize is attached to the cluster – thus a scaling policy can never over or undersize a cluster. Also in order to avoid stressing the cluster we have introduced a
cooldown period (minutes) – though an alarm is raised and there is an associated scaling policy, the system will not apply the policy within the configured timeframe. In an SLA scaling policy the triggered policies are applied in order.
Hosts can be added or removed from specific
hostgroups. Periscope and Cloudbreak uses Apache Ambari to provision a Hadoop cluster. Ambari host groups are a set of machines with the same Hadoop “components” installed. You can set up a cluster having different hostgroups – and run different services, thus having a heterogenous cluster.
In the following example we downscale a cluster when the unused resources are high.
policy related REST operations you can check the API documentation.