In recent weeks many of you have asked us how to run our Apache Spark Docker container on a multi-node cluster, or how to install and use Spark with Cloudbreak. Cloudbreak uses Ambari (1.7) blueprints to provision multi-node HDP clusters on different cloud providers: AWS, Google Cloud, Azure and OpenStack, with Rackspace and HP Helion coming soon.
In this post we'd like to show you a quick and easy way to install Spark on a Cloudbreak-provisioned cluster.
First of all you will have to create a cluster with Cloudbreak on your favorite cloud provider (Google Cloud, AWS, Azure or OpenStack, check this post), using a simple multi-node-hdfs-yarn blueprint. Once your cluster is ready, you can install Apache Spark with the following steps:
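For reference, such a blueprint is plain Ambari blueprint JSON describing host groups and their components. The sketch below is illustrative, not the exact multi-node-hdfs-yarn blueprint shipped with Cloudbreak: the component lists are trimmed and the stack version is an assumption.

```json
{
  "Blueprints": {
    "blueprint_name": "multi-node-hdfs-yarn",
    "stack_name": "HDP",
    "stack_version": "2.2"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" },
        { "name": "HISTORYSERVER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "slave_1",
      "cardinality": "1+",
      "components": [
        { "name": "DATANODE" },
        { "name": "NODEMANAGER" }
      ]
    }
  ]
}
```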
Install from the cloud instance
First, you need to log in to one of your cloud instances, then use the one-liner below:
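The one-liner itself is not reproduced here; assuming the install script is published as a raw file on GitHub (the URL below is hypothetical, substitute the real one), it follows the usual download-and-source pattern:

```shell
# Hypothetical URL -- substitute the real raw location of the install script
INSTALL_SCRIPT_URL="https://raw.githubusercontent.com/sequenceiq/docker-spark/master/install-spark.sh"

# Download the script, then source it so its helper functions become
# available in the current shell (run this on the cloud instance):
#   curl -Ls "$INSTALL_SCRIPT_URL" -o /tmp/install-spark.sh
#   source /tmp/install-spark.sh
```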
Once the file is downloaded and sourced, you can use the following command:
Alternatively, you can install it without uploading the Spark assembly uberjar, using:
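The difference between the two variants is where the Spark assembly uberjar ends up. A hedged sketch of the two options (the paths are assumptions, not taken from the post):

```shell
# Hypothetical paths -- adjust to the actual Spark install location
SPARK_ASSEMBLY_LOCAL=/usr/local/spark/lib/spark-assembly.jar
SPARK_ASSEMBLY_HDFS=hdfs:///spark/spark-assembly.jar

# With upload: put the uberjar on HDFS once, so every YARN container
# fetches it from there instead of shipping it with each job:
#   hadoop fs -mkdir -p /spark
#   hadoop fs -put "$SPARK_ASSEMBLY_LOCAL" /spark/
#   export SPARK_JAR="$SPARK_ASSEMBLY_HDFS"

# Without upload: point SPARK_JAR at the local copy instead; YARN then
# distributes it with every submitted application:
#   export SPARK_JAR="local:$SPARK_ASSEMBLY_LOCAL"
```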
Once it is done, enter the ambari-agent container:
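Entering the container boils down to a docker exec against the agent container. A minimal sketch, assuming the container's name contains "ambari-agent" (verify against the output of docker ps):

```shell
# Helper that opens a shell inside the ambari-agent container; the
# "ambari-agent" name filter is an assumption -- check `docker ps`.
enter_ambari_agent() {
  local id
  id=$(docker ps -q --filter "name=ambari-agent")
  docker exec -it "$id" bash
}

# Usage, on the cloud instance:
#   enter_ambari_agent
```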
Apache Spark will be installed at /usr/local/spark inside the container. If you want to try it, you need to configure a few environment variables such as YARN_CONF_DIR or SPARK_JAR (see the Install from container option).
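A hedged sketch of such a configuration, using typical HDP defaults (the exact paths are assumptions, verify them inside your container):

```shell
# Minimal environment for running Spark on YARN from inside the container
export SPARK_HOME=/usr/local/spark
export YARN_CONF_DIR=/etc/hadoop/conf               # where yarn-site.xml lives
export SPARK_JAR=hdfs:///spark/spark-assembly.jar   # the uploaded uberjar

# Smoke test: run the SparkPi example on YARN
#   $SPARK_HOME/bin/spark-submit --master yarn-cluster \
#     --class org.apache.spark.examples.SparkPi \
#     $SPARK_HOME/lib/spark-examples-*.jar 10
```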
Install from container
Once you have logged in to one of your cloud instances, enter the ambari-agent container (the same way as you have seen above):
Inside the container, use the following command:
Then you can install Spark with the “install-spark <install/install-local>” command:
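Presumably the two modes mirror the earlier section: "install" uploads the assembly jar to HDFS, while "install-local" keeps it on the local filesystem. A sketch (MODE is just an illustrative variable; the script takes the mode as its argument):

```shell
# Pick one of the two modes the script accepts
MODE=install          # upload the assembly jar to HDFS
# MODE=install-local  # keep the assembly jar on the local filesystem

# Then, inside the container:
#   install-spark "$MODE"
```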
With this approach you do not need to set up your environment variables. The script will do it for you.
However, this approach has a few pain points:
- You need to install Spark on every node (you can use it on only one node, but that is not the best approach).
- You do not want to enter Docker containers or even cloud instances to do things like this.
As you can see, until Ambari fully supports Spark installation through blueprints, this is not an ideal situation.
Nevertheless, we understand this, and with the introduction of recipes in the latest Cloudbreak release we are going to publish a new Cloudbreak Spark recipe next week. In the meantime stay tuned: early next week we will publish a post about the concept and architecture of recipes and how to use them, and we will publish a few custom ones (by request).