As you may already know, we have dockerized most of the Hadoop ecosystem: we run MR2, Spark, Storm, Hive, HBase, Pig, Oozie, Drill, etc. in Docker containers, both on bare metal and in the cloud. For details, check these earlier posts and resources:
|Project|Description|Blog post|GitHub|
|---|---|---|---|
|Apache Hadoop|Pseudo-distributed container|http://blog.sequenceiq.com/blog/2014/08/18/hadoop-2-5-0-docker/|https://github.com/sequenceiq/hadoop-docker|
|Apache Ambari|Multi-node, full Hadoop stack, blueprint-based|http://blog.sequenceiq.com/blog/2014/06/19/multinode-hadoop-cluster-on-docker/|https://github.com/sequenceiq/docker-ambari|
|Cloudbreak|Cloud-agnostic Hadoop as a Service|http://blog.sequenceiq.com/blog/2014/07/18/announcing-cloudbreak/|https://github.com/sequenceiq/cloudbreak|
|Periscope|SLA policy-based autoscaling for Hadoop clusters|http://blog.sequenceiq.com/blog/2014/08/27/announcing-periscope/|https://github.com/sequenceiq/periscope|
In this post we’d like to help you get started with the latest Spark release, 1.1.0, in minutes using Docker. Docker and Spark are two very hyped technologies these days. At SequenceIQ we use both extensively, so we put together a Docker container and are sharing it with the community.
The container’s code is available in our GitHub repository.
Pull the image from the Docker Repository
We suggest always pulling the image from the official Docker repository, as it is always maintained and supported by us.
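If you script your setup, the pull can be wrapped in a few lines of Python. This is a minimal sketch, assuming the `docker` CLI is on your `PATH`. The image name `example/spark:1.1.0` is a hypothetical placeholder, not the published name, so substitute the actual repository and tag. The sketch insists on an explicit tag so you get the 1.1.0 build rather than whatever an implicit `latest` tag points to.

```python
import subprocess
from typing import List

# Hypothetical placeholder: substitute the actual repository and tag.
IMAGE = "example/spark:1.1.0"


def pull_argv(image: str) -> List[str]:
    """Build the argument list for `docker pull <repository>:<tag>`."""
    # Require an explicit tag on the last path component so the pull is
    # pinned to a specific release instead of the implicit `latest`.
    name = image.rsplit("/", 1)[-1]
    if ":" not in name:
        raise ValueError(f"image {image!r} has no explicit tag")
    return ["docker", "pull", image]


def pull(image: str = IMAGE) -> None:
    # check=True raises CalledProcessError if docker exits with a non-zero status.
    subprocess.run(pull_argv(image), check=True)
```

Calling `pull()` runs the same `docker pull` command you would otherwise type by hand.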
Building the image
Alternatively, you can build your own image from our Dockerfile.
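A minimal Python sketch of the build step, assuming a local checkout of the repository with the Dockerfile at its root and the `docker` CLI on your `PATH`. The tag `example/spark:1.1.0` used below is a hypothetical placeholder. `--rm` removes intermediate containers after a successful build, and `-t` names the resulting image so it can be referenced later.

```python
import subprocess
from pathlib import Path
from typing import List


def build_argv(tag: str, context_dir: str = ".") -> List[str]:
    """Argument list for `docker build --rm -t <tag> <context_dir>`."""
    # --rm removes intermediate containers after a successful build;
    # -t names the resulting image so it can be referenced by `docker run`.
    return ["docker", "build", "--rm", "-t", tag, context_dir]


def build(tag: str, context_dir: str = ".") -> None:
    # Fail early with a clear message if the build context has no Dockerfile.
    if not (Path(context_dir) / "Dockerfile").is_file():
        raise FileNotFoundError(f"no Dockerfile in {context_dir!r}")
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(build_argv(tag, context_dir), check=True)
```

For example, `build("example/spark:1.1.0", "path/to/checkout")` builds from a hypothetical checkout directory.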
Running the image
Once you have pulled or built the image, you are ready to start working with Spark.
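Starting an interactive container from Python can be sketched as follows. This assumes the `docker` CLI is on your `PATH`. The hostname `sandbox` and the image name are hypothetical choices. Which command the image expects in order to start its Hadoop services is not stated here, so the sketch leaves the command to the caller.

```python
import subprocess
from typing import List, Sequence


def run_argv(image: str, hostname: str = "sandbox",
             command: Sequence[str] = ()) -> List[str]:
    """Argument list for an interactive `docker run`."""
    # -i keeps STDIN open, -t allocates a pseudo-TTY, -h sets the container's
    # hostname; an optional command overrides the image's default command.
    return ["docker", "run", "-i", "-t", "-h", hostname, image, *command]


def run(image: str, hostname: str = "sandbox",
        command: Sequence[str] = ()) -> int:
    # The child process inherits this terminal, so the session stays
    # interactive; docker's exit status is returned to the caller.
    return subprocess.run(run_argv(image, hostname, command)).returncode
```

Building the argument list separately from executing it keeps the command easy to inspect or log before anything is started.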
There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
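In Spark 1.1 the deploy mode is selected by the `--master` value passed to `spark-submit` (`yarn-cluster` or `yarn-client`). The following minimal Python sketch assembles the command line for either mode and records where the driver runs, as described above. It assumes it is executed from the Spark installation directory. The jar path in the usage line is a hypothetical placeholder.

```python
from typing import Dict, List, Sequence

# Where the driver runs in each mode, per the description above.
DRIVER_LOCATION: Dict[str, str] = {
    "yarn-cluster": "inside the YARN application master",
    "yarn-client": "in the local client process",
}


def spark_submit_argv(master: str, main_class: str, app_jar: str,
                      app_args: Sequence[str] = ()) -> List[str]:
    """Build a spark-submit command line for one of the two YARN modes."""
    if master not in DRIVER_LOCATION:
        raise ValueError(f"expected one of {sorted(DRIVER_LOCATION)}, got {master!r}")
    # spark-submit options must come before the application jar; anything
    # after the jar is passed to the application's main method.
    return ["./bin/spark-submit", "--class", main_class, "--master", master,
            app_jar, *app_args]
```

For example, `spark_submit_argv("yarn-cluster", "org.apache.spark.examples.SparkPi", "lib/spark-examples.jar")` differs from the client-mode command only in the `--master` value.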
Estimating Pi (yarn-cluster mode):
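The bundled SparkPi example estimates Pi by Monte Carlo sampling. It throws random points into the square [-1, 1] x [-1, 1] and counts how many land inside the unit circle. Because the circle covers Pi/4 of the square, four times the hit ratio approximates Pi. The following plain-Python sketch of the same idea is loosely modeled on SparkPi and is not the Spark code itself. Each "slice" stands in for a partition that an executor would process independently, so the driver only has to sum the counts. In yarn-cluster mode that driver runs inside the application master, so the printed result ends up in YARN's container logs rather than in your terminal.

```python
import random


def count_hits(samples: int, seed: int) -> int:
    """Count random points in [-1, 1]^2 that fall inside the unit circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y < 1:
            hits += 1
    return hits


def estimate_pi(slices: int = 2, samples_per_slice: int = 100_000) -> float:
    # Each slice is independent (like a partition handled by an executor);
    # only the summed hit count is needed to produce the estimate.
    hits = sum(count_hits(samples_per_slice, seed) for seed in range(slices))
    total = slices * samples_per_slice
    # circle area / square area = Pi / 4
    return 4.0 * hits / total
```

More slices mean more samples and a tighter estimate. In Spark that extra work is spread across executors instead of running in one loop.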
Estimating Pi (yarn-client mode):
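In yarn-client mode the driver is your local process, so its output arrives on your console and can be captured directly. The following small sketch extracts the estimate from captured driver output. It assumes the example prints a line of the form `Pi is roughly <value>`, as the bundled SparkPi does.

```python
import re
from typing import Optional

# Matches the result line printed by the driver, e.g. "Pi is roughly 3.14".
_PI_LINE = re.compile(r"Pi is roughly (\d+(?:\.\d+)?)")


def parse_pi(driver_output: str) -> Optional[float]:
    """Return the estimate from driver output, or None if no result line is found."""
    match = _PI_LINE.search(driver_output)
    return float(match.group(1)) if match else None
```

For example, the `stdout` of `subprocess.run(argv, capture_output=True, text=True)` for a yarn-client submission can be passed straight to `parse_pi`. In yarn-cluster mode the same line has to be read from the YARN logs instead.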