In this post we’d like to help you get started with the latest Spark release, 1.2.0, in minutes – using Docker. Although we built and pushed the container to the official Docker repository over the holidays, we still owed you this post. Here are the details …
Docker and Spark are two technologies that are getting a lot of attention these days. At SequenceIQ we use both extensively, so we have put together a Spark Docker container and are sharing it with the community.
The container’s code is available in our GitHub repository.
Pulling the image from the Docker repository
We suggest always pulling the image from the official Docker repository, as it is maintained and supported by us.
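As a minimal sketch, assuming the image is published under the sequenceiq/spark name with a 1.2.0 tag (the exact repository name and tag may differ), pulling it looks like this:

```bash
# pull the pre-built Spark 1.2.0 image from the Docker registry
docker pull sequenceiq/spark:1.2.0
```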
Building the image
Alternatively, you can always build the image yourself from our Dockerfile.
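A sketch of a local build, assuming you run it from a checkout of the repository with the Dockerfile in its root and reuse the same tag as above:

```bash
# build the image from the Dockerfile in the current directory,
# removing intermediate containers after a successful build
docker build --rm -t sequenceiq/spark:1.2.0 .
```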
Running the image
Once you have pulled or built the image, you are ready to get started with Spark.
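The following is a minimal sketch of starting an interactive container; the bootstrap script path and the hostname are assumptions about how the image is wired, not guaranteed specifics:

```bash
# start the container with an interactive bash shell;
# -h pins the hostname so YARN and HDFS inside the container resolve consistently,
# and the bootstrap script (assumed path) starts the Hadoop/YARN daemons first
docker run -i -t -h sandbox sequenceiq/spark:1.2.0 /etc/bootstrap.sh -bash
```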
There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Estimating Pi (yarn-cluster mode):
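The following is a sketch of submitting the bundled SparkPi example in yarn-cluster mode from inside the container; the Spark install path and the examples jar name are assumptions based on a typical Spark 1.2.0 layout:

```bash
# inside the container: submit SparkPi so the driver runs in the YARN application master
cd /usr/local/spark
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  lib/spark-examples-1.2.0-hadoop2.4.0.jar
```

Because the driver runs inside YARN in this mode, the Pi estimate shows up in the YARN application logs rather than on the client console.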
Estimating Pi (yarn-client mode):
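A sketch of the same example in yarn-client mode, again assuming the install path and jar name above; here only the deploy mode changes:

```bash
# inside the container: run SparkPi with the driver in the client process
cd /usr/local/spark
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  lib/spark-examples-1.2.0-hadoop2.4.0.jar
```

In this mode the driver stays in your shell, so the Pi estimate is printed directly to the console instead of the application logs.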