Recently we blogged about how you can write simple Apache Spark jobs and how to test them. Now we’d like to introduce all the basic RDD operations through easy examples (our goal is to keep the examples as simple as possible). The Spark documentation explains in detail what each operation does. We wrote tests for most of the RDD operations with good ol’
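To give a flavor of what the tests exercise, here is a plain-Python sketch of what a few core RDD operations compute. It needs no Spark at all; it only illustrates the semantics of `map`, `filter`, `reduce`, and `reduceByKey`, not the distributed execution, and the data and names are purely illustrative.

```python
# Plain-Python sketch of the semantics of a few core RDD operations.
# No Spark required; data and names are illustrative.
from itertools import groupby
from functools import reduce

data = [1, 2, 3, 4, 5]

# map: apply a function to every element
mapped = [x * 2 for x in data]             # like rdd.map(lambda x: x * 2)

# filter: keep only the elements matching a predicate
evens = [x for x in data if x % 2 == 0]    # like rdd.filter(lambda x: x % 2 == 0)

# reduce: fold all elements into a single value
total = reduce(lambda a, b: a + b, data)   # like rdd.reduce(lambda a, b: a + b)

# reduceByKey: merge the values of each key with the given function
pairs = [("a", 1), ("b", 2), ("a", 3)]
by_key = {
    key: reduce(lambda a, b: a + b, (v for _, v in group))
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0])
}                                          # like rdd.reduceByKey(lambda a, b: a + b)

print(mapped, evens, total, by_key)
```

Running this prints `[2, 4, 6, 8, 10] [2, 4] 15 {'a': 4, 'b': 2}` — the same results Spark would produce for the corresponding RDD calls, just computed on a local list.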
Get the code from our GitHub examples repository and build the project inside the
spark-samples directory. You do not need a pre-installed Hadoop/Spark cluster or anything else to run the examples.
All the other RDD operations are covered in the examples, so there is no point in listing them all here.
Spark on YARN
Should you want to run your Spark code on a YARN cluster, you have several options:
- Use our Spark Docker container
- Use our multi-node Hadoop cluster
- Use Cloudbreak to provision a YARN cluster on your favorite cloud provider
To help you get going with Spark on YARN, read our previous blog post about how to submit a Spark job to a cluster.
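Whichever option you pick, the submission itself boils down to a `spark-submit` invocation against the YARN resource manager. The sketch below is a template only: the main class and JAR path are placeholders, and it assumes `HADOOP_CONF_DIR` already points at your cluster's configuration.

```shell
# Sketch: submit an application JAR to a YARN cluster.
# Assumes HADOOP_CONF_DIR points at the cluster configuration;
# the class name and JAR path are placeholders for your own build output.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.SparkSample \
  target/spark-samples-assembly.jar
```

With `--deploy-mode cluster` the driver runs inside the YARN cluster rather than on your local machine, which is the usual choice for production jobs.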