Hadoop 2.3 with docker
Lajos Papp, 19 March 2014

You want to try out Hadoop 2.3? Go to the zoo and shave a yak. Or simply use Docker.

docker run -i -t sequenceiq/hadoop-docker /etc/bootstrap.sh -bash

Testing

# ssh and hdfs should already be running (started by /etc/bootstrap.sh); go to the Hadoop install directory
cd $HADOOP_PREFIX

# run the mapreduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar grep input output 'dfs[a-z.]+'

# check the output
bin/hdfs dfs -cat output/*

Yak shaving an elephant

I had problems installing Hadoop 2.3, and by googling I stumbled upon this email thread, which references alternative Hadoop docs deployed on GitHub.

Following that description, I ran into another issue: Hadoop is delivered with 32-bit native libraries. No big deal …
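If you want to see for yourself whether a native library is 32-bit or 64-bit, here is a minimal sketch. It relies only on the ELF file format: the first four bytes are the magic number, and the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The function name `elf_class` is made up for this example, and the library path in the usage note is an assumption about where the tarball puts its native libs.

```shell
# Hypothetical helper: print "32-bit", "64-bit" or "unknown" for a given file.
elf_class() {
    # First 4 bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo unknown
        return 0
    fi
    # Byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    case "$(od -An -tu1 -j4 -N1 "$1" 2>/dev/null | tr -d ' \n')" in
        1) echo 32-bit ;;
        2) echo 64-bit ;;
        *) echo unknown ;;
    esac
}
```

Usage would be something like `elf_class $HADOOP_PREFIX/lib/native/libhadoop.so` (path assumed, adjust to your install). The `file` command gives you the same answer if it is installed.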

Hadoop native libraries

Of course there is an official Native Libraries Guide; it instructs you to simply download the sources and run mvn package. But then you face a new issue: missing protobuf. Eeeasy …
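A rough sketch of that build step, with a guard so it fails early when the right protobuf compiler is missing. The Maven flags are the ones the Native Libraries Guide uses; the function names `is_protobuf_2_5` and `build_native` are made up for this example, and the build is assumed to run from the root of the Hadoop source tree.

```shell
# Succeeds only when the given `protoc --version` output reports 2.5.0.
is_protobuf_2_5() {
    [ "$1" = "libprotoc 2.5.0" ]
}

# Hypothetical wrapper: run from the root of the Hadoop source tree.
build_native() {
    if ! is_protobuf_2_5 "$(protoc --version 2>/dev/null)"; then
        echo "protoc 2.5.0 not found on PATH" >&2
        return 1
    fi
    # Build the distribution including the native libraries, skipping tests.
    mvn package -Pdist,native -DskipTests -Dtar
}
```

Calling `build_native` with the older protobuf from yum on the PATH stops with an error message instead of failing somewhere deep inside the Maven build.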

Protobuf 2.5

Unfortunately, yum install protobuf installs an older 2.3 version, which is close but no cigar. So you download the protobuf source and run ./configure && make && make install.

To succeed on that one you have to install a couple of development packages, and there you go.
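The protobuf steps above could be wrapped up roughly like this. It is only a sketch: the function name `build_protobuf` is made up, it assumes you have already downloaded the protobuf 2.5.0 source tarball yourself, and it assumes the compiler toolchain and development packages are already installed. The `/usr/local` default prefix is also an assumption.

```shell
# Hypothetical helper: build and install protobuf from a source tarball.
# $1 = path to the downloaded source tarball
# $2 = install prefix (assumed default: /usr/local)
build_protobuf() {
    tarball=$1
    prefix=${2:-/usr/local}
    if [ ! -f "$tarball" ]; then
        echo "no such tarball: $tarball" >&2
        return 1
    fi
    # Unpack into a scratch directory, dropping the top-level folder.
    workdir=$(mktemp -d) || return 1
    tar -xzf "$tarball" -C "$workdir" --strip-components=1 || return 1
    # Same sequence as in the text; run in a subshell so the caller's cwd is untouched.
    ( cd "$workdir" && ./configure --prefix="$prefix" && make && make install )
}
```

You would call it as `build_protobuf protobuf-2.5.0.tar.gz` (as root if the prefix needs it) and then check the result with `protoc --version`.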

Bintray

I wanted to save you those steps, so I created a binary distro of the native libs compiled on 64-bit CentOS and put it in a Bintray repo. Enjoy.

Automate everything

As I’m an automation fetishist, I created a Dockerfile and released it in the official Docker repo.
