Apache Hadoop 2.5.2 on Docker
Janos Matyas, 24 November 2014

Following the Hadoop 2.5.2 point release, today we are releasing a new 2.5.2 version of our Hadoop Docker container.

CentOS

Build the image

If you would like to work directly from the Dockerfile, you can build the image with:

docker build -t sequenceiq/hadoop-docker:2.5.2 .

Pull the image

The image is also released through Docker’s automated build repository, so you can always pull it or refer to it when launching containers.

docker pull sequenceiq/hadoop-docker:2.5.2

Start a container

To use the Docker image you have just built or pulled, run:

docker run -i -t sequenceiq/hadoop-docker:2.5.2 /etc/bootstrap.sh -bash
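If you switch between image versions often, the command above can be wrapped in a small helper. This is only a sketch: the names `HADOOP_TAG`, `DRY_RUN`, `hadoop_image` and `hadoop_shell` are hypothetical and not part of the image or of Docker. By default it only prints the command, so it is safe to try on a machine without Docker.

```shell
# Hypothetical helper; HADOOP_TAG, DRY_RUN and the function names are illustrative.
HADOOP_TAG="${HADOOP_TAG:-2.5.2}"

# Compose the image reference from the tag.
hadoop_image() {
  printf 'sequenceiq/hadoop-docker:%s\n' "$HADOOP_TAG"
}

# Start an interactive container, or just print the command when DRY_RUN=1.
hadoop_shell() {
  # -i keeps STDIN open, -t allocates a pseudo-terminal
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo docker run -i -t "$(hadoop_image)" /etc/bootstrap.sh -bash
  else
    docker run -i -t "$(hadoop_image)" /etc/bootstrap.sh -bash
  fi
}

# Dry run by default: prints the command instead of executing it.
hadoop_shell
```

Set `DRY_RUN=0` before calling `hadoop_shell` to actually launch the container.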

Testing

You can run one of the stock examples:

cd $HADOOP_PREFIX
# run the MapReduce grep example
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar grep input output 'dfs[a-z.]+'

# check the output
bin/hdfs dfs -cat output/*
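To get a feel for what the grep job computes, here is a rough local approximation in plain shell. It is not the MapReduce implementation, only a sketch of the same idea: extract every match of the regular expression and count how often each distinct match occurs. The function name `count_matches` and the sample text are made up for illustration.

```shell
# Local approximation of the grep example: count each distinct regex match.
count_matches() {
  # $1 is an extended regular expression; input is read from stdin.
  # grep -o prints each match on its own line; uniq -c counts duplicates.
  grep -oE "$1" \
    | LC_ALL=C sort \
    | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2 \
    | awk '{print $1 "\t" $2}'
}

# Made-up sample input, loosely resembling Hadoop configuration text.
sample='<name>dfs.replication</name>
<name>dfs.namenode.name.dir</name>
dfsadmin uses dfs.replication'

printf '%s\n' "$sample" | count_matches 'dfs[a-z.]+'
```

On this sample the first line of output is `dfs.replication` with a count of 2. The job output printed from HDFS should have the same count-and-match shape, computed over the files in the `input` directory.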

Hadoop native libraries, build, Bintray, etc

Building Hadoop is no easy task: it requires many libraries at the right versions (protobuf, etc.) and takes some time. We have simplified all of this by running the build ourselves and releasing a 64-bit version of the Hadoop native libraries on our Bintray repo. Enjoy.

Should you have any questions, let us know through our social channels such as LinkedIn, Twitter or Facebook.
