At SequenceIQ we use HBase to store large amounts of high velocity data and interact with them – many times we use native HBase interfaces but recently there was a need (internal and external) to access the data through an SQL interface.
HBase is an open-source, distributed, versioned, non-relational database modeled after Google’s Bigtable. It’s designed to handle billions of rows and millions of columns. However, using it as a relational database where you would store your data normalized, split into multiple tables is not easy and most likely you will struggle with it as you would do in any other non-relational database. Here comes Apache Phoenix in the picture. It’s an SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries. The project is in incubating state and under heavy development, but you can already start embracing it.
Download the appropriate distribution from here:
- Phoenix 3.x – HBase 0.94.x
- Phoenix 4.x – HBase 0.98.1+
Note the compatibilities between the HBase and Phoenix versions
Alternatively you can clone the repository and build it yourself (mvn clean install -DskipTests).
It should produce a jar file like this: phoenix-
version-client.jar. Copy it to HBase’s classpath (easiest way is to copy into
HBASE_HOME/lib). If you have multiple nodes it has to be there on every node. Restart the RegionServers and you are good to go. That’s it?
We’ve pre-cooked a Docker image for you so you can follow this sample and play with it (the image is based on Hadoop 2.5, HBase 0.98.5, Phoenix 4.1.0):
docker run -it sequenceiq/phoenix:v4.1onHbase-0.98.5
Alternative launch with sqlline
docker run -it sequenceiq/phoenix:v4.1onHbase-0.98.5 /etc/bootstrap-phoenix.sh -sqlline
The downloaded or built distribution’s bin directory contains a pure-Java console based utility called sqlline.py. You can use this to connect to HBase via the Phoenix JDBC driver. You need to specify the Zookeeper’s QuorumPeer’s address. If the default (2181) port is used then type sqlline.py localhost (to quit type: !quit). Let’s create two different tables:
It’s worth checking which datatypes and
functions are currently supported. These tables will be translated into
HBase tables and the metadata is stored along with it and versioned, such that snapshot queries over prior versions will automatically
use the correct schema. You can check with HBase shell as
1 2 3 4 5 6 7 8
As you can see there are bunch of co-processors. Co-processors were introduced in version 0.92.0 to push arbitrary computation out
to the HBase nodes and run in parallel across all the RegionServers. There are two types of them:
Observers allow the cluster to behave differently during normal client operations. Endpoints allow you to extend the cluster’s
capabilities, exposing new operations to client applications. Phoenix uses them to translate the SQL queries to scans and that’s
why it can operate so quickly. It is also possible to map an existing HBase table to a Phoenix table. In this case the binary
representation of the row key and key values must match one of the Phoenix data types.
After the tables are created fill them with data. For this purpose we’ll use the Jooq library’s fluent API.
The related sample project (Spring based) can be found in our
GitHub repository. To connect you’ll need Phoenix’s
JDBC driver on your classpath (org.apache.phoenix.jdbc.PhoenixDriver). The url to connect to should be familiar as it uses the same Zookeeper QuorumPeer’s address:
jdbc:phoenix:localhost:2181. Unfortunately Jooq’s insert statement is not suitable for us since the JDBC driver only supports the
upsert statement so we cannot make use of the fluent API here.
1 2 3 4 5 6 7 8 9 10 11
On the generated data let’s create queries:
1 2 3 4 5 6 7
This query resulted the following:
1 2 3 4 5 6 7 8 9 10
The same thing could’ve been achieved with sqlline also.
Nested queries are not supported yet, but it will come soon.
As you saw it is pretty easy to get started with Phoenix both command line and programmatically. There are lots of lacking features, but the contributors are dedicated and working hard to make this project moving forward. Next step? ORM for HBase? It is also ongoing.. :)
Follow up with us if you are interested in HBase and building an SQL interface on top. Don’t hesitate to contact us should you have any questions.