Monday, April 02, 2012

Surus: HBase ORM

Surus [1] has been mentioned here before, but never really explained.
As soon as you try to do anything non-trivial with HBase, you have to deal with the transformation from POJO to HBase and from HBase back to POJO. That's where an ORM comes in, and that's where we start with Surus.

Surus is a simple yet powerful HBase ORM. Its features:
  • Mapping is defined by annotations
    (sigh with relief: no code generation and no getters/setters)
  • Supports both column-level and column-family-level mapping granularity
  • Uses JSON for complex data types, and serializes data in a compact binary format
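
To make the "no code generation" point concrete, here is a minimal sketch of how annotation-driven mapping can work in plain Java. It is not Surus code: the @Column annotation, the User class and the toCells helper are all made up for this illustration, and the sketch only handles long fields.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotationMappingSketch {

    // Hypothetical annotation (NOT part of Surus): marks a field as one HBase column.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String family();
        String qualifier();
    }

    // A plain class with public fields: no getters, no setters, no generated code.
    static class User {
        @Column(family = "stat", qualifier = "visits")
        public long visits;

        public String notMapped = "ignored";
    }

    // Walks the annotated fields and returns "family:qualifier" -> encoded bytes.
    static Map<String, byte[]> toCells(Object entity) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) {
                continue; // fields without the annotation are not persisted
            }
            try {
                field.setAccessible(true);
                long value = field.getLong(entity); // the sketch handles long fields only
                byte[] encoded = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
                cells.put(column.family() + ":" + column.qualifier(), encoded);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        User user = new User();
        user.visits = 42L;
        Map<String, byte[]> cells = toCells(user);
        System.out.println(cells.keySet()); // prints [stat:visits]
    }
}
```

The whole mapping lives in the annotation on the field, which is why the class needs nothing beyond public fields.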

Given the laconic format of blog posts, let's review the typical use cases for Surus:
  • Mapping definition
  • Writing to HBase
  • Reading from HBase
  • Integration with the Hadoop MapReduce framework
For our example we will need an HBase table tbl_example with the following structure:
<TableSchema name="tbl_example">
  <ColumnSchema name="family_mapping" BLOCKCACHE="false" VERSIONS="1"/>
  <ColumnSchema name="stat" BLOCKCACHE="false" VERSIONS="1"/>
  <ColumnSchema name="nested_maps" BLOCKCACHE="false" VERSIONS="1"/>
</TableSchema>
Let's assume that we want to:
  • Store an integer value (declared as a long in the class below) in column stat:number_of_users
  • Store a Map<String, Integer> in column stat:months
  • Store a Map<Integer, Integer> in column family family_mapping
  • Store a Map<Long, Integer> in every column of the family nested_maps
Our Java class will look like this:
import java.util.HashMap;
import java.util.Map;
// plus the Surus annotation imports (omitted here)

public class Example {
    @HRowKey
    public byte[] key;

    // single column stat:number_of_users
    @HProperty(family = "stat", identifier = "number_of_users")
    public long numberOfUsers;

    // whole map stored in the single column stat:months
    @HMapProperty(family = "stat", identifier = "months", keyType = String.class, valueType = Integer.class)
    public Map<String, Integer> months = new HashMap<String, Integer>();

    // map stored at column-family level in family_mapping
    @HMapFamily(family = "family_mapping", keyType = Integer.class, valueType = Integer.class)
    public Map<Integer, Integer> familyMapping = new HashMap<Integer, Integer>();

    // every column of nested_maps holds a Map<Long, Integer>
    @HMapFamily(family = "nested_maps", keyType = Long.class, valueType = Map.class)
    @HNestedMap(keyType = Long.class, valueType = Integer.class)
    public Map<Long, Map<Long, Integer>> nestedMaps = new HashMap<Long, Map<Long, Integer>>();
}
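
To see how column-level and family-level mapping differ, here is a small sketch that lays out the four cases from the list above as "family:qualifier = value" pairs. It uses plain Java collections only. The class and helper names are invented for the illustration, values are shown as readable strings rather than in Surus's binary or JSON encoding, and using the map key as the column qualifier is an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

public class CellLayoutSketch {

    // Column-level mapping: the whole value lives in a single cell.
    static void putColumn(Map<String, String> cells, String family, String qualifier, Object value) {
        cells.put(family + ":" + qualifier, String.valueOf(value));
    }

    // Family-level mapping: every map entry becomes its own column,
    // with the map key used as the qualifier (an assumption of this sketch).
    static void putFamily(Map<String, String> cells, String family, Map<?, ?> values) {
        for (Map.Entry<?, ?> entry : values.entrySet()) {
            cells.put(family + ":" + entry.getKey(), String.valueOf(entry.getValue()));
        }
    }

    // Builds sample data for the four cases and returns the resulting cells.
    static Map<String, String> layout() {
        Map<String, Integer> months = new TreeMap<>();
        months.put("2012-03", 7);
        months.put("2012-04", 9);

        Map<Integer, Integer> familyMapping = new TreeMap<>();
        familyMapping.put(1, 10);
        familyMapping.put(2, 20);

        Map<Long, Integer> inner = new TreeMap<>();
        inner.put(5L, 50);
        Map<Long, Map<Long, Integer>> nestedMaps = new TreeMap<>();
        nestedMaps.put(100L, inner);

        Map<String, String> cells = new TreeMap<>();
        putColumn(cells, "stat", "number_of_users", 1234L);
        putColumn(cells, "stat", "months", months);          // one cell holds the whole map
        putFamily(cells, "family_mapping", familyMapping);   // one cell per map entry
        putFamily(cells, "nested_maps", nestedMaps);         // one cell per outer key, inner map as value
        return cells;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> cell : layout().entrySet()) {
            System.out.println(cell.getKey() + " = " + cell.getValue());
        }
    }
}
```

With this sample data the family_mapping family ends up with two columns, nested_maps with one column whose value is a whole inner map, and stat with two columns, one of which carries the entire months map.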

To perform an HBase insert, we need a Put object and some magic from EntityService:
// declare and initialize Example instance
Example example = new Example();
example.numberOfUsers=...
// declare EntityService
EntityService<Example> esExample = new EntityService<Example>(Example.class);
// get Put object
Put put = esExample.insert(example);

To parse a Result object from HBase, we reverse the process:
// create Get object
HTable tExample = ...
Get get = new Get(ID_IN_BYTES);
Result result = tExample.get(get);
Example example = esExample.parseResult(result);
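
For the curious, the reverse direction can be sketched in plain Java as well. This is not how parseResult is implemented in Surus: the @Column annotation, the Stats class and the parse helper are hypothetical, a Map<String, byte[]> stands in for the HBase Result, and only long fields are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class ResultParsingSketch {

    // Hypothetical annotation for this sketch; it is not Surus's @HProperty.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String family();
        String qualifier();
    }

    static class Stats {
        @Column(family = "stat", qualifier = "number_of_users")
        public long numberOfUsers;
    }

    // Creates an instance of "type" and fills its annotated long fields from the cells.
    static <T> T parse(Class<T> type, Map<String, byte[]> cells) {
        try {
            Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            T entity = constructor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                Column column = field.getAnnotation(Column.class);
                if (column == null) {
                    continue; // not a mapped field
                }
                byte[] raw = cells.get(column.family() + ":" + column.qualifier());
                if (raw == null) {
                    continue; // missing cell: keep the field's default value
                }
                field.setAccessible(true);
                field.setLong(entity, ByteBuffer.wrap(raw).getLong());
            }
            return entity;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a Result: one cell holding an 8-byte long.
        Map<String, byte[]> cells = new HashMap<>();
        cells.put("stat:number_of_users", ByteBuffer.allocate(Long.BYTES).putLong(99L).array());

        Stats stats = parse(Stats.class, cells);
        System.out.println(stats.numberOfUsers); // prints 99
    }
}
```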

And finally, let's review how to integrate with Hadoop MapReduce.
First, the mapper:
@Override
protected void map(ImmutableBytesWritable key, Result result, Context context) throws IOException, InterruptedException {
    int vertexA = Bytes.toInt(key.get());
    Example example = esExample.parseResult(result);
    ...
}
Next, the reducer:
@Override
protected void reduce(ImmutableBytesWritable key, Iterable<ImmutableBytesWritable> values, Context context) throws IOException, InterruptedException {
    int vertexA = Bytes.toInt(key.get());
    for (ImmutableBytesWritable entry : values) {
        Example example = ...; // create an Example instance from "entry"
        Put putA = esExample.insert(example);
        putA.setWriteToWAL(false); // skip the write-ahead log for this Put
        context.write(HBASE_KEY_DUMMY, putA);
    }
}
Surus is a mature framework, yet despite its capabilities it remains simple and fast.
Feel free to visit the Surus wiki [2] for more details and examples.

[1] Surus at github

[2] Surus wiki

1 comment:

Unknown said...

Hello, please set up public permissions on your wiki on GitHub; we are unable to read most of the pages.