mushkevych: surus


Friday, January 11, 2013

HBase: secondary index

As your HBase project moves forward, you will likely face a request to search by criteria that are neither included in the primary index nor can be included in it. In other words, you will face the problem of fast and efficient search by a secondary index. For instance: select all eReaders in a specific price range. In this post, we will review one approach to constructing a secondary index.

As usual, we will work in the realm of Surus ORM [1] and the Synergy Mapreduce Framework [2], and will start with the definition of a model. For illustration purposes we will use a simplified variant of the "product" class, which has lowest and highest prices and can belong to only one category. For instance:

ID	category	priceLowest	priceHighest	manufacturer
Sony eReader PRST2BC	E-READER	8900	12900	SONY

Instances will reside in the table product.

To satisfy our search requests, we would like to get the following structure:

ID	products
ID	Sony eReader PRST2BC	Kobo ...	...
E-READER	{ priceLowest: 8900, priceHighest: 12900, manufacturer: SONY }	{ ... }	{ ... }

With this structure, a search within a given category reads a single row, from which products can quickly be filtered by price range or manufacturer.
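A minimal in-memory sketch of that lookup, assuming the index row for one category has already been fetched into a map from product ID to its filtration criteria. The class and method names (`IndexQuery`, `Criteria`, `search`) are hypothetical, and treating "in a price range" as "the product's [priceLowest, priceHighest] interval overlaps the requested range" is an assumption, not something the post specifies:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IndexQuery {
    // Hypothetical value object: the filtration criteria stored per product in an index row.
    static final class Criteria {
        final long priceLowest;
        final long priceHighest;
        final String manufacturer;

        Criteria(long priceLowest, long priceHighest, String manufacturer) {
            this.priceLowest = priceLowest;
            this.priceHighest = priceHighest;
            this.manufacturer = manufacturer;
        }
    }

    // Returns the sorted IDs of products whose [priceLowest, priceHighest] interval
    // overlaps [minPrice, maxPrice]; a null manufacturer matches any manufacturer.
    static List<String> search(Map<String, Criteria> categoryRow,
                               long minPrice, long maxPrice, String manufacturer) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Criteria> entry : categoryRow.entrySet()) {
            Criteria c = entry.getValue();
            boolean priceMatches = c.priceLowest <= maxPrice && c.priceHighest >= minPrice;
            boolean makerMatches = manufacturer == null || manufacturer.equals(c.manufacturer);
            if (priceMatches && makerMatches) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Simulated E-READER row of the index; the second product is a made-up placeholder.
        Map<String, Criteria> eReaders = new LinkedHashMap<>();
        eReaders.put("Sony eReader PRST2BC", new Criteria(8900, 12900, "SONY"));
        eReaders.put("Placeholder reader", new Criteria(5000, 7000, "OTHER"));
        System.out.println(search(eReaders, 10000, 15000, null)); // prints [Sony eReader PRST2BC]
    }
}
```

The point of the design is that the expensive part (finding all products of a category) is paid once by the indexing job; at query time only one row is read and filtered.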

To create an index as described above, we need a new model to hold the filtration criteria and a mapreduce job to update it periodically.
Secondary index model:

and its corresponding grouping table:

The mapreduce job implies that the Job Runner will use the product table as the source and the grouping table as the sink. The job's mapper:
and a reducer:
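As a hedged illustration of the data flow only (not the actual Surus/Synergy mapper and reducer), the following stdlib-only Java simulates the job in memory: the "map" phase emits a (category, product) pair for each product row, and the "reduce" phase collapses all products of one category into a single index row keyed by category. The class `IndexBuilder`, its nested `Product`, and the string form of the criteria are hypothetical; no Hadoop, HBase or Surus API is used:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexBuilder {
    // Hypothetical stand-in for one row of the product table.
    static final class Product {
        final String id;
        final String category;
        final long priceLowest;
        final long priceHighest;
        final String manufacturer;

        Product(String id, String category, long priceLowest, long priceHighest, String manufacturer) {
            this.id = id;
            this.category = category;
            this.priceLowest = priceLowest;
            this.priceHighest = priceHighest;
            this.manufacturer = manufacturer;
        }
    }

    // "Map" phase: emit one (category, product) pair per source row.
    static List<Map.Entry<String, Product>> map(List<Product> rows) {
        List<Map.Entry<String, Product>> pairs = new ArrayList<>();
        for (Product p : rows) {
            pairs.add(new SimpleEntry<>(p.category, p));
        }
        return pairs;
    }

    // "Shuffle + reduce" phase: group pairs by category; each group becomes one
    // index row whose columns are product IDs and whose values hold the criteria.
    static Map<String, Map<String, String>> reduce(List<Map.Entry<String, Product>> pairs) {
        Map<String, Map<String, String>> index = new TreeMap<>();
        for (Map.Entry<String, Product> pair : pairs) {
            Product p = pair.getValue();
            String criteria = "{ priceLowest: " + p.priceLowest
                    + ", priceHighest: " + p.priceHighest
                    + ", manufacturer: " + p.manufacturer + " }";
            index.computeIfAbsent(pair.getKey(), k -> new TreeMap<>()).put(p.id, criteria);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Product> rows = new ArrayList<>();
        rows.add(new Product("Sony eReader PRST2BC", "E-READER", 8900, 12900, "SONY"));
        System.out.println(reduce(map(rows)));
    }
}
```

In the real job, grouping by key is done by the framework's shuffle, and the reducer writes each category's map as one row of the grouping table.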
As an alternative to a secondary index, you can use filtering, for instance with SingleColumnValueFilter:
However, the SingleColumnValueFilter approach is insufficient for large tables and frequent searches: each query still scans the table and evaluates the filter on every row, so stretching it too far will cause performance degradation across the cluster.
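To make the trade-off concrete, here is a pure-Java model (no HBase dependency) of what a full-table scan with an equality column-value filter computes: every row is visited and one column's value is compared. The table representation (row key to column-name/value map), the class name and the `rowsVisited` counter are illustrative assumptions. The real `SingleColumnValueFilter` runs on the region servers, but it still has to read each scanned row, whereas the secondary index answers the same question by reading one row:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FilterScanModel {
    // Number of rows examined by the last scan; stands in for the scan's cost.
    static int rowsVisited = 0;

    // Visits every row and keeps those whose column equals the expected value.
    // Rows lacking the column are dropped, as with setFilterIfMissing(true).
    static List<String> scanWithFilter(Map<String, Map<String, String>> table,
                                       String column, String expected) {
        rowsVisited = 0;
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            rowsVisited++;
            if (expected.equals(row.getValue().get(column))) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> product = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            Map<String, String> columns = new TreeMap<>();
            columns.put("category", i % 100 == 0 ? "E-READER" : "OTHER");
            product.put(String.format("product-%04d", i), columns);
        }
        List<String> eReaders = scanWithFilter(product, "category", "E-READER");
        // prints "10 matches, 1000 rows visited"
        System.out.println(eReaders.size() + " matches, " + rowsVisited + " rows visited");
    }
}
```

Ten matching rows still cost a thousand row reads; with the index, the same query is a single-row read of E-READER.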

To sum it up, secondary indexes are not trivial, but at the same time they are not the pinnacle of complexity. While designing them, one should look carefully at the filtration criteria and the "long-term" perspective.

Hopefully this tutorial will be of help.
Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus

[2] Synergy Mapreduce Framework
https://github.com/mushkevych/synergy-framework

Saturday, October 06, 2012

Surus ORM - One Year Later

It has been a little more than a year since Surus - HBase ORM [1] became available on GitHub. For its birthday party, Surus got:

Support for multi-component rowKeys
This feature is implemented in the HPrimaryKey class and the @HFieldComponent annotation
Support for List<T> properties, implemented as the @HListProperty annotation
Integration with HBase Explorer [2]
Code clean-up: Surus ORM now fits in fewer than 20 Java files
As a result, all unrelated code was moved to the synergy-framework project [3]

For the first time, Surus ORM also has a road-map. Currently it contains support for multi-component properties.
Looking back on the past year, I see it as an interesting endeavour. Surus ORM is still Free Open Source, and you are welcome to fork/use/contribute to it!

Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus/

[2] HBase Explorer + Surus ORM integration
https://github.com/mushkevych/hbase-explorer

[3] Synergy-framework repository
https://github.com/mushkevych/synergy-framework

Thursday, September 27, 2012

HBase Explorer + Surus = Integration

HBase Explorer (HBE) [1] is a UI tool to manipulate and explore HBase instances. I have been using it on big data projects for more than a year, and have gradually improved its integration with Surus to the point where:

Surus-covered tables are processed by Surus ORM [3]
HBE supports multiple ORMs and multi-component rowKeys via ORMInterface and ORMContext
All tables not covered by a custom ORM are processed by HBE's default pattern mechanism

Let's take a look at two screenshots:

Please note that the rowKey components change to match the table structure. On the backend, this is supported by two new methods that were added to the AbstractPrimaryKey class:

Map<String, Class> getComponents();
ImmutableBytesWritable generateRowKey(Map<String, Object> components);

The first is indispensable for discovering the rowKey structure, and the second is required to construct the actual rowKey from HTML parameters.
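As a hedged illustration of the idea behind these two methods, here is a self-contained sketch of a hypothetical two-component key (`categoryId` + `productId`). It returns a plain `byte[]` instead of HBase's `ImmutableBytesWritable` to stay dependency-free, and its encoding (a 4-byte big-endian int followed by a UTF-8 string) is an assumption, not the encoding Surus uses:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical composite key: a 4-byte categoryId followed by a UTF-8 productId.
class CategoryProductKey {
    // Ordered component names and types, e.g. to render one input field per component.
    static Map<String, Class<?>> getComponents() {
        Map<String, Class<?>> components = new LinkedHashMap<>();
        components.put("categoryId", Integer.class);
        components.put("productId", String.class);
        return components;
    }

    // Builds the rowKey bytes from named component values (e.g. parsed HTML parameters).
    static byte[] generateRowKey(Map<String, Object> values) {
        int categoryId = (Integer) values.get("categoryId");
        byte[] productId = ((String) values.get("productId")).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + productId.length)
                .putInt(categoryId) // fixed-width prefix keeps rows of one category contiguous
                .put(productId)
                .array();
    }

    public static void main(String[] args) {
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("categoryId", 7);
        values.put("productId", "PRST2BC");
        // prints "[categoryId, productId] -> 11 bytes"
        System.out.println(getComponents().keySet() + " -> " + generateRowKey(values).length + " bytes");
    }
}
```

Returning the components in a stable order is what lets a generic UI build the input form and then feed the collected values back into the key builder.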

Next, let's review what you need to do to plug a custom ORM into HBE. It takes two simple steps:

Implement the ORMInterface interface
Let's assume the class is named "AnotherOrm"
Register an "AnotherOrm" instance in the static section of ORMContext:
static {
    CONTEXT.add(new OrmSurus());
    CONTEXT.add(new AnotherOrm());
}
Build, deploy and use!
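The registration pattern can be sketched in plain Java. `OrmRegistry`, `OrmPlugin`, `handles`, `describe`, `register` and the `surus_` table-name rule are all hypothetical; HBE's real ORMInterface and ORMContext define their own methods, which are not reproduced here. The sketch only shows the behaviour described above: registered ORMs are consulted first, and tables that no ORM claims fall back to the default pattern mechanism:

```java
import java.util.ArrayList;
import java.util.List;

class OrmRegistry {
    // Hypothetical, simplified plug-in contract; HBE's real ORMInterface differs.
    interface OrmPlugin {
        boolean handles(String tableName);
        String describe(String tableName);
    }

    private static final List<OrmPlugin> CONTEXT = new ArrayList<>();

    // Built-in registrations, mirroring the idea of ORMContext's static section.
    static {
        CONTEXT.add(new OrmPlugin() {
            public boolean handles(String tableName) {
                return tableName.startsWith("surus_"); // hypothetical naming rule
            }
            public String describe(String tableName) {
                return "Surus ORM: " + tableName;
            }
        });
    }

    static void register(OrmPlugin orm) {
        CONTEXT.add(orm);
    }

    // The first registered ORM that claims the table wins; tables no ORM claims
    // fall back to the default pattern mechanism.
    static String describe(String tableName) {
        for (OrmPlugin orm : CONTEXT) {
            if (orm.handles(tableName)) {
                return orm.describe(tableName);
            }
        }
        return "default pattern: " + tableName;
    }

    public static void main(String[] args) {
        System.out.println(describe("surus_product")); // prints Surus ORM: surus_product
        System.out.println(describe("legacy_table"));  // prints default pattern: legacy_table
    }
}
```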

In summary, both Surus and HBE gained cool features to make your life easier.

Cheers!

[1] HBase Explorer with Surus Integration and multi-ORM support
https://github.com/mushkevych/hbase-explorer

[2] Original HBase Explorer
http://sourceforge.net/projects/hbaseexplorer/

[3] Surus ORM
https://github.com/mushkevych/surus

Monday, April 02, 2012

Surus: HBase ORM

Surus [1] has been mentioned previously, but never really explained.
As soon as you try to do anything non-trivial with HBase, you have to deal with transformations from POJO to HBase and from HBase to POJO. That is where an ORM comes in, and that is where we start with Surus.

Surus is a simple, yet powerful HBase ORM. It features:

Mapping is defined by annotations
(sigh with relief: no code generation and no setters/getters)
Supports both column-level and column-family-level mapping granularity
Uses JSON for complex data types, and serializes data in a compact binary format
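For simple types, "compact binary format" means fixed-width bytes rather than text. HBase's own `Bytes.toBytes(int)` writes an int as 4 big-endian bytes; the sketch below reproduces that layout with the JDK's `ByteBuffer` (big-endian by default) so it runs without HBase on the classpath. The class name `CompactInt` is hypothetical, and whether Surus uses exactly this layout for every type is an assumption:

```java
import java.nio.ByteBuffer;

class CompactInt {
    // Encodes an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Decodes the 4 bytes written by toBytes.
    static int toInt(byte[] bytes) {
        if (bytes.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected 4 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes(12900);
        // prints "4 bytes, decoded back to 12900"
        System.out.println(encoded.length + " bytes, decoded back to " + toInt(encoded));
    }
}
```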

Given the laconic format of blog posts, let's review typical use-cases for Surus:

Mapping definition
Writing to HBase
Reading from HBase
Integration with Hadoop mapreduce framework

For our example we will need the HBase table tbl_example with the following structure:

Let's assume that we want to:

Store an Integer value in the column stat:number_of_users
Store a Map<String, Integer> in the column stat:months
Store a Map<Integer, Integer> in the column family family_mapping
Store a Map<Long, Integer> in every column of the family nested_maps

Our Java class will look like:

To perform an HBase insert, we need a Put object and some magic from EntityService:

To parse a Result object from HBase, we reverse our steps:
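Stripped of the ORM, the round trip for the family_mapping field (a Map<Integer, Integer> mapped at column-family level) amounts to this: on write, each map entry becomes one cell whose qualifier is the encoded key; on read, the family's cells are decoded back into a map. In the dependency-free sketch below, a `NavigableMap<byte[], byte[]>` stands in for the family's cells, which is the shape HBase's `Result.getFamilyMap(byte[])` returns. The class and method names are hypothetical, and the 4-byte big-endian encoding is an assumption about the storage format:

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class FamilyMapping {
    // Unsigned lexicographic byte order, the order HBase uses for qualifiers.
    static final Comparator<byte[]> BYTES_ORDER = (a, b) -> {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    static byte[] encode(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    static int decode(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    // Write path: one cell per entry, qualifier = encoded key, value = encoded value.
    static NavigableMap<byte[], byte[]> toCells(Map<Integer, Integer> field) {
        NavigableMap<byte[], byte[]> cells = new TreeMap<>(BYTES_ORDER);
        for (Map.Entry<Integer, Integer> e : field.entrySet()) {
            cells.put(encode(e.getKey()), encode(e.getValue()));
        }
        return cells;
    }

    // Read path: decode every qualifier/value pair of the family back into a map.
    static Map<Integer, Integer> fromCells(NavigableMap<byte[], byte[]> cells) {
        Map<Integer, Integer> field = new TreeMap<>();
        for (Map.Entry<byte[], byte[]> cell : cells.entrySet()) {
            field.put(decode(cell.getKey()), decode(cell.getValue()));
        }
        return field;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> familyMapping = new TreeMap<>();
        familyMapping.put(1, 100);
        familyMapping.put(2, 250);
        NavigableMap<byte[], byte[]> cells = toCells(familyMapping);
        System.out.println(cells.size() + " cells -> " + fromCells(cells)); // prints 2 cells -> {1=100, 2=250}
    }
}
```

Mapping at family level means the map can grow without schema changes: each new key is simply a new column qualifier in family_mapping.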

And finally, let's review how to integrate with Hadoop mapreduce.

First, mapper:

Next, reducer:

Surus is a mature framework, but despite its capabilities it is still simple and fast.

Feel free to navigate to Surus wiki [2] for more details and examples.

[1] Surus at GitHub
https://github.com/mushkevych/surus

[2] Surus wiki
https://github.com/mushkevych/surus/wiki