mushkevych: hbase orm


Friday, January 11, 2013

HBase: secondary index

As your HBase project moves forward, you will likely face a request to search by criteria that are neither part of the primary index nor can be added to it. In other words, you will face the problem of fast and efficient search by a secondary index. For instance: select all eReaders in a specific price range. In this post, we will review an approach to constructing a secondary index.

As usual, we will work in the realm of Surus ORM [1] and Synergy Mapreduce Framework [2], and will start with the definition of a model. For illustration purposes we will use a simplified variant of a "product" class that has lowest and highest prices and can belong to only one category. For instance:

ID	category	priceLowest	priceHighest	manufacturer
Sony eReader PRST2BC	E-READER	8900	12900	SONY

Instances will reside in the product table:

To satisfy our search requests, we would like to get the following structure:

ID	products
ID	Sony eReader PRST2BC	Kobo ...	...
E-READER	{ priceLowest: 8900, priceHighest: 12900, manufacturer: SONY }	{ ... }	{ ... }

With this structure, any search within a specified category lets us quickly select products by price range or manufacturer.
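
The lookup that this layout enables can be sketched in plain Java. This is only an in-memory illustration, not Surus ORM or HBase code: the class names CategoryIndex and Criteria are hypothetical, and treating a match as "the product's price interval overlaps the requested range" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Filtration criteria kept per product (hypothetical name, mirrors the table above). */
final class Criteria {
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    Criteria(int priceLowest, int priceHighest, String manufacturer) {
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }
}

/** In-memory stand-in for the index: category -> (product ID -> criteria). */
final class CategoryIndex {
    private final Map<String, Map<String, Criteria>> rows = new TreeMap<>();

    void put(String category, String productId, Criteria criteria) {
        rows.computeIfAbsent(category, k -> new TreeMap<>()).put(productId, criteria);
    }

    /** Returns IDs of products in the category whose price interval overlaps [min, max]. */
    List<String> searchByPrice(String category, int min, int max) {
        List<String> result = new ArrayList<>();
        Map<String, Criteria> products = rows.get(category);
        if (products == null) {
            return result; // unknown category: nothing to filter
        }
        for (Map.Entry<String, Criteria> e : products.entrySet()) {
            Criteria c = e.getValue();
            if (c.priceLowest <= max && c.priceHighest >= min) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Only the single row that belongs to the requested category is read, and the filtering then runs over that row's entries rather than over the whole product table.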

To create an index as described above, we need a new model to hold the filtration criteria and a mapreduce job to periodically update it.
Secondary index model:

and its corresponding grouping table:

The mapreduce job implies that the Job Runner will use the product table as the source and the grouping table as the sink. The job consists of a mapper and a reducer.
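
The data flow of such a job can be sketched in plain Java, without the Hadoop or Synergy APIs. The sketch is an assumption about the general shape, not the original mapper and reducer: map emits a (category, product) pair for every source row, and reduce folds all products of one category into a single grouping row. The names Product, IndexJobSketch, map, reduce and run are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One row of the source table (hypothetical plain class, no ORM annotations). */
final class Product {
    final String id;
    final String category;
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    Product(String id, String category, int priceLowest, int priceHighest, String manufacturer) {
        this.id = id;
        this.category = category;
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }
}

final class IndexJobSketch {
    /** Map phase: key each product by its category. */
    static Map.Entry<String, Product> map(Product p) {
        return new SimpleEntry<>(p.category, p);
    }

    /** Reduce phase: build one grouping row (product ID -> criteria) for a category. */
    static Map<String, String> reduce(String category, List<Product> products) {
        Map<String, String> row = new TreeMap<>();
        for (Product p : products) {
            row.put(p.id, "{ priceLowest: " + p.priceLowest
                    + ", priceHighest: " + p.priceHighest
                    + ", manufacturer: " + p.manufacturer + " }");
        }
        return row;
    }

    /** Runs map, shuffle (group by key) and reduce over an in-memory source. */
    static Map<String, Map<String, String>> run(List<Product> source) {
        Map<String, List<Product>> shuffled = new TreeMap<>();
        for (Product p : source) {
            Map.Entry<String, Product> kv = map(p);
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Map<String, String>> sink = new TreeMap<>();
        for (Map.Entry<String, List<Product>> e : shuffled.entrySet()) {
            sink.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return sink;
    }
}
```

In a real job the shuffle step is done by the framework, and the reducer output would be written to the sink table instead of an in-memory map.
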
As an alternative to a secondary index, you can use filtering, for instance with SingleColumnValueFilter.
However, the SingleColumnValueFilter approach does not scale to large tables and frequent searches, because the filter has to be evaluated against every row that the scan covers. Stretching it too far will cause performance degradation across the cluster.
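
The difference in cost can be illustrated with a small plain-Java simulation. This is not HBase code: filteredScan stands in for a scan with a column-value filter, which has to look at every row, while indexLookup reads only the entries grouped under one category. All names and row counts here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ScanCostSketch {
    /** Builds source rows as {id, category}; a stand-in for the product table. */
    static List<String[]> productTable(int perCategory, String... categories) {
        List<String[]> rows = new ArrayList<>();
        for (String category : categories) {
            for (int i = 0; i < perCategory; i++) {
                rows.add(new String[] {category + "-" + i, category});
            }
        }
        return rows;
    }

    /** Groups product IDs by category; a stand-in for the secondary index. */
    static Map<String, List<String>> buildIndex(List<String[]> table) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : table) {
            index.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return index;
    }

    /** Filter-style search: every row is examined. Returns the number of rows examined. */
    static int filteredScan(List<String[]> table, String category, List<String> out) {
        int examined = 0;
        for (String[] row : table) {
            examined++;
            if (row[1].equals(category)) {
                out.add(row[0]);
            }
        }
        return examined;
    }

    /** Index-style search: only entries under one key are read. Returns entries read. */
    static int indexLookup(Map<String, List<String>> index, String category, List<String> out) {
        List<String> ids = index.getOrDefault(category, new ArrayList<>());
        out.addAll(ids);
        return ids.size();
    }
}
```

Both searches return the same product IDs, but the scan's cost grows with the size of the whole table, while the lookup's cost grows only with the size of one category.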

To sum it up, secondary indexes are not trivial, but neither are they the height of complexity. While designing them, one should look carefully at the filtration criteria and the long-term perspective.

Hopefully this tutorial will be of help.
Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus

[2] Synergy Mapreduce Framework
https://github.com/mushkevych/synergy-framework

Saturday, October 06, 2012

Surus ORM - One Year Later

It has been a little more than a year since Surus - HBase ORM [1] became available via Github. For its birthday party, Surus got:

Support of multi-component rowKeys, implemented in the HPrimaryKey class and the @HFieldComponent annotation
Support of List<T> properties, implemented as the @HListProperty annotation
Integration with HBase Explorer [2]
Code clean-up: Surus ORM now fits in fewer than 20 Java files, and all unrelated code was moved to the synergy-framework project [3]

For the first time, Surus ORM also has a roadmap. Currently it contains support of multi-component properties.
Looking back at the past year, I see it as an interesting endeavour. Surus ORM is still free and open source, and you are welcome to fork, use, and contribute to it!

Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus/

[2] HBase Explorer + Surus ORM integration
https://github.com/mushkevych/hbase-explorer

[3] Synergy-framework repository
https://github.com/mushkevych/synergy-framework

Thursday, September 27, 2012

HBase Explorer + Surus = Integration

HBase Explorer (HBE) [1] is a UI tool for manipulating and exploring HBase instances. I have been using it on big data projects for more than a year, and have gradually improved its integration with Surus to the point where:

Surus-covered tables are processed by Surus ORM [3]
HBE supports multiple ORMs and multi-component rowKeys via ORMInterface and ORMContext
All tables not covered by a custom ORM are processed by the HBE default pattern mechanism

Let's take a look at two screenshots:

Please note that the rowKey components change to match the table structure. On the backend, this is supported by two new methods that were added to the AbstractPrimaryKey class:

Map<String, Class> getComponents()
ImmutableBytesWritable generateRowKey(Map<String, Object> components);

The first is indispensable for discovering the rowKey structure, and the second is required to construct the actual rowKey from HTML parameters.
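
To make the roles of the two methods concrete, here is a minimal sketch of a two-component row key. It is not the Surus AbstractPrimaryKey implementation: the class TimeSiteKey, its component names, and the fixed-width layout are assumptions, and it returns a plain byte[] instead of ImmutableBytesWritable so that it runs without HBase on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-component key: an integer time period followed by a site name. */
final class TimeSiteKey {
    static final int SITE_WIDTH = 16; // assumed fixed width of the string component, in bytes

    /** Describes the key structure: component name -> Java type, in key order. */
    Map<String, Class<?>> getComponents() {
        Map<String, Class<?>> components = new LinkedHashMap<>();
        components.put("timePeriod", Integer.class);
        components.put("siteName", String.class);
        return components;
    }

    /** Builds the row key bytes from named component values, e.g. parsed from a form. */
    byte[] generateRowKey(Map<String, Object> values) {
        int timePeriod = (Integer) values.get("timePeriod");
        byte[] site = ((String) values.get("siteName")).getBytes(StandardCharsets.UTF_8);
        if (site.length > SITE_WIDTH) {
            throw new IllegalArgumentException("siteName exceeds " + SITE_WIDTH + " bytes");
        }
        ByteBuffer key = ByteBuffer.allocate(Integer.BYTES + SITE_WIDTH);
        key.putInt(timePeriod); // 4 bytes, big-endian
        key.put(site);          // the rest of the buffer stays zero-padded
        return key.array();
    }
}
```

A UI can iterate over getComponents() to render one input field per component, then pass the submitted values to generateRowKey to address the row.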

Next, let's review what you would need to do to plug a custom ORM into HBE. It takes two simple steps:

Implement the ORMInterface interface. Let's assume the class name will be "AnotherOrm".
Register an "AnotherOrm" instance in the ORMContext static section:
    static {
        CONTEXT.add(new OrmSurus());
        CONTEXT.add(new AnotherOrm());
    }
Build, deploy and use!
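
The registration step follows a simple registry pattern. A self-contained sketch of that pattern is given here; the interface Orm, its covers method, the table-name prefixes, and the "default" fallback name are hypothetical and do not reproduce the actual ORMInterface or ORMContext signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ORMInterface. */
interface Orm {
    String name();

    boolean covers(String tableName);
}

/** Hypothetical stand-in for ORMContext: a static list of registered ORMs. */
final class OrmRegistry {
    private static final List<Orm> CONTEXT = new ArrayList<>();

    static {
        // Registration order decides which ORM is asked first.
        CONTEXT.add(prefixOrm("surus", "surus_"));
        CONTEXT.add(prefixOrm("another", "another_"));
    }

    /** Builds a toy ORM that claims every table whose name starts with the prefix. */
    static Orm prefixOrm(final String name, final String prefix) {
        return new Orm() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public boolean covers(String tableName) {
                return tableName.startsWith(prefix);
            }
        };
    }

    /** Returns the first registered ORM covering the table, or "default" if none does. */
    static String resolve(String tableName) {
        for (Orm orm : CONTEXT) {
            if (orm.covers(tableName)) {
                return orm.name();
            }
        }
        return "default";
    }
}
```

The fallback branch corresponds to tables that no custom ORM covers, which are left to the default mechanism.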

In summary: both Surus and HBE got cool features to make your life easier.

Cheers!

[1] HBase Explorer with Surus Integration and multi-ORM support
https://github.com/mushkevych/hbase-explorer

[2] Original HBase Explorer
http://sourceforge.net/projects/hbaseexplorer/

[3] Surus ORM
https://github.com/mushkevych/surus