Showing posts with label hbase orm. Show all posts

Friday, January 11, 2013

HBase: secondary index

As your HBase project moves forward, you will likely face a request to search by criteria that are neither part of the primary index nor can be made part of it. In other words, you will face the problem of fast and efficient search by a secondary index. For instance: select all eReaders in a specific price range. In this post, we will review an approach to constructing a secondary index.

As usual, we will work in the realm of Surus ORM [1] and Synergy Mapreduce Framework [2], and will start with the definition of a model. For illustration purposes we will use a simplified variant of the "product" class, which has lowest and highest prices and can belong to only one category. For instance:

ID                    category  priceLowest  priceHighest  manufacturer
Sony eReader PRST2BC  E-READER  8900         12900         SONY


Instances will reside in a table product:

To satisfy our search requests, we would like to get the following structure:

ID        products
          Sony eReader PRST2BC         Kobo ...  ...
E-READER  { priceLowest : 8900,        { ... }   { ... }
            priceHighest: 12900,
            manufacturer: SONY }

Here, any search within a specified category lets us quickly select products by price range or manufacturer.
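
To make the lookup concrete, here is a minimal in-memory sketch in plain Java. It is not Surus or HBase code: the classes `ProductSummary` and `CategoryIndexRow` and the method `findByPriceRange` are hypothetical names introduced only for illustration. One index row is modelled as a map from product ID to a small summary object, and the "overlapping price range" rule is an assumption about what a price-range search means.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one cell of the "products" column family.
class ProductSummary {
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    ProductSummary(int priceLowest, int priceHighest, String manufacturer) {
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }
}

// Hypothetical in-memory model of one index row:
// row key = category, qualifiers = product IDs, values = product summaries.
class CategoryIndexRow {
    final String category;
    final Map<String, ProductSummary> products = new LinkedHashMap<>();

    CategoryIndexRow(String category) {
        this.category = category;
    }

    // Returns the IDs of products whose price range overlaps [min, max].
    List<String> findByPriceRange(int min, int max) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, ProductSummary> e : products.entrySet()) {
            ProductSummary p = e.getValue();
            if (p.priceLowest <= max && p.priceHighest >= min) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
```

The point of the layout is that a single row read by category key returns every candidate, so the price or manufacturer check runs over one row instead of a scan of the whole product table.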

To create the index described above, we need a new model to hold the filtration criteria and a mapreduce job to update it periodically.
Secondary index model:

and its corresponding grouping table:

The mapreduce job implies that the Job Runner will use the product table as its source and the grouping table as its sink. The job's mapper:
and a reducer:
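
As a rough illustration of the data flow only, the map and reduce steps can be sketched in plain Java without any Hadoop, HBase or Surus classes. Everything here is an assumption made for the sketch: `ProductRow`, `GroupingJobSketch`, and the string format of the resulting cells are hypothetical, and the shuffle phase is simulated with an in-memory map.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical flattened row of the product table.
class ProductRow {
    final String id;
    final String category;
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    ProductRow(String id, String category, int priceLowest, int priceHighest, String manufacturer) {
        this.id = id;
        this.category = category;
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }
}

class GroupingJobSketch {
    // "Map" step: emit the category as the key and the product as the value.
    static Map.Entry<String, ProductRow> map(ProductRow row) {
        return new AbstractMap.SimpleImmutableEntry<>(row.category, row);
    }

    // "Reduce" step: fold all products of one category into a single index row,
    // one cell per product ID.
    static Map<String, String> reduce(String category, List<ProductRow> rows) {
        Map<String, String> cells = new TreeMap<>();
        for (ProductRow r : rows) {
            cells.put(r.id, "{priceLowest: " + r.priceLowest
                    + ", priceHighest: " + r.priceHighest
                    + ", manufacturer: " + r.manufacturer + "}");
        }
        return cells;
    }

    // Stand-in for the job runner: group mapper output by key, then reduce each group.
    static Map<String, Map<String, String>> run(List<ProductRow> source) {
        Map<String, List<ProductRow>> shuffled = new TreeMap<>();
        for (ProductRow row : source) {
            Map.Entry<String, ProductRow> kv = map(row);
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Map<String, String>> sink = new TreeMap<>();
        for (Map.Entry<String, List<ProductRow>> e : shuffled.entrySet()) {
            sink.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return sink;
    }
}
```

In a real job the source would be a scan over the product table and the sink a write to the grouping table; the sketch only shows which key the mapper emits and what the reducer assembles per category.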

As an alternative to a secondary index, you can use filtering. For instance, SingleColumnValueFilter:
However, the SingleColumnValueFilter approach does not scale to large tables and frequent searches, because the filter still has to be evaluated against every row in the scanned range. Stretching it too far will cause performance degradation across the cluster.

To sum it up, secondary indexes are not trivial, but they are not the pinnacle of complexity either. While designing them, one should look carefully at the filtration criteria and keep the "long-term" perspective in mind.

Hopefully this tutorial will be of help.
Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus

[2] Synergy Mapreduce Framework
https://github.com/mushkevych/synergy-framework

Saturday, October 06, 2012

Surus ORM - One Year Later

It has been a little more than a year since Surus - HBase ORM [1] became available via Github. For its birthday party, Surus got:
  • Support of multi-component rowKeys
    This feature is implemented in HPrimaryKey class and @HFieldComponent annotation
  • Support of List<T> properties, which is implemented as @HListProperty annotation
  • Integration with HBase Explorer [2] 
  • Code clean-up. Now, Surus ORM fits in fewer than 20 Java files
    As a result, all non-related code was moved to the synergy-framework project [3]
For the first time, Surus ORM also has a road-map. Currently it contains support of multi-component properties.
Looking back on the past year, I see it as an interesting endeavour. Surus ORM is still Free Open Source, and you are welcome to fork/use/contribute to it!

Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus/

[2] HBase Explorer + Surus ORM integration
https://github.com/mushkevych/hbase-explorer

[3] Synergy-framework repository
https://github.com/mushkevych/synergy-framework

Thursday, September 27, 2012

HBase Explorer + Surus = Integration

HBase Explorer (HBE) [1] is a UI tool to manipulate and explore HBase instances. I have been using it on big data projects for more than a year, and have gradually improved its integration with Surus to the point where:

  • Surus-covered tables are processed by Surus ORM [3]
  • HBE supports multiple ORMs and multi-component rowKeys via ORMInterface and ORMContext
  • All tables not covered by a custom ORM are processed by the HBE default pattern mechanism

Let's take a look at two screenshots:


Please note that the rowKey components change to match the table structure. On the backend, this is supported by two new methods added to the AbstractPrimaryKey class:

  • Map<String, Class> getComponents()
  • ImmutableBytesWritable generateRowKey(Map<String, Object> components)
The first is indispensable for discovering the rowKey structure, and the second is required to construct the actual rowKey from HTML parameters.
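
To illustrate how such a pair of methods can cooperate, here is a minimal sketch in plain Java. It does not extend the real AbstractPrimaryKey: the class `TwoPartKeySketch`, the component names `timeperiod` and `domain`, and the fixed 16-byte width are all assumptions made for the example, and `byte[]` is returned instead of `ImmutableBytesWritable` so that the sketch needs no Hadoop dependency.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-component key: a 4-byte timeperiod followed by a fixed-width domain name.
class TwoPartKeySketch {
    static final String TIMEPERIOD = "timeperiod";
    static final String DOMAIN = "domain";
    static final int DOMAIN_WIDTH = 16;

    // Describes the rowKey layout: component name -> Java type, in key order.
    Map<String, Class<?>> getComponents() {
        Map<String, Class<?>> components = new LinkedHashMap<>();
        components.put(TIMEPERIOD, Integer.class);
        components.put(DOMAIN, String.class);
        return components;
    }

    // Builds the raw rowKey bytes from named values, e.g. parsed HTML parameters.
    byte[] generateRowKey(Map<String, Object> components) {
        int timeperiod = (Integer) components.get(TIMEPERIOD);
        byte[] domain = ((String) components.get(DOMAIN)).getBytes(StandardCharsets.UTF_8);
        if (domain.length > DOMAIN_WIDTH) {
            throw new IllegalArgumentException("domain exceeds " + DOMAIN_WIDTH + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + DOMAIN_WIDTH);
        buffer.putInt(timeperiod);
        buffer.put(domain); // the remaining bytes stay zero-padded
        return buffer.array();
    }
}
```

A UI can iterate over the map returned by `getComponents()` to render one input field per component, then pass the submitted values back to `generateRowKey` to obtain the key for a lookup.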

Next, let's review what you would need to do to plug a custom ORM into HBE. It takes three simple steps:
  1. Implement interface ORMInterface
    Let's assume the class name will be "AnotherOrm"
  2. Register "AnotherOrm" instance in ORMContext static section:
        static {
            CONTEXT.add(new OrmSurus());
            CONTEXT.add(new AnotherOrm());
        }
  3. Build, deploy and use!
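
The registration pattern above can be pictured with a small self-contained sketch. The methods of the real ORMInterface are not shown in this post, so the interface `OrmSketch`, its `name` and `covers` methods, the table-prefix rule and the `resolve` helper below are all hypothetical; the sketch only shows the idea that the first registered ORM covering a table handles it, and that everything else falls back to the default mechanism.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ORMInterface.
interface OrmSketch {
    String name();
    boolean covers(String tableName);
}

// Hypothetical ORM that claims every table whose name starts with a given prefix.
class PrefixOrm implements OrmSketch {
    private final String name;
    private final String tablePrefix;

    PrefixOrm(String name, String tablePrefix) {
        this.name = name;
        this.tablePrefix = tablePrefix;
    }

    public String name() {
        return name;
    }

    public boolean covers(String tableName) {
        return tableName.startsWith(tablePrefix);
    }
}

// Hypothetical stand-in for ORMContext: a static registry consulted in order.
class OrmContextSketch {
    static final String DEFAULT = "default pattern mechanism";
    private static final List<OrmSketch> CONTEXT = new ArrayList<>();

    static {
        CONTEXT.add(new PrefixOrm("OrmSurus", "surus_"));
        CONTEXT.add(new PrefixOrm("AnotherOrm", "another_"));
    }

    // Returns the name of the first ORM covering the table, or the default handler.
    static String resolve(String tableName) {
        for (OrmSketch orm : CONTEXT) {
            if (orm.covers(tableName)) {
                return orm.name();
            }
        }
        return DEFAULT;
    }
}
```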
In summary: both Surus and HBE got cool features to make your life easier.
Cheers!

[1] HBase Explorer with Surus Integration and multi-ORM support:

[2] Original HBase Explorer:

[3] Surus ORM: