
Friday, January 11, 2013

HBase: secondary index

As your HBase project moves forward, you will likely face a request to search by criteria that are neither part of the primary index (the rowKey) nor can be added to it. In other words, you will face the problem of fast and efficient search by a secondary index. For instance: select all eReaders in a specific price range. In this post we will review one approach to constructing a secondary index.

As usual, we will work within the realm of Surus ORM [1] and the Synergy Mapreduce Framework [2], and will start with the definition of a model. For illustration purposes we will use a simplified variant of the "product" class, which has lowest and highest prices and can belong to only one category. For instance:

ID                     category   priceLowest   priceHighest   manufacturer
Sony eReader PRST2BC   E-READER   8900          12900          SONY


Instances will reside in the table product.

To satisfy our search requests, we would like to get the following structure, where the row ID is the category and each product becomes a column under products:

ID         products
           Sony eReader PRST2BC           Kobo ...   ...
E-READER   { priceLowest : 8900,          { ... }    { ... }
             priceHighest: 12900,
             manufacturer: SONY }

Here, a search within a given category reads a single row, from which we can quickly select the products in a specific price range or by a specific manufacturer.
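To make the index row concrete, here is a minimal in-memory sketch in plain Java, with no HBase or Surus classes. It assumes a category row can be modeled as a map from product ID to its filtering criteria, and it treats "in a price range" as "the product's whole price range lies within [min, max]". The names Criteria, IndexRowQuery, select and the "Example Reader" entry are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical value held in each product column of a category row.
final class Criteria {
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    Criteria(int priceLowest, int priceHighest, String manufacturer) {
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }
}

final class IndexRowQuery {
    // One category row: product ID -> filtering criteria.
    // "Example Reader" and its values are made up for illustration.
    static Map<String, Criteria> sampleEReaderRow() {
        Map<String, Criteria> row = new TreeMap<>();
        row.put("Sony eReader PRST2BC", new Criteria(8900, 12900, "SONY"));
        row.put("Example Reader", new Criteria(9900, 15900, "EXAMPLE"));
        return row;
    }

    // Selects products whose whole price range lies within [min, max];
    // a null manufacturer means "any manufacturer".
    static List<String> select(Map<String, Criteria> row, int min, int max, String manufacturer) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Criteria> e : row.entrySet()) {
            Criteria c = e.getValue();
            boolean inRange = c.priceLowest >= min && c.priceHighest <= max;
            boolean makerOk = manufacturer == null || manufacturer.equals(c.manufacturer);
            if (inRange && makerOk) {
                ids.add(e.getKey());
            }
        }
        Collections.sort(ids);
        return ids;
    }
}
```

A query such as select(row, 8000, 13000, null) returns only the Sony reader. Because one category lives in one row, the filtering cost depends on the number of products in that category, not on the size of the product table.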

To create an index as described above, we need a new model to hold the filtering criteria and a mapreduce job to update it periodically.
Secondary index model:

and its corresponding grouping table:

The Job Runner of the mapreduce job uses the product table as its source and the grouping table as its sink. The job's mapper:
and a reducer:
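Without tying the example to the Hadoop or Synergy APIs, the data flow of such a job can be sketched in plain Java: the map phase keys each product row by its category, and the reduce phase folds all products of one category into one index row. This is a hypothetical simulation (ProductRow, IndexJobSketch and the JSON layout are assumptions); in a real job Hadoop performs the grouping that run() does with a TreeMap, and the reducer output would be written to the grouping table.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified row of the product table.
final class ProductRow {
    final String id;
    final String category;
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    ProductRow(String id, String category, int priceLowest, int priceHighest, String manufacturer) {
        this.id = id;
        this.category = category;
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }
}

final class IndexJobSketch {
    // Map phase: emit (category, product) so all products of a category meet in one reducer.
    static Map.Entry<String, ProductRow> map(ProductRow p) {
        return new SimpleEntry<>(p.category, p);
    }

    // Reduce phase: fold all products of one category (the key) into one index row,
    // product ID -> JSON with the filtering criteria.
    static Map<String, String> reduce(String category, List<ProductRow> products) {
        Map<String, String> row = new TreeMap<>();
        for (ProductRow p : products) {
            row.put(p.id, "{\"priceLowest\": " + p.priceLowest
                    + ", \"priceHighest\": " + p.priceHighest
                    + ", \"manufacturer\": \"" + p.manufacturer + "\"}");
        }
        return row;
    }

    // Driver: groups mapper output by key (Hadoop's shuffle does this in a real job)
    // and collects one index row per category, as the sink table would.
    static Map<String, Map<String, String>> run(List<ProductRow> source) {
        Map<String, List<ProductRow>> grouped = new TreeMap<>();
        for (ProductRow p : source) {
            Map.Entry<String, ProductRow> kv = map(p);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Map<String, String>> sink = new TreeMap<>();
        for (Map.Entry<String, List<ProductRow>> e : grouped.entrySet()) {
            sink.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return sink;
    }
}
```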
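As an alternative to a secondary index, you can use filtering, for instance SingleColumnValueFilter:
However, the SingleColumnValueFilter approach is insufficient for large tables and frequent searches: the filter is applied on the region servers to every row the scan reads, so without a narrowing row range each search amounts to a full table scan. Stretching it too far will degrade performance across the cluster.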
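The difference in cost can be illustrated with a small plain-Java model (hypothetical names, not the HBase client API). A scan with a column value filter still reads each row before deciding whether to return it, which the model approximates by examining every row; the secondary index instead reads the single row keyed by the category.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ScanCostSketch {
    // Simulated product table: rowKey (product ID) -> value of the category column.
    final TreeMap<String, String> productTable = new TreeMap<>();
    // Simulated grouping table: rowKey (category) -> product IDs in that category.
    final TreeMap<String, List<String>> indexTable = new TreeMap<>();
    // How many rows the searches have read so far.
    int rowsExamined;

    void put(String productId, String category) {
        productTable.put(productId, category);
        indexTable.computeIfAbsent(category, k -> new ArrayList<>()).add(productId);
    }

    // Filter-style search: every row is read, then its column value is compared.
    List<String> scanWithFilter(String category) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : productTable.entrySet()) {
            rowsExamined++;
            if (category.equals(row.getValue())) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Index-style search: a single lookup by rowKey in the grouping table.
    List<String> lookupIndex(String category) {
        rowsExamined++;
        List<String> hits = indexTable.get(category);
        return hits == null ? new ArrayList<String>() : new ArrayList<String>(hits);
    }

    // Builds a table of n products spread evenly over the given number of categories.
    static ScanCostSketch sample(int n, int categories) {
        ScanCostSketch s = new ScanCostSketch();
        for (int i = 0; i < n; i++) {
            s.put("product-" + i, "category-" + (i % categories));
        }
        return s;
    }
}
```

With sample(1000, 10), scanWithFilter("category-3") examines all 1,000 rows to return 100 products, while lookupIndex examines one. The price of the index is keeping it up to date, which is why this approach relies on a periodically run mapreduce job.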

To sum it up, secondary indexes are not trivial, but at the same time they are not the pinnacle of complexity. While designing them, look carefully at the filtering criteria and the "long-term" perspective.

Hopefully this tutorial will be of help.
Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus

[2] Synergy Mapreduce Framework
https://github.com/mushkevych/synergy-framework

Saturday, October 06, 2012

Surus ORM - One Year Later

It has been a little more than a year since Surus, the HBase ORM [1], became available on GitHub. For its birthday party, Surus got:
  • Support for multi-component rowKeys
    This feature is implemented by the HPrimaryKey class and the @HFieldComponent annotation
  • Support for List<T> properties, implemented by the @HListProperty annotation
  • Integration with HBase Explorer [2]
  • Code clean-up: Surus ORM now fits in fewer than 20 Java files
    All unrelated code was moved to the synergy-framework project [3]
For the first time, Surus ORM also has a road-map. Currently it contains support for multi-component properties.
Looking back on the past year, I see it as an interesting endeavour. Surus ORM is still free and open source, and you are welcome to fork, use and contribute to it!

Cheers!

[1] Surus ORM
https://github.com/mushkevych/surus/

[2] HBase Explorer + Surus ORM integration
https://github.com/mushkevych/hbase-explorer

[3] Synergy-framework repository
https://github.com/mushkevych/synergy-framework

Thursday, September 27, 2012

HBase Explorer + Surus = Integration

HBase Explorer (HBE) [1] is a UI tool for exploring and manipulating HBase instances. I have been using it on big data projects for more than a year, and have gradually improved its integration with Surus to the point where:

  • Tables covered by Surus are processed by Surus ORM [3]
  • HBE supports multiple ORMs and multi-component rowKeys via ORMInterface and ORMContext
  • All tables not covered by a custom ORM are processed by HBE's default pattern mechanism

Let's take a look at two screenshots:


Please note that the rowKey components change to match the table structure. On the backend, this is supported by two new methods added to the AbstractPrimaryKey class:

  • Map<String, Class> getComponents()
  • ImmutableBytesWritable generateRowKey(Map<String, Object> components)
The first is indispensable for discovering the rowKey structure, and the second is required to construct the actual rowKey from HTML parameters.
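A minimal sketch of what these two methods do, using only the JDK. Assumptions: the two components (userId, timestamp) and their fixed-width big-endian encoding are hypothetical, not Surus's HPrimaryKey layout; byte[] stands in for ImmutableBytesWritable; and getComponents returns Map<String, Class<?>> instead of the raw Map<String, Class>. A hypothetical parseRowKey helper is added for the reverse direction.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-component rowKey: int "userId" followed by long "timestamp".
final class CompositeKeySketch {
    // Describes the rowKey structure: component name -> type, in key order.
    // A UI can render one input field per entry.
    static Map<String, Class<?>> getComponents() {
        Map<String, Class<?>> components = new LinkedHashMap<>();
        components.put("userId", Integer.class);
        components.put("timestamp", Long.class);
        return components;
    }

    // Builds the rowKey bytes from named component values.
    // ByteBuffer is big-endian, so non-negative values sort in numeric order
    // under HBase's lexicographic byte comparison.
    static byte[] generateRowKey(Map<String, Object> values) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + Long.BYTES);
        buffer.putInt((Integer) values.get("userId"));
        buffer.putLong((Long) values.get("timestamp"));
        return buffer.array();
    }

    // Reverse operation: splits a rowKey back into its named components.
    static Map<String, Object> parseRowKey(byte[] rowKey) {
        ByteBuffer buffer = ByteBuffer.wrap(rowKey);
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("userId", buffer.getInt());
        values.put("timestamp", buffer.getLong());
        return values;
    }
}
```

Fixed-width components keep the split unambiguous; variable-length components such as strings would need a separator or a length prefix.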

Next, let's review what you would need to do to plug a custom ORM into HBE. It takes three simple steps:
  1. Implement the ORMInterface interface
    Let's assume the class name is "AnotherOrm"
  2. Register an "AnotherOrm" instance in the ORMContext static section:
        static {
            CONTEXT.add(new OrmSurus());
            CONTEXT.add(new AnotherOrm());
        }
  3. Build, deploy and use!
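The lookup HBE performs over registered ORMs can be sketched as follows. OrmPlugin and OrmRegistry are hypothetical stand-ins for ORMInterface and ORMContext, and their methods (name, supports, resolve) are assumptions, not HBE's actual interface: each registered ORM is asked whether it covers a table, and uncovered tables fall back to the default pattern mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for ORMInterface: decides whether it covers a table.
interface OrmPlugin {
    String name();
    boolean supports(String tableName);
}

// Hypothetical ORM that covers a fixed set of tables.
final class TableListOrm implements OrmPlugin {
    private final String name;
    private final Set<String> tables;

    TableListOrm(String name, String... tables) {
        this.name = name;
        this.tables = new HashSet<>(Arrays.asList(tables));
    }

    public String name() { return name; }

    public boolean supports(String tableName) { return tables.contains(tableName); }
}

// Hypothetical stand-in for ORMContext: plug-ins are registered once, in a static block.
final class OrmRegistry {
    private static final List<OrmPlugin> CONTEXT = new ArrayList<>();

    static {
        CONTEXT.add(new TableListOrm("surus", "product", "tbl_example"));
        CONTEXT.add(new TableListOrm("another", "audit_log")); // hypothetical table
    }

    // Returns the first plug-in covering the table, or "default" for HBE's pattern mechanism.
    static String resolve(String tableName) {
        for (OrmPlugin orm : CONTEXT) {
            if (orm.supports(tableName)) {
                return orm.name();
            }
        }
        return "default";
    }
}
```

With registration in a static block, the list is populated once when the class is loaded, and the registration order decides which ORM wins if two of them claim the same table.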
In summary: both Surus and HBE got cool features to make your life easier.
Cheers!

[1] HBase Explorer with Surus Integration and multi-ORM support

[2] Original HBase Explorer

[3] Surus ORM

Monday, April 02, 2012

Surus: HBase ORM

Surus [1] has been mentioned previously, but never really explained.
As soon as you try to do anything non-trivial with HBase, you have to deal with transformations from POJO to HBase and from HBase to POJO. That's where an ORM comes in, and that's where we start with Surus.

Surus is a simple, yet powerful HBase ORM. It features:
  • Mapping is defined by annotations
    (sigh with relief: no code generation and no setters/getters)
  • Supports both column and column family levels of mapping granularity
  • Uses JSON for complex data types, and serializes data in a compact binary format
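To illustrate what "mapping defined by annotations" means in principle, here is a self-contained reflection sketch. The @Column annotation, UserStats and AnnotationMapper are hypothetical and are not Surus's own annotations or API; values are kept as strings instead of bytes to keep the example short.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical annotation binding a field to an HBase "family:qualifier" column.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column {
    String family();
    String qualifier();
}

// Example POJO: plain fields, no getters/setters, no generated code.
final class UserStats {
    @Column(family = "stat", qualifier = "number_of_users")
    Integer numberOfUsers;

    // Not annotated, so the mapper ignores it.
    String note;
}

final class AnnotationMapper {
    // Walks the annotated fields via reflection and collects "family:qualifier" -> value.
    // Values are kept as strings here; a real ORM would serialize them to bytes.
    static Map<String, String> toColumns(Object pojo) {
        Map<String, String> columns = new TreeMap<>();
        for (Field field : pojo.getClass().getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(pojo);
                if (value != null) {
                    columns.put(column.family() + ":" + column.qualifier(), String.valueOf(value));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return columns;
    }
}
```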

Given the laconic format of blog posts, let's review typical use cases for Surus:
  • Mapping definition
  • Writing to HBase 
  • Reading from HBase 
  • Integration with Hadoop mapreduce framework
For our example we will need the HBase table tbl_example with the column families stat, family_mapping and nested_maps.
Let's assume that we want to:
  • Store an Integer value in column stat:number_of_users
  • Store a Map<String, Integer> in column stat:months
  • Store a Map<Integer, Integer> in column family family_mapping
  • Store a Map<Long, Integer> in every column of the family nested_maps
Our Java class will look like:

To perform an HBase insert, we need a Put object and some magic from EntityService:
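Without reproducing EntityService, the cell layout that such an insert could produce for the example class can be sketched in plain Java. Assumptions, not Surus's actual format: a sorted map from "family:qualifier" to bytes stands in for the Put, Integer values are written as 4 big-endian bytes, family_mapping keys become decimal-string qualifiers, and JSON is hand-built for flat maps. PutSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class PutSketch {
    // Stand-in for a Put: "family:qualifier" -> cell bytes, sorted like HBase columns.
    static Map<String, byte[]> toCells(int numberOfUsers,
                                       Map<String, Integer> months,
                                       Map<Integer, Integer> familyMapping,
                                       Map<String, Map<Long, Integer>> nestedMaps) {
        Map<String, byte[]> cells = new TreeMap<>();
        // Column-level mapping, simple type: compact 4-byte binary value.
        cells.put("stat:number_of_users", intBytes(numberOfUsers));
        // Column-level mapping, complex type: one column holding JSON.
        cells.put("stat:months", utf8(json(months)));
        // Family-level mapping: each map entry becomes its own column.
        for (Map.Entry<Integer, Integer> e : familyMapping.entrySet()) {
            cells.put("family_mapping:" + e.getKey(), intBytes(e.getValue()));
        }
        // Family-level mapping with complex values: each column holds a JSON map.
        for (Map.Entry<String, Map<Long, Integer>> e : nestedMaps.entrySet()) {
            cells.put("nested_maps:" + e.getKey(), utf8(json(e.getValue())));
        }
        return cells;
    }

    static byte[] intBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Minimal JSON writer for flat maps with simple keys (no escaping), keys sorted.
    static <K extends Comparable<K>> String json(Map<K, Integer> map) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<K, Integer> e : new TreeMap<>(map).entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append('}').toString();
    }
}
```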

To parse a Result object from HBase, we reverse the process:
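The reverse direction can be sketched in plain Java, with a map from "family:qualifier" to bytes standing in for the Result. The layout is an assumption for illustration, not Surus's actual format: Integer columns hold 4 big-endian bytes, family_mapping qualifiers are decimal strings, and stat:months holds a flat JSON object. ResultSketch and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class ResultSketch {
    // Column-level: 4 big-endian bytes -> Integer, or null if the column is absent.
    static Integer readInt(Map<String, byte[]> cells, String column) {
        byte[] raw = cells.get(column);
        return raw == null ? null : ByteBuffer.wrap(raw).getInt();
    }

    // Family-level: every "family:qualifier" column becomes one map entry.
    static Map<Integer, Integer> readFamily(Map<String, byte[]> cells, String family) {
        Map<Integer, Integer> result = new TreeMap<>();
        String prefix = family + ":";
        for (Map.Entry<String, byte[]> e : cells.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                int key = Integer.parseInt(e.getKey().substring(prefix.length()));
                result.put(key, ByteBuffer.wrap(e.getValue()).getInt());
            }
        }
        return result;
    }

    // JSON column: parses a flat {"key":int,...} object.
    // Keys must not contain commas or quotes (no escaping is handled).
    static Map<String, Integer> readJsonMap(Map<String, byte[]> cells, String column) {
        Map<String, Integer> result = new TreeMap<>();
        byte[] raw = cells.get(column);
        if (raw == null) {
            return result;
        }
        String body = new String(raw, StandardCharsets.UTF_8).trim();
        body = body.substring(1, body.length() - 1); // strip { and }
        if (body.isEmpty()) {
            return result;
        }
        for (String pair : body.split(",")) {
            int colon = pair.lastIndexOf(':');
            String key = pair.substring(0, colon).trim();
            key = key.substring(1, key.length() - 1); // strip quotes
            result.put(key, Integer.parseInt(pair.substring(colon + 1).trim()));
        }
        return result;
    }
}
```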

And finally, let's review how to integrate with Hadoop mapreduce.
First, mapper:
Next, reducer:
Surus is a mature framework, yet despite its capabilities it remains simple and fast.
Feel free to navigate to Surus wiki [2] for more details and examples.

[1] Surus at GitHub

[2] Surus wiki