As usual, we will work in the realm of Surus ORM [1] and Synergy Mapreduce Framework [2], and will start with the definition of a model. For illustration purposes we will use a simplified variant of the "product" class, which has lowest and highest prices and can belong to only one category. For instance:
ID | category | priceLowest | priceHighest | manufacturer |
Sony eReader PRST2BC | E-READER | 8900 | 12900 | SONY |
Instances will reside in a table product:
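As a rough illustration of the model described above, here is a minimal plain-Java sketch. It deliberately does not use Surus ORM annotations; the class name `Product`, the `toColumns()` helper and the price sanity check are assumptions made for this example only, and the field names simply mirror the columns of the sample row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java stand-in for the "product" model (not the Surus ORM class).
final class Product {
    final String id;            // row key, e.g. "Sony eReader PRST2BC"
    final String category;      // a product belongs to exactly one category
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    Product(String id, String category, int priceLowest, int priceHighest, String manufacturer) {
        // Assumption for this sketch: the lowest price may not exceed the highest one.
        if (priceLowest > priceHighest) {
            throw new IllegalArgumentException("priceLowest must not exceed priceHighest");
        }
        this.id = id;
        this.category = category;
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }

    // Column name -> value: roughly what one row of the product table could hold.
    Map<String, String> toColumns() {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("category", category);
        columns.put("priceLowest", Integer.toString(priceLowest));
        columns.put("priceHighest", Integer.toString(priceHighest));
        columns.put("manufacturer", manufacturer);
        return columns;
    }
}
```

Calling `new Product("Sony eReader PRST2BC", "E-READER", 8900, 12900, "SONY").toColumns()` yields the four columns of the sample row, keyed by column name.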
To satisfy our search requests, we would like to get the following structure:
ID | products | | |
 | Sony eReader PRST2BC | Kobo ... | ... |
E-READER | { priceLowest: 8900, priceHighest: 12900, manufacturer: SONY } | { ... } | { ... } |
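In plain Java terms, the structure above can be pictured as a map from category to a map of product ID to filtration criteria. The sketch below only illustrates that shape; `Criteria` and `CategoryIndex` are hypothetical names and are not part of Surus ORM or HBase.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical value object: the filtration criteria stored for one product.
final class Criteria {
    final int priceLowest;
    final int priceHighest;
    final String manufacturer;

    Criteria(int priceLowest, int priceHighest, String manufacturer) {
        this.priceLowest = priceLowest;
        this.priceHighest = priceHighest;
        this.manufacturer = manufacturer;
    }
}

// Hypothetical in-memory picture of the index: row key = category,
// column name = product ID, cell value = criteria.
final class CategoryIndex {
    private final Map<String, Map<String, Criteria>> rows = new TreeMap<>();

    void put(String category, String productId, Criteria criteria) {
        rows.computeIfAbsent(category, k -> new TreeMap<>()).put(productId, criteria);
    }

    // A single keyed read returns every product of the category.
    Map<String, Criteria> products(String category) {
        Map<String, Criteria> found = rows.get(category);
        return found == null ? new TreeMap<String, Criteria>() : found;
    }
}
```

The point of the layout is that a search by category becomes one keyed lookup, and the criteria needed for further filtering (prices, manufacturer) arrive in the same row.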
To create an index as described above, we need a new model to hold the filtration criteria and a mapreduce job to update it periodically.
Secondary index model:
and its corresponding grouping table:
The mapreduce job implies that the Job Runner will use the product table as its source and the grouping table as its sink. The job's mapper:
and a reducer:
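The logic of the two phases can be sketched in plain Java without Hadoop, HBase or Synergy on the classpath. This is an illustration under stated assumptions, not the job's actual code: `GroupingJobSketch`, its nested `Product` class and the string format of the cell value are hypothetical, and the framework's shuffle step is folded into `map` for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, framework-free sketch of the product -> grouping job.
final class GroupingJobSketch {

    // Minimal stand-in for one row of the source (product) table.
    static final class Product {
        final String id, category, manufacturer;
        final int priceLowest, priceHighest;

        Product(String id, String category, int priceLowest, int priceHighest, String manufacturer) {
            this.id = id;
            this.category = category;
            this.priceLowest = priceLowest;
            this.priceHighest = priceHighest;
            this.manufacturer = manufacturer;
        }
    }

    // Map phase (with the shuffle folded in): key every source row by its category.
    static Map<String, List<Product>> map(List<Product> sourceRows) {
        Map<String, List<Product>> byCategory = new TreeMap<>();
        for (Product p : sourceRows) {
            byCategory.computeIfAbsent(p.category, k -> new ArrayList<>()).add(p);
        }
        return byCategory;
    }

    // Reduce phase: all products of one category collapse into a single sink row,
    // with one column per product ID holding that product's filtration criteria.
    static Map<String, String> reduce(List<Product> products) {
        Map<String, String> row = new TreeMap<>();
        for (Product p : products) {
            row.put(p.id, "{ priceLowest: " + p.priceLowest
                    + ", priceHighest: " + p.priceHighest
                    + ", manufacturer: " + p.manufacturer + " }");
        }
        return row;
    }

    // Runs both phases; the result is keyed by category, like the grouping table.
    static Map<String, Map<String, String>> run(List<Product> sourceRows) {
        Map<String, Map<String, String>> sink = new TreeMap<>();
        for (Map.Entry<String, List<Product>> e : map(sourceRows).entrySet()) {
            sink.put(e.getKey(), reduce(e.getValue()));
        }
        return sink;
    }
}
```

In a real job the mapper would emit one (category, product) pair per scanned row and the reducer would write one row per category to the sink table; the sketch keeps that division of labour while replacing the tables with in-memory collections.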
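As an alternative to a secondary index, you can use filtering, for instance SingleColumnValueFilter:
However, the SingleColumnValueFilter approach is insufficient for large tables and frequent searches: the filter is evaluated on the server against every row the scan covers, so without a narrowed row range each search still reads the whole table. Stretching it too far will cause performance degradation across the cluster.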
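The cost difference can be made concrete with a small plain-Java simulation that counts how many rows each approach has to read. It does not call HBase; `ScanVersusIndex`, `Row` and both methods are hypothetical names, and the in-memory list and map merely stand in for the product table and the grouping table.

```java
import java.util.List;
import java.util.Map;

// Hypothetical simulation: rows read by a filtered scan versus an index lookup.
final class ScanVersusIndex {

    // Minimal stand-in for one row of the product table.
    static final class Row {
        final String id;
        final String category;

        Row(String id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    // Filter-style search: every row is read and tested against the criterion.
    // Matching IDs are appended to 'out'; the return value is the number of rows read.
    static int searchByFilter(List<Row> table, String category, List<String> out) {
        int rowsRead = 0;
        for (Row row : table) {
            rowsRead++;
            if (row.category.equals(category)) {
                out.add(row.id);
            }
        }
        return rowsRead;
    }

    // Index-style search: one keyed read of the category's row in the grouping table.
    static int searchByIndex(Map<String, List<String>> index, String category, List<String> out) {
        List<String> ids = index.get(category);
        if (ids != null) {
            out.addAll(ids);
        }
        return 1;
    }
}
```

Both methods return the same product IDs, but the filter variant reads as many rows as the table holds, while the index variant reads one row regardless of table size. The price of the index is the periodic mapreduce job that keeps it up to date.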
To sum up, secondary indexes are not trivial, but they are not the pinnacle of complexity either. While designing them, one should look carefully at the filtration criteria and keep the "long-term" perspective in mind.
Hopefully this tutorial will be of help to you.
Cheers!
[1] Surus ORM
https://github.com/mushkevych/surus
[2] Synergy Mapreduce Framework
https://github.com/mushkevych/synergy-framework