Friday, July 27, 2012

Cloudera Hadoop Certifications

 And so it happened! After 4 weeks of studying and 2 exams I am officially Cloudera Certified Developer for Apache Hadoop and Cloudera Certified Specialist in Apache HBase

Policies that each candidate signs before the test begins prohibit disclosure of questions and answers, so don't look for them here :)
However, I will gladly share my training materials, as well as focus on most problematic topics.

I have been studying from books [1] and [2], and read them from cover to cover despite considering many chapters as "well known". I strongly advice everybody seeking the certification to go thru each chapter. It will save you training expenses $1800 + $3000 + $Travel [3] and as any reading - provide solid knowledge base. You can also try to reduce certification costs [4] by asking Cloudera to grand you discount coupons.

Topics that I found intriguing:
- Write path and coherency model for HDFS
While reading the chapters, try to ask yourself a question: what will happen to files, shall the client die in the middle of the copy process; how HBase handles real-time duplication of WAL;
- InputFormat
Good news is that there are not many of them, so it should not take long to put sample format for each of them. As stated in [5], you might be asked to identify proper InputFormat given the sample.
- HBase Region Lookups mechanism
Be sure to understand how client finds -ROOT-, .META. and data. When is it querying ZooKeeper and what are fail-over algorithms. 
- HFile format and performance dependency on block size

In summary: exams are not too scary, but give yourself 2 weeks per exam to prepare properly.
Cheers!

[1] Hadoop: The Definitive Guide, 2nd Edition

[2] HBase: The Definitive Guide


[3] Cloudera training schedule

[4] Direct Cloudera certification 

[5] List of probable exam topics
http://university.cloudera.com/certification/CCDH.html