Comments on mushkevych: Recovering from OutOfMemoryError in Hadoop mapreduce

Bohdan Mushkevych (2012-04-19):
Good questions. Let me provide answers:
- with user-driven content spread across hundreds of thousands of accounts, you can expect a few records that simply fall outside your predictions and expectations;
- it is not always cost-effective to pay for additional gigabytes of RAM when they are needed only to process some 0.01% of (usually fake) data;
- in this particular case the mappers were given 512 MB of heap and the reducers 768 MB; most of the processing fit comfortably into 20-30 MB;
- in practice the try{} block should be built with an understanding of JVM memory-allocation mechanics.
From [A] we learn that the JVM computes the required size in advance, before actually allocating an object.
Since Hadoop Table mappers and reducers are single-threaded, we can expect the allocation that fails to be the one inside the try{} block.
Keeping all references inside the try{} block means they become garbage as soon as control reaches the catch clause; and because an oversized allocation is rejected up front rather than filling the heap byte by byte, the catch clause runs in a JVM that still has working room. Together these protect us from the truly bad OutOfMemory situations, where the heap is completely full.

[A] OOME discussion on Stack Overflow: http://stackoverflow.com/questions/9261705/what-happens-when-theres-insufficient-memory-to-throw-an-outofmemoryerror
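A minimal sketch of this arrangement in a Hadoop mapper, assuming the newer org.apache.hadoop.mapreduce API; the estimateSize() and parse() helpers and the counter names are hypothetical placeholders, not code from the original post:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of the recovery pattern discussed above: all large references
    // live inside the try{} block, and a record whose processing blows the
    // heap is counted and skipped instead of killing the task.
    public class RecoveringMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                // Working memory is requested as one large chunk; the JVM
                // computes the required size up front, so an impossible request
                // fails here with OutOfMemoryError while the heap still has room.
                byte[] workBuffer = new byte[estimateSize(value)]; // hypothetical sizing helper
                Text parsed = parse(value, workBuffer);            // hypothetical processing
                context.write(new Text(key.toString()), parsed);
            } catch (OutOfMemoryError oome) {
                // Everything allocated in the try block is unreachable now, so
                // the garbage collector can reclaim it; count the bad record
                // and move on to the next one.
                context.getCounter("oome", "skipped_records").increment(1);
            }
        }

        // Hypothetical stand-ins for the real record processing.
        private int estimateSize(Text value) {
            return value.getLength() * 4;
        }

        private Text parse(Text value, byte[] workBuffer) {
            return new Text(value.toString().trim());
        }
    }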
RP (2012-03-30):
How can you be sure that, by the time you get to the catch block, the JVM is in a state where anything can be done, e.g. running GC manually?

You're making a good point about requirements being driven by user data, but it still seems like you're fighting your system's resource threshold. Try to figure out the peak load and simply double the resources to handle it. Not always possible, but with clouds and dynamic quotas it can be configured for less.

What amount of data, on average, are we talking about in your case?

Unknown (2012-03-28):
1. It is easier to avoid an OoME than to recover from one.
2. Sometimes it is cheaper to buy an additional 16 GB than to develop and test recovery code.