Under high load, an Alluxio Worker can fail with "OutOfMemoryError". This article discusses possible root causes, followed by recommended steps to resolve the error.
Root Cause Analysis
In some situations, the Alluxio Worker is "lost" from the cluster, and the Worker logs contain entries similar to the following:
2021-02-09 09:22:20,260 WARN EatWhatYouKill -
java.lang.OutOfMemoryError: Java heap space
2021-02-09 09:21:22,282 ERROR AbstractReadHandler - Failed to run DataReader.
java.lang.OutOfMemoryError: Java heap space
2021-02-09 09:21:15,784 WARN AbstractChannelHandlerContext - An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
This error is raised by the Java virtual machine (JVM), not by Alluxio itself, and indicates that the JVM ran out of heap memory.
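To confirm that a lost Worker actually hit this error, the log can be scanned for it. The following is a minimal sketch, not an Alluxio tool: it assumes the timestamp and level layout shown in the excerpt above, and the function name `find_oom_events` is hypothetical.

```python
import re

# Assumed entry layout, taken from the excerpt: "YYYY-MM-DD HH:MM:SS,mmm LEVEL message"
TIMESTAMPED = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(.*)$")
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_events(lines):
    """Return (timestamp, level, message) for each log entry tied to an OutOfMemoryError.

    The marker may appear on the timestamped line itself or on the
    continuation line that follows it, as in the excerpt.
    """
    events = []
    pending = None  # most recent timestamped entry that has not been reported yet
    for raw in lines:
        line = raw.rstrip("\n")
        match = TIMESTAMPED.match(line)
        if match:
            pending = match.groups()
            if OOM_MARKER in line:
                events.append(pending)
                pending = None
        elif OOM_MARKER in line and pending is not None:
            events.append(pending)
            pending = None
    return events
```

A typical call would be `find_oom_events(open(path))`, where `path` points at the Worker log on the affected host; the exact location depends on the installation.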
A few things need to be considered to fully understand the possible root causes:
- Alluxio cache using RAMDISK: by default, and in many production scenarios, a memory-mounted file system (RAMDISK) is used for fast read/write caching.
- Worker JVM heap size: the minimum and maximum heap sizes and the direct (off-heap) memory limit of the Worker JVM must be reviewed.
- Non-Alluxio memory consumption: this includes operating system memory consumption, and any other applications running on the Alluxio Worker host that may be consuming memory.
RAMDISK Sizing Evaluation
The total amount of memory allocated for caching purposes is managed by Alluxio, and is configured in Alluxio's alluxio-site.properties file. Example from the default configuration:
# Worker properties
# alluxio.worker.ramdisk.size=1GB
# alluxio.worker.tieredstore.levels=1
# alluxio.worker.tieredstore.level0.alias=MEM
# alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
The property "alluxio.worker.ramdisk.size" controls the size of the RAMDISK, which is typically mounted on /mnt/ramdisk on the file system.
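When reviewing a host, it can help to read the configured RAMDISK size programmatically. The sketch below is illustrative only: `read_properties` and `parse_size` are hypothetical helpers, and treating the units as binary (1GB = 1024^3 bytes) is an assumption. Note that commented-out lines, like those in the default example, are skipped because they are not active settings.

```python
import re

# Assumption: size suffixes are binary multiples.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a size string such as '1GB' or '512MB' to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2).upper()])


def read_properties(lines):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    props = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ramdisk_bytes(lines):
    """Return the configured RAMDISK size in bytes, or None if the property is not set."""
    value = read_properties(lines).get("alluxio.worker.ramdisk.size")
    return None if value is None else parse_size(value)
```

Passing the lines of alluxio-site.properties to `ramdisk_bytes` returns `None` when the property is still commented out, which signals that the Worker is running with whatever default applies to that Alluxio version.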
Worker JVM Heap Size
The Worker JVM heap size (and other tuning) is configured in Alluxio's alluxio-env.sh file. Example of Worker's JVM sizing:
ALLUXIO_WORKER_JAVA_OPTS="-Xmx12g -Xms12g -XX:MaxDirectMemorySize=10g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCMSInitiatingOccupancyOnly -Xloggc:/opt/alluxio/logs/worker_gc.log"
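Based on the tuning above, the JVM heap is fixed at 12GB (-Xms and -Xmx are both 12g), and the Worker may use up to a further 10GB of direct (off-heap) memory, so the process can consume well over 12GB in total.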
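The heap and direct memory limits can be pulled out of the options string and added up. This is a rough sketch under stated assumptions: `jvm_size` and `worker_jvm_ceiling` are hypothetical names, only explicitly set flags are counted, and other JVM overhead (metaspace, thread stacks, GC structures) is ignored.

```python
import re

# JVM size suffixes: k, m, g, t (case-insensitive); no suffix means bytes.
_SUFFIX = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def jvm_size(opts, flag):
    """Return the byte value of a flag such as '-Xmx' or '-XX:MaxDirectMemorySize=', or None."""
    match = re.search(re.escape(flag) + r"(\d+)([kKmMgGtT]?)(?=\s|$)", opts)
    if match is None:
        return None
    return int(match.group(1)) * _SUFFIX[match.group(2).lower()]


def worker_jvm_ceiling(opts):
    """Sum the maximum heap and the direct memory limit found in the options string.

    Flags that are not present contribute 0 here, which understates the real
    limit because the JVM applies its own defaults.
    """
    heap = jvm_size(opts, "-Xmx") or 0
    direct = jvm_size(opts, "-XX:MaxDirectMemorySize=") or 0
    return heap + direct


opts = "-Xmx12g -Xms12g -XX:MaxDirectMemorySize=10g -XX:+UseConcMarkSweepGC"
print(worker_jvm_ceiling(opts) // 1024**3)  # heap plus direct memory, in GiB
```

For the example options, the sum is 22 GiB, which is the figure to budget for rather than the heap size alone.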
Non-Alluxio Memory Consumption
It is important to evaluate the other applications and processes on the same host and how much memory they consume. Competition for CPU, disk, and above all memory should factor into sizing Alluxio appropriately.
Resolution
One or more of the following steps should be taken to relieve the Worker of OutOfMemoryError conditions:
- Reduce the size of the RAMDISK: lower alluxio.worker.ramdisk.size from its current value.
- Reduce Worker Heap: retune JVM parameters to reduce maximum heap size.
- Reduce or relocate non-Alluxio applications: if possible, reduce the memory consumption of other applications on the host; if not possible, consider relocating those services to their own, dedicated hosts.
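The three adjustments above all draw on the same pool of host memory, so it can be useful to check them together. The following back-of-the-envelope sketch is not an official sizing formula: the function `memory_headroom`, the 64 GiB host, and the 8 GiB figure for the operating system and other applications are hypothetical values chosen for illustration. The RAMDISK, heap, and direct memory figures come from the examples earlier in this article.

```python
GIB = 1024**3


def memory_headroom(host_total, ramdisk, jvm_heap, jvm_direct, other):
    """Return the bytes left on the host after all known consumers are subtracted.

    A negative result means the host is overcommitted on paper.
    """
    return host_total - (ramdisk + jvm_heap + jvm_direct + other)


headroom = memory_headroom(
    host_total=64 * GIB,  # hypothetical host size
    ramdisk=1 * GIB,      # alluxio.worker.ramdisk.size from the default example
    jvm_heap=12 * GIB,    # -Xmx12g
    jvm_direct=10 * GIB,  # -XX:MaxDirectMemorySize=10g
    other=8 * GIB,        # hypothetical OS and non-Alluxio usage
)
print(f"headroom: {headroom / GIB:.1f} GiB")
```

If the result is small or negative, one of the three consumers needs to shrink or move before the Worker can run reliably.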