Hadoop and Linux kernel 2.6.27 - epoll limits
Yesterday we faced a strange problem. A newly set up Hadoop cluster became unstable after a few minutes. The logs reported a lot of exceptions like:
java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:68)
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
at sun.nio.ch.Util.getTemporarySelector(Util.java:123)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:281)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:102)
at java.lang.Thread.run(Thread.java:619)
or
DataXceiver
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:78)
at java.lang.Thread.run(Thread.java:619)
and others. We double-checked ulimit -n, which reported 32768 on all datanodes as expected, and lsof -u hadoop | wc -l came to as little as 2000, so the “Too many open files” exceptions seemed strange.
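A day and several installation runs later we figured out that the cluster was running out of epoll resources, not file descriptors. The Sun JDK 1.6 uses epoll to implement non-blocking I/O (NIO) on Linux, and with kernel 2.6.27 a per-user limit on epoll instances was introduced. Once that limit is reached, epoll_create fails with EMFILE, which Java reports as “Too many open files”. The default on openSuSE is 128, which is way too low.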
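Since ulimit and lsof both looked healthy, it can help to count epoll instances directly. The following is a minimal sketch, not what we ran at the time: it walks a /proc-style directory and counts file descriptors whose link target contains "eventpoll". The function name count_epoll_fds and the match on the string "eventpoll" are assumptions made for this example, so check what readlink prints for an epoll descriptor on your kernel first.

```shell
#!/bin/sh
# Hypothetical helper: count file descriptors that look like epoll instances.
# Argument 1 is the proc root to scan (defaults to /proc).
count_epoll_fds() {
    proc_root="${1:-/proc}"
    count=0
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # readlink fails for unreadable or vanished entries; skip those
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            # Assumption: epoll descriptors link to something containing "eventpoll"
            *eventpoll*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}
```

Run as the hadoop user, count_epoll_fds only sees that user's processes, because the fd directories of other users are not readable. Treat the result as a rough indicator to compare against /proc/sys/fs/epoll/max_user_instances.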
Raising the limit with echo 1024 > /proc/sys/fs/epoll/max_user_instances fixed the cluster immediately. To make the setting survive a reboot, add the following line to /etc/sysctl.conf:
fs.epoll.max_user_instances = 1024
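To check that the new value is actually in effect, for example after a reboot, a small helper along these lines could be used. It is only a sketch: the function name epoll_instance_limit is made up for this example, and it prints "unavailable" on kernels that do not provide the file.

```shell
#!/bin/sh
# Hypothetical helper: print the current per-user epoll instance limit.
# Argument 1 is the file to read (defaults to the path used in this post).
epoll_instance_limit() {
    limit_file="${1:-/proc/sys/fs/epoll/max_user_instances}"
    if [ -r "$limit_file" ]; then
        cat "$limit_file"
    else
        # The kernel does not expose this setting (or it is not readable)
        echo "unavailable"
    fi
}
```

Calling epoll_instance_limit without arguments reads /proc/sys/fs/epoll/max_user_instances. Running sysctl -p as root applies /etc/sysctl.conf without a reboot.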
February 6th, 2009 at 2:48 am
Thanks for the information. I just ran across the exact same problem and your post saved me hours of useless debugging.
March 16th, 2009 at 9:54 pm
Thank you so much for posting this. I spent a good while searching around to try to fix my “too many open files” error, despite having a plenty high ulimit. My problem was unrelated to hadoop, but this page helped immensely.
March 30th, 2009 at 8:08 pm
[…] http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/ for more […]
April 8th, 2009 at 3:35 am
/proc/sys/fs/epoll/max_user_instances is not in 2.6.27.21-170.2.56.fc10.x86_64 kernel.
Any suggestions on how to handle that?
Thanks!!
uname -a
Linux node1 2.6.27.21-170.2.56.fc10.x86_64 #1 SMP Mon Mar 23 23:08:10 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
ls -al /proc/sys/fs/epoll
total 0
-rw-r--r-- 1 root root 0 2009-04-07 02:01 max_user_watches