Sunday, December 5, 2004

Problems with JVM crashing


I suddenly seem to have all kinds of problems with the JVM crashing when I create it in our monitor code. The way things work is that I have an executable that links libjava.so instead of using the shipped java executable. I call this the "driver." Here's what I've found:
The driver crashes often (though less than half the time), and only when -Xdebug is given, with the following stack trace:
gdb build/debug.linux.x86.rhel3/bin/scdriver_debug core.28224
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Core was generated by `/home/jared.oberhaus/jared.oberhaus-linux3-all/shared/1.2/build/debug.linux.x86'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjava.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjava.so
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libverify.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libverify.so
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/native_threads/libhpi.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/native_threads/libhpi.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_ldap.so.2...done.
Loaded symbols for /lib/libnss_ldap.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /usr/lib/sasl/libanonymous.so...done.
Loaded symbols for /usr/lib/sasl/libanonymous.so
Reading symbols from /usr/lib/sasl/libcrammd5.so...done.
Loaded symbols for /usr/lib/sasl/libcrammd5.so
Reading symbols from /usr/lib/sasl/libdigestmd5.so...done.
Loaded symbols for /usr/lib/sasl/libdigestmd5.so
Reading symbols from /usr/kerberos/lib/libdes425.so.3...done.
Loaded symbols for /usr/kerberos/lib/libdes425.so.3
Reading symbols from /usr/kerberos/lib/libkrb5.so.3...done.
Loaded symbols for /usr/kerberos/lib/libkrb5.so.3
Reading symbols from /usr/kerberos/lib/libcom_err.so.3...done.
Loaded symbols for /usr/kerberos/lib/libcom_err.so.3
Reading symbols from /usr/kerberos/lib/libk5crypto.so.3...done.
Loaded symbols for /usr/kerberos/lib/libk5crypto.so.3
Reading symbols from /usr/lib/sasl/libgssapiv2.so...done.
Loaded symbols for /usr/lib/sasl/libgssapiv2.so
Reading symbols from /usr/kerberos/lib/libgssapi_krb5.so.2...done.
Loaded symbols for /usr/kerberos/lib/libgssapi_krb5.so.2
Reading symbols from /usr/lib/sasl/liblogin.so...done.
Loaded symbols for /usr/lib/sasl/liblogin.so
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libpam.so.0...done.
Loaded symbols for /lib/libpam.so.0
Reading symbols from /usr/lib/sasl/libplain.so...done.
Loaded symbols for /usr/lib/sasl/libplain.so
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libzip.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libzip.so
Reading symbols from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjdwp.so...done.
Loaded symbols for /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjdwp.so
#0  0x0066e6c1 in pthread_mutex_init () from /lib/tls/libpthread.so.0
(gdb) where
#0  0x0066e6c1 in pthread_mutex_init () from /lib/tls/libpthread.so.0
#1  0x01070e3c in ObjectMonitor::ObjectMonitor ()
   from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#2  0x01000517 in CreateRawMonitor ()
   from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#3  0x0039a872 in JVM_OnLoad ()
   from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/libjdwp.so
#4  0x00ff8a2e in JvmdiInternal::post_event ()
   from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#5  0x01002a0e in jvmdi::post_vm_initialized_event ()
   from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#6  0x010f109c in Threads::create_vm ()
   from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#7  0x00fb4388 in JNI_CreateJavaVM ()
   from /home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre/lib/i386/server/libjvm.so
#8  0x08048e6b in exec_java (java_library_path=0x0, 
    jre_home=0xbfffcdda "/home/jared.oberhaus/jared.oberhaus-linux3-all/tools/linux/j2sdk1.4.2_06/jre", 
    java_class=0xbfffce27 "com/scalent/shared/tools/test/MonitorTest3", 
    classpath=0x0) at driver.c:300
#9  0x0804895d in main (argc=5, argv=0xbfffb054) at driver.c:81
  • I thought it was something I did, because in the stack trace I can see that classpath and java_library_path, parameters to exec_java, are null, and they sometimes contain other bad values. Examining this with the debugger, I've determined that this is just the optimizer: the compiler passes in the right values when they're needed, but otherwise the debugger shows whatever is in the register $esi, which can vary.
  • I tried to use Purify on this, but there is something seriously broken with Purify on the machine that I'm running on right now. It seems to work better as root, but when I try it as my own user, I get an MSE on almost every malloc and pthread operation, whether my code does it or not. Another red/green/blue herring.
  • I tried Valgrind on it to try to find something, but that didn't seem to discover anything either. Of course, Valgrind can't really execute the whole JVM, but that's not what I was looking for; I was just trying to get it to execute my non-JVM code and find some sort of memory corruption.
  • I also tried j2sdk1.4.2_06, which is newer than our j2sdk1.4.2_03. That didn't help at all. It still crashes at least one time in three.
  • Finally, I went into our code and turned off all the options. After 34 runs of com.scalent.shared.tools.test.TestMonitor it did not fail once. I believe the whole thing has something to do with the -Xdebug and related options, as I've never seen a crash in the non-debug version of the driver.
  • I think I've now shown that it has something devious to do with -Xdebug and friends. I commented out just the -Xdebug and -Xrunjdwp:transport=dt_socket,address=9300,server=y,suspend=n options and ran the test com.scalent.shared.tools.test.TestMonitor 160 times, and it didn't fail once.
  • I tried putting a 5 second delay between tests in com.scalent.shared.tools.test.TestMonitor, but that didn't help. It still failed on the third test.
  • I tried again with strict=y on the -Xrunjdwp:transport line, but that didn't help.
    I also tried using the dt_shmem transport for -Xrunjdwp, but that didn't help either.
  • I have resigned myself to the fact that this is a bug in the JVM, at least with the way that I'm calling it. Fortunately it only happens while we have -Xdebug turned on.
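
Since the crash is intermittent, most of the conclusions above rest on running the same test many times and counting failures. The sketch below shows one way that could be automated. It is not the procedure actually used here, and the names run_repeatedly and run_stats are made up for illustration. It starts a command a given number of times with fork and execvp, and uses waitpid to tell a run that was killed by a signal (such as the segmentation fault above) apart from one that merely exited with a nonzero status.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcome counts for a batch of runs (hypothetical helper type). */
struct run_stats {
    int runs;      /* children that were started and reaped */
    int signaled;  /* runs terminated by a signal, e.g. SIGSEGV */
    int nonzero;   /* runs that exited normally with a nonzero status */
};

/* Run the command in argv (NULL-terminated, argv[0] is the program)
 * `count` times, one after another, and tally how each run ended.
 * Returns 0 on success, -1 if fork() or waitpid() fails. */
static int run_repeatedly(char *const argv[], int count, struct run_stats *stats)
{
    memset(stats, 0, sizeof *stats);

    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;

        if (pid == 0) {
            /* Child: replace this process with the command under test. */
            execvp(argv[0], argv);
            _exit(127);  /* only reached if exec itself failed */
        }

        /* Parent: wait for this run to finish before starting the next. */
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;

        stats->runs++;
        if (WIFSIGNALED(status)) {
            stats->signaled++;
            fprintf(stderr, "run %d: killed by signal %d\n",
                    i + 1, WTERMSIG(status));
        } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
            stats->nonzero++;
        }
    }
    return 0;
}
```

A caller would pass the driver's own command line as argv with a count such as 160, then compare stats.signaled between a build with the -Xdebug options and one without.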