Thursday, October 21, 2004

Getting Purify to work with Java


Let's say you wrote some native code using JNI and you wanted to run that code under Purify so that you could find memory leaks and other bugs.
Short answer: you can't.
Here's the long version of what I went through to get there.
These are the software versions I'm working with:
RedHat Enterprise Linux 3.1
Linux kernel 2.4.21-9EL
PurifyPlus.2003a.06.13.FixPack.0155
Java Runtime 1.4.2_03-b02
One of the most important pieces is the .purify file I had constructed, which suppressed hundreds of thousands of warnings and let me run things in a reasonable amount of time. Apparently I forgot to save it in a safe place and it's been destroyed, but it's easy to recreate if you follow these steps.
Anyhow, where I'm stuck is that when Java attempts to bind to a socket and start listening, it just sits there. There's no activity that I can see via strace and no CPU taken up by the process, but the rtslave is still responsive. It never gets past that step.
I can see this in two different ways. If I turn on Java debugging for my process using the appropriate flags, the JVM attempts to bind to that socket as soon as it starts up, and it hangs there before executing any Java code. If I turn off the Java debugging flag, a lot of Java code is executed, up to the point where my own Java code attempts to bind to a socket and listen. Then it just sits there again.
In a previous exercise trying to debug Java and listening on a socket, I found that when Java opens a socket it apparently uses rtnetlink to turn off the multicast flag for that socket. I don't know if that has anything to do with it, but it might be interesting...
However, to get this far, here are the steps:
  • You generally have to build a purified executable on the same machine that you're executing on. If anything is different it will crash instantly.
  • The Purify rtslave process just eats tons of memory when it stores errors. If you suppress those errors, it will use much less (or no) memory for them. Reporting those errors also takes a huge amount of time, so without suppressions the Purify process runs for a very long time and gets nowhere.
  • The JVM has lots and lots of things that look like MSEs and UMRs. Once you suppress those, the JVM can get somewhere under Purify.
  • You have to set DISPLAY, otherwise Purify will dump everything to stdout, which usually isn't very helpful.
  • I modified our startup environment to pass the environment variables DISPLAY, PUREOPTIONS, and PURIFYOPTIONS so that they can affect the operation of Purify.
  • I'm running the JVM with -Xint so that the HotSpot compiler is not invoked, which probably would introduce lots and lots of interesting challenges to get things to work. Update: I got stuck and tried my luck with the HotSpot compiler, and now I'm getting farther. So you should not use -Xint.
  • I found out that IBM has a newer version of Purify that seems to work much better than the previous version against the JVM. It's PurifyPlus.2003a.06.13.FixPack.0155.
  • There is an undocumented parameter when building with Purify called -handle-calls-to-java. I added this to my PUREOPTIONS environment variable.
  • Because of -handle-calls-to-java, Purify goes into its cache and sets up symbolic links to "help" the JVM find stuff. For instance, I have -cache-dir set to /var/purify/cache. In /var/purify/cache/opt/scalent/jre/lib/ there are lots of symbolic links back to /opt/scalent/jre/lib/. That is where our JRE is stored in the file system.
  • The JVM still needs more symbolic links (at least the ones I know about so far) to find stuff. If you run the JVM at this point, it fails with the message: "Error occurred during initialization of VM java.lang.UnsatisfiedLinkError: no zip on java.library.path". When Java looks for a library called "zip" on Linux, it looks for libzip.so on its java.library.path, but since the name has been Purify-mangled, it can't find it. Therefore, do the following:
cd /var/purify/cache/opt/scalent/jre/lib/i386/
ln -s /opt/scalent/jre/lib/i386/libawt.so
ln -s /opt/scalent/jre/lib/i386/libcmm.so
ln -s /opt/scalent/jre/lib/i386/libdcpr.so
ln -s /opt/scalent/jre/lib/i386/libdt_socket.so
ln -s /opt/scalent/jre/lib/i386/libfontmanager.so
ln -s /opt/scalent/jre/lib/i386/libhprof.so
ln -s /opt/scalent/jre/lib/i386/libioser12.so
ln -s /opt/scalent/jre/lib/i386/libjaas_unix.so
ln -s /opt/scalent/jre/lib/i386/libjavaplugin_jni.so
ln -s /opt/scalent/jre/lib/i386/libjawt.so
ln -s /opt/scalent/jre/lib/i386/libjcov.so
ln -s /opt/scalent/jre/lib/i386/libJdbc0dc.so
ln -s /opt/scalent/jre/lib/i386/libjdwp.so
ln -s /opt/scalent/jre/lib/i386/libjpeg.so
ln -s /opt/scalent/jre/lib/i386/libsig.so
ln -s /opt/scalent/jre/lib/i386/libjsoundalso.so
ln -s /opt/scalent/jre/lib/i386/libjsound.so
ln -s /opt/scalent/jre/lib/i386/libmlib_image.so
ln -s /opt/scalent/jre/lib/i386/libnative_chmod.so
ln -s /opt/scalent/jre/lib/i386/libnet.so
ln -s /opt/scalent/jre/lib/i386/libnio.so
ln -s /opt/scalent/jre/lib/i386/librmi.so
ln -s /opt/scalent/jre/lib/i386/libverify.so
ln -s /opt/scalent/jre/lib/i386/libzip.so
  • I found another directory that needs to be linked. I got the error "ZoneInfo: /var/purify/cache/opt/scalent/jre/lib/zi/ZoneInfoMappings (No such file or directory)". I also found lots of other directories in a similar state:
cd /var/purify/cache/opt/scalent/jre/lib
ln -s /opt/scalent/jre/lib/zi
ln -s /opt/scalent/jre/lib/locale
ln -s /opt/scalent/jre/lib/images
ln -s /opt/scalent/jre/lib/im
ln -s /opt/scalent/jre/lib/fonts
ln -s /opt/scalent/jre/lib/ext
ln -s /opt/scalent/jre/lib/cmm
ln -s /opt/scalent/jre/lib/audio
  • When the Java code starts up, it forks off processes that are written in C. The result is that Purify follows the fork with another Purify rtslave that immediately does an exec. Purify takes this as a process exit, and so immediately starts looking for leaks in that process. We don't care about leaks at this point; we'll find the leaks in the original JVM process when we want by clicking on the leak button. So until I fix process forking, I'm adding the options -inuse-at-exit=no -leaks-at-exit=no to my PURIFYOPTIONS environment variable.
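The symlink and option steps above could be collected into one script. What follows is only a rough sketch, not the commands actually used for this post. The link_missing helper and the JRE_LIB and PURIFY_CACHE variables are made-up names, the paths are the ones quoted above, and the helper links every entry that has no counterpart in the cache, which is broader than the hand-picked lists. The option strings are the ones named in the bullets.

```shell
#!/bin/sh
# Sketch only. JRE_LIB and PURIFY_CACHE are hypothetical variable names;
# their defaults are the paths quoted in this post.
JRE_LIB=${JRE_LIB:-/opt/scalent/jre/lib}
PURIFY_CACHE=${PURIFY_CACHE:-/var/purify/cache}

# link_missing SRC DEST (hypothetical helper):
# for every entry in SRC that has no counterpart in DEST, create a
# symbolic link in DEST pointing back at the original. Entries that
# already exist in DEST, including links Purify made itself, are kept.
link_missing() {
    src=$1
    dest=$2
    # Do nothing if either directory is absent.
    [ -d "$src" ] && [ -d "$dest" ] || return 0
    for entry in "$src"/*; do
        # An empty SRC leaves the glob unexpanded; skip that case.
        [ -e "$entry" ] || continue
        name=${entry##*/}
        # -e is false for dangling links, so check -L as well.
        if [ ! -e "$dest/$name" ] && [ ! -L "$dest/$name" ]; then
            ln -s "$entry" "$dest/$name"
        fi
    done
}

# Shared libraries first (libzip.so and friends), then the data
# directories (zi, locale, fonts, ...) one level up.
link_missing "$JRE_LIB/i386" "$PURIFY_CACHE$JRE_LIB/i386"
link_missing "$JRE_LIB" "$PURIFY_CACHE$JRE_LIB"

# Append the options named in the post instead of overwriting
# whatever is already set in the environment.
PUREOPTIONS="${PUREOPTIONS:+$PUREOPTIONS }-handle-calls-to-java"
PURIFYOPTIONS="${PURIFYOPTIONS:+$PURIFYOPTIONS }-inuse-at-exit=no -leaks-at-exit=no"
export PUREOPTIONS PURIFYOPTIONS

# Without DISPLAY, Purify dumps everything to stdout.
[ -n "${DISPLAY:-}" ] || echo "warning: DISPLAY is not set" >&2
```

The target directories have to exist already, so this would be run after Purify has populated its cache at least once. Running it a second time changes nothing, because existing entries are skipped.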
In case you're wondering, Valgrind won't work either.
