Bruning Questions: Debugging Node Apps with DTrace

Last week, I wrote about debugging applications using DTrace (see Debugging with DTrace). That blog post concentrates on debugging applications written in C. I'm currently preparing to teach a session on using DTrace with Node.js for NodeConf 2013, and thought I might write a bit about debugging Node.js applications using DTrace.

For this post, we'll look at a simple Node app that makes a copy of a file in a separate directory. The program removes the destination directory and file(s), and as it makes the copy, it computes a checksum. And, since this is a post about debugging, it has a bug. TJ Fontaine of Joyent provided some help. Thanks, TJ!

Here is the output:

$ node file.jstmp/words.save md5sum = dee3071300ce352e12a19c4912285f35$ md5sum wordsdee3071300ce352e12a19c4912285f35  words$ ls -l words tmp/words.save-rw-r--r-- 1 student other 22613134 Jun  9 06:34 tmp/words.save-rw-r--r-- 1 student other 22940814 Jun  9 04:37 words$

So, the app writes out the location of the copied file, and an md5 checksum. The checksums match, but the copy is shorter than the original file.

This problem originally occurred in a much larger application. The file.js can be found at github.com/max123. For those reasonably adept at node programming, the bug may be obvious. We'll use DTrace to try to determine what is happening. To do this, first we'll take a look at system calls that are made by the node engine.

$ sudo /usr/sbin/dtrace -x temporal=1 -qn 'syscall:::entry/pid == $target/{printf("%s\n", probefunc);}' -c "node file.js"tmp/words.save md5sum = dee3071300ce352e12a19c4912285f35mmapsetcontextgetrlimitgetpidsetcontextsysconfigsysconfigsigpendingsysi86open64...$

This yields a lot of output.

$ sudo /usr/sbin/dtrace -x temporal=1 -qn 'syscall:::entry/pid == $target/{printf("%s\n", probefunc);}' -c "node file.js" | wc   6601    6603   45059$

So, 6600 system calls (the first line is from the app itself). A common problem people have with DTrace is that, like truss(1), you can get a lot of output. So a common problem is that you have possibly millions of probes you can enable. Which ones will give you what you need to know? If you think a little about the application, you'll find that there are only some system calls that are "interesting" in terms of the bug. The source and destination files are being read/written, and the directory containing the destination is being removed (see the source, or DTrace it). To DTrace only read(2), write(2), and unlink(2) calls, we'll write a short D script. How did I know that unlink(2) is used to remove files? I could use DTrace or truss(1) on the rm(1) command, or I could look at the source for rm(1).

Here's the script:

#!/usr/sbin/dtrace -qs#pragma D option temporalsyscall::read:entry/pid == $target && fds[arg0].fi_pathname == "/home/student/nodeconf/words"/{    printf("read request %d bytes\n", arg2);}syscall::write:entry/pid == $target && fds[arg0].fi_pathname == "/home/student/nodeconf/tmp/words.save"/{    printf("write request %d bytes\n", arg2);}syscall::unlink*:entry/pid == $target/{    printf("unlinking %s\n", copyinstr(arg0));}

The D script only looks at reads/writes to the source and destination files. Let's run it.

$ sudo ./rwu.d -c "node file.js"Password: tmp/words.save md5sum = dee3071300ce352e12a19c4912285f35unlinking tmpread request 65536 bytesread request 65536 bytesread request 65536 bytesunlinking tmp/words.saveread request 65536 bytesread request 65536 bytesread request 65536 bytesread request 65536 byteswrite request 65536 bytesread request 65536 byteswrite request 65536 bytes...

So, the /tmp directory and the tmp/words.save file are removed after some reads are made from the source file. Let's add some DTrace probes to the application. At the beginning of the application, we'll define a new DTrace provider, and we'll add some probes to it.

var d = require('dtrace-provider');var dtp = d.createDTraceProvider('fileapp');dtp.addProbe('rimraf', 'char *');dtp.addProbe('srcstream', 'char *');dtp.addProbe('chksum', 'char *', 'int');

And, we'll cause these probes to fire at specific instrumented places in the code. These are basically USDT (user statically defined traces) probes.

function storeFileFromFile(srcstream, dstfile, callback) {    /* remove the destination path and recreate it */        dtp.fire('rimraf', function(p) {        return ['calling rimraf', 'now'];        });        rimraf("tmp", function (err2) {    ...        dtp.fire('srcstream', function(p) {            return [ 'srcstream.pipe(toStream);' ]        });        srcstream.pipe(toStream);        ...    var i = 0;    stream.on('data', function (chunk) {    dtp.fire('chksum', function(p) {        return ['md5sum.update(chunk)', i];    });    md5sum.update(chunk);    i++;    });...

And finally, we'll allow the probes to be enabled before calling the method to copy the files.

dtp.enable();copyFile("words", "words.save");

And here is a D script to enable the probes.

#!/usr/sbin/dtrace -sfileapp$target:::rimraf{    printf("%s\n", copyinstr(arg0));}fileapp$target:::srcstream{    printf("%s\n", copyinstr(arg0));}fileapp$target:::chksum{    printf("%s\n", copyinstr(arg0));    printf("%d\n", arg1);}

Running the script:

$ sudo ./file_probes.d -q -Z -c "node file_probes.js"Password: tmp/words.save md5sum = dee3071300ce352e12a19c4912285f35calling rimrafmd5sum.update(chunk)0md5sum.update(chunk)1md5sum.update(chunk)2md5sum.update(chunk)3srcstream.pipe(toStream);md5sum.update(chunk)4md5sum.update(chunk)5md5sum.update(chunk)6md5sum.update(chunk)7...

Again we see that the md5 checksum is getting done before the source is piped to the destination. Reading the documentation for stream, we need to use the pause() and resume() method (or possibly some other function besides pipe()) to make sure we don't use data.

Inserting stream.pause() and stream.resume() at the appropriate places in the code (see file_probes.good.js) fixes the problem.

$ sudo ./file_probes.d -q -Z -c "node file_probes.good.js"tmp/words.save md5sum = dee3071300ce352e12a19c4912285f35calling rimrafsrcstream.pipe(toStream);md5sum.update(chunk)0md5sum.update(chunk)1md5sum.update(chunk)2md5sum.update(chunk)3md5sum.update(chunk)4md5sum.update(chunk)5...

Now the srcstream.pipe(toStream); is being done before the md5sum. Also, the checksum is still correct, and the destination file is the correct size.

$ ls -l words tmp/words.save-rw-r--r-- 1 root    root  22940814 Jun  9 08:17 tmp/words.save-rw-r--r-- 1 student other 22940814 Jun  9 04:37 words$

Note that debuggers are tools to help you debug. They don't actually fix the problem. If a debugger tool actually did the debugging, a lot of us would be out of work!



Post written by Mr. Max Bruning