Using DTrace to Demystify Watchpoints in the Oracle Developer Studio dbx Debugger

By Nasser Nouri, May 2008 (updated April 2011, June 2016)

One of the most useful debugging features in the Oracle Developer Studio dbx debugger is enabling watchpoints during the execution of programs. A watchpoint, which is also called a data change breakpoint, can be used in dbx to stop a program when the value of a variable or expression has changed. A watchpoint is similar to a breakpoint, except that a watchpoint stops execution when an address location is read or modified, whereas a breakpoint stops execution when an instruction is executed at a specified location.

This article intends to educate users on how to use the watchpoint facility in the Oracle Developer Studio dbx debugger. The dbx debugger can be used for both source-level and instruction-level debugging.

Additionally, the Oracle Solaris Dynamic Tracing (DTrace) facility is used to show how the internal states of the Oracle Solaris kernel can be traced with a simple D script.

Under the Hood

Historically, watchpoints were implemented in software, and to some extent they slowed down program execution. The newer versions of microprocessors are equipped with debug registers that enable modern software debuggers such as dbx to create hardware watchpoints. The hardware watchpoints are extremely fast and do not slow down the execution of programs.

For example, Intel and AMD architectures have eight debug registers, DR0 through DR7. The DR0 through DR3 registers can be used for creating address breakpoints. Software can load a virtual (linear) address into any of the four registers, and enable breakpoints to occur when the address matches an instruction or data reference. The debug control register (DR7) is used to establish the breakpoint conditions for the address breakpoint registers (DR0 through DR3) and to enable debug exceptions for each address breakpoint register individually. DR6 is the debug status register. The microprocessor loads the debug status into DR6 when an enabled debug condition is encountered that causes a debug exception. This register is never cleared by the processor and must be cleared by software after the contents have been read. The DR4 and DR5 registers are reserved and cannot be used by software.
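As a rough illustration of how DR7 ties a condition to one of the DR0 through DR3 registers, the C sketch below computes the DR7 bits for a single local breakpoint. The function and enum names are invented for this example, and the bit positions are taken from the x86 architecture manuals, not from dbx or Oracle Solaris. User code cannot load the debug registers directly, so the function only computes a value.

```c
#include <assert.h>
#include <stdint.h>

/* Access condition encoded in the R/Wn field of DR7. */
enum dr_access {
    DR_ACCESS_EXEC  = 0x0, /* break on instruction execution */
    DR_ACCESS_WRITE = 0x1, /* break on data writes */
    DR_ACCESS_RW    = 0x3  /* break on data reads or writes */
};

/* Region length encoded in the LENn field of DR7. */
enum dr_len {
    DR_LEN_1 = 0x0,
    DR_LEN_2 = 0x1,
    DR_LEN_8 = 0x2,
    DR_LEN_4 = 0x3
};

/*
 * Return the DR7 bits that enable address breakpoint register DRn
 * (slot 0..3) as a local breakpoint with the given condition and length.
 * Hypothetical helper: it only builds the value, it does not load DR7.
 */
uint64_t dr7_bits(unsigned slot, enum dr_access access, enum dr_len len)
{
    assert(slot < 4);
    uint64_t enable = 1ULL << (slot * 2);                  /* Ln bit */
    uint64_t rw     = (uint64_t)access << (16 + slot * 4); /* R/Wn field */
    uint64_t length = (uint64_t)len << (18 + slot * 4);    /* LENn field */
    return enable | rw | length;
}
```

Under these assumptions, dr7_bits(0, DR_ACCESS_WRITE, DR_LEN_4) yields 0xd0001: DR0 enabled, break on writes, four bytes wide. That is the kind of condition a write watchpoint on a four-byte integer needs.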

Fortunately, Oracle Solaris provides a well-defined interface called /proc that shields developers from the complexity of different microprocessor architectures. Using the /proc interface makes applications such as the dbx debugger extremely portable across Oracle Solaris platforms. The Oracle Solaris OS runs on SPARC, x86, and x64 architectures.

The /proc interface is a file system that provides access to the state of each process and lightweight process (LWP) in the system. Watchpoints are set and cleared through the /proc file system interface, by opening the control file for a process and then sending a PCWATCH command (see the proc(4) man page for more details).

The PCWATCH command is accompanied by a prwatch data structure, which contains the address, the length of the area to be affected, and the type of access to be watched for: read, write, execute, and stop before or after the access.

A watchpoint is triggered when an LWP in the traced process makes a memory reference that covers at least one byte of a watched area and the memory reference matches the access mode specified by the PCWATCH command. When an LWP triggers a watchpoint, it incurs a watchpoint trap (FLTWATCH), which is generated by the Oracle Solaris kernel. If FLTWATCH is being traced, the LWP stops; otherwise, it is sent a SIGTRAP signal. If SIGTRAP is being traced and is not blocked, then the LWP stops. At this point the dbx debugger takes control and you can issue other dbx commands to examine the states of the traced process.
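The trigger rule above (matching access mode, at least one overlapping byte) can be sketched in portable C. The struct and flag names below are hypothetical stand-ins for the address, length, and access type carried by prwatch; the real definitions are in the Oracle Solaris headers and are not reproduced here. The sketch also ignores address wraparound at the top of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access-mode flags (not the values used by /proc). */
enum {
    WATCH_READ  = 1 << 0,
    WATCH_WRITE = 1 << 1,
    WATCH_EXEC  = 1 << 2
};

/* Hypothetical stand-in for the address, length, and mode of a watched area. */
struct watch_area {
    uint64_t addr;  /* first watched byte */
    uint64_t size;  /* number of watched bytes */
    int      modes; /* OR of WATCH_* flags */
};

/*
 * Return true when a memory reference of ref_size bytes at ref_addr,
 * made with access type ref_mode, should trigger the watchpoint:
 * the access type is watched and at least one byte overlaps.
 */
bool watch_triggers(const struct watch_area *w,
                    uint64_t ref_addr, uint64_t ref_size, int ref_mode)
{
    assert(w != NULL);
    if (w->size == 0 || ref_size == 0 || (w->modes & ref_mode) == 0)
        return false;
    /* Half-open ranges overlap when each starts before the other ends. */
    return ref_addr < w->addr + w->size && w->addr < ref_addr + ref_size;
}
```

With a four-byte write watch at some address A, a one-byte write to A+3 triggers, a four-byte write to A+4 does not, and a read of A does not because only writes are watched.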

Setting Watchpoints in the dbx Debugger

To stop execution when a memory address has been accessed, type:

stop access mode address-expression [, byte-size-expression ]

mode specifies how the memory was accessed. It can be composed of one or more of the following letters:

r The memory at the specified address has been read
w The memory has been written to
x The memory has been executed

mode can also contain either of the following:

a Stops the process after the access (default)
b Stops the process before the access

address-expression is any expression that can be evaluated to produce an address. If you give a symbolic expression, the size of the region to be watched is automatically deduced; you can override it by specifying byte-size-expression. You can also use nonsymbolic, typeless address expressions, in which case the size is mandatory.

If you typed the following command, execution would stop after any of the four bytes starting at memory address 0xfffffd7fffdff7a8 had been read:

(dbx) stop access r 0xfffffd7fffdff7a8, 4

If you typed the following command, execution would stop before the variable local had been written to:

(dbx) stop access wb &local

Keep these points in mind when using the stop access command:

  • The event occurs when a variable is written to, even if it has the same value.
  • By default, the event occurs after execution of the instruction that wrote to the variable. You can indicate that you want the event to occur before the instruction is executed by specifying the mode as b.
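To make the mode letters concrete, here is a small C sketch that decodes a mode string into its parts. It is not dbx's parser: the struct, the function name, and the choice to reject strings such as "ab" or strings with no r, w, or x are assumptions made for this illustration, and dbx itself may be more or less permissive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded form of a stop access mode string. */
struct access_mode {
    bool read, write, exec; /* r, w, x */
    bool before;            /* b; false means a (after), the default */
};

/*
 * Decode a mode string such as "w", "rw", or "wb".
 * Returns false for an unknown letter, for a and b given together,
 * or when none of r, w, x is present (assumed rules, not dbx's).
 */
bool parse_access_mode(const char *s, struct access_mode *out)
{
    struct access_mode m = { false, false, false, false };
    bool saw_after = false;

    assert(s != NULL && out != NULL);
    for (; *s != '\0'; s++) {
        switch (*s) {
        case 'r': m.read = true; break;
        case 'w': m.write = true; break;
        case 'x': m.exec = true; break;
        case 'a': saw_after = true; break;
        case 'b': m.before = true; break;
        default:  return false;
        }
    }
    if ((saw_after && m.before) || !(m.read || m.write || m.exec))
        return false;
    *out = m;
    return true;
}
```

In this sketch "w" decodes to a write watch that stops after the access, which mirrors the way dbx echoes stop access w &local back as stop access wa &local, 4 in the session later in this article.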

The older stop modify command is still accepted for backward compatibility and maps to the appropriate stop access command:

stop modify address-expression [, byte-size-expression ]

In the following a.cc example, the goal is to stop the process whenever the local variable is written to.

The a.cc Test Case


     #include <stdio.h>

     int global = 0;
     static int stat = 0;

     void poker(int *ip)
     {
         *ip = 5;
     }

     int main()
     {
         static int flocal;
         int local;

         global = 0;
         stat = 0;
         flocal = 0;
         local = 0;
         poker(&global);
         poker(&stat);
         poker(&flocal);
         poker(&local);
     }

The a.cc test case is compiled as follows:

CC -g -m64 a.cc

By default, the C++ compiler generates the a.out executable.

Now let's run the dbx debugger on the a.out executable and set a data change breakpoint (watchpoint) for the local variable.


     % dbx a.out
     For information about new features see `help changes'
     To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
     Reading a.out
     Reading ld.so.1
     Reading libCstd.so.1
     Reading libCrun.so.1
     Reading libm.so.2
     Reading libc.so.1
     (dbx) stop in main
     (2) stop in main
     (dbx) run
     Running: a.out
     (process id 10452)
     stopped in main at line 16 in file "a.cc"
        16       global = 0;
     (dbx) stop access w &local
     (3) stop access wa &local, 4
     (dbx) cont
     watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 19 in file "a.cc"
        19       local = 0;
     (dbx) cont
     watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 8 in file "a.cc"
         8       *ip = 5;
     (dbx)

As shown, the stop access command with write access mode is used to set a watchpoint for the local variable. The &local syntax stands for the address of the local variable. The local variable is defined as a four-byte integer, hence the size of the region to be watched is automatically deduced and appended to the command syntax.

The watchpoint trap is triggered twice for the local variable. The first time is when the local variable is assigned a value of zero in the main function. The second time is when the local variable is assigned a value of five, through the ip pointer, in the poker function.

Monitoring Watchpoint Traps With DTrace

This section describes how the watchpoint trap can be traced in the Oracle Solaris kernel using the Oracle Solaris Dynamic Tracing (DTrace) facility. It is assumed that you are already familiar with D script syntax, probes, and constructs. Otherwise, the following article is recommended for reading before you continue with this section: Using DTrace with Oracle Developer Studio Tools to Understand, Analyze, Debug, and Enhance Complex Applications.

The /usr/include/sys/fault.h header file contains the names of all hardware faults that can be traced in a process. For this particular subject, however, we need to pay attention only to the FLTWATCH fault. The FLTWATCH fault (number 12) is the watchpoint trap. The following D script shows how to use the fault probe in the proc provider to monitor hardware faults.

The fault.d D Script


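     #pragma D option quiet

     /* Print a banner when tracing starts. */
     dtrace:::BEGIN
     {
          printf("Tracing hardware faults. Enter <control-c> to end.\n");
     }

     /* Count faults by executable, fault code, fault address, and PC. */
     proc:::fault
     {
          @[execname, args[0], args[1]->__data.__fault.__addr,
              args[1]->__data.__fault.__pc] = count();
     }

     /* Print a header and the aggregated counts when tracing ends. */
     dtrace:::END
     {
          printf("%10s %10s %16s %16s %10s\n",
                 "EXECUTABLE", "FAULT", "ADDRESS", "PC", "COUNT");
          printa("%10s %10d %16p %16p %8@d\n", @);
     }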

The fault probe fires when a thread experiences a machine fault. The fault probe has two arguments: The fault code is in args[0]. The kernel siginfo structure corresponding to the fault is pointed to by args[1].

The kernel siginfo_t structure is defined in the /usr/include/sys/siginfo.h header file. The siginfo_t structure consists of a union of several structures. For this particular example, however, we are interested only in tracing the __addr and __pc fields of the __fault structure.

As shown in the fault.d D script, an aggregation is used in the proc:::fault clause to collect data keyed on the following expressions:

execname                         The executable name
args[0]                          The fault code
args[1]->__data.__fault.__addr   The address of the watched area in memory (args[1] is a pointer to the siginfo_t structure)
args[1]->__data.__fault.__pc     The address of the instruction that accesses the watched area in memory (args[1] is a pointer to the siginfo_t structure)

The count() function records the number of times each fault is triggered in a process. The fault.d script needs to be run in a separate terminal window. The following dtrace command enables the fault probe in the Oracle Solaris kernel:

dtrace -s fault.d

At this point, the fault probe in the proc provider is enabled and waiting to collect data.

Now, in a separate terminal window, let's run dbx on the a.out executable and enter the same sequence of commands shown in the previous section to set (and trigger) a data change breakpoint for the local variable.

As the message in the dtrace terminal window indicates, pressing Control-C ends the execution of the fault.d script. DTrace then generates the following output:


     % dtrace -s fault.d
     Tracing hardware faults. Enter <control-c> to end.
     ^C
     EXECUTABLE      FAULT          ADDRESS              PC     COUNT
          a.out          3           401048               0       1
          a.out          3 fffffd7fff3ce570               0       1
          a.out          4           401053               0       1
          a.out          4 fffffd7fff3ce571               0       1
          a.out         12 fffffd7fffdff7c8          401030       1
          a.out         12 fffffd7fffdff7c8          401069       1
          a.out          3 fffffd7fff3ce540               0       2

The FAULT column lists all hardware faults that are traced in the a.out process. The FLTBPT fault (number 3) is the breakpoint trap, and the FLTTRACE fault (number 4) is the trace trap (single-step). As mentioned before, however, we need to pay attention only to the FLTWATCH fault (number 12).

Based on the output of the fault.d script, the a.out process incurred the watchpoint trap twice for the 0xfffffd7fffdff7c8 address. As you may have already guessed, 0xfffffd7fffdff7c8 is the address of the local variable in memory (see the output of dbx in the previous section).
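The tally can be double-checked with a few lines of C. The sketch below copies only the FAULT and COUNT columns from the output above and sums the counts per fault number. The struct, function, and constant names are made up for the example, and the three fault numbers are simply the ones quoted in this article.

```c
#include <assert.h>
#include <stddef.h>

/* Fault numbers quoted in this article; <sys/fault.h> defines the real names. */
enum { DEMO_FLTBPT = 3, DEMO_FLTTRACE = 4, DEMO_FLTWATCH = 12 };

/* One output row of fault.d, reduced to the FAULT and COUNT columns. */
struct fault_row {
    int      fault;
    unsigned count;
};

/* Name for the three fault numbers discussed here, or NULL for any other. */
const char *fault_name(int fault)
{
    switch (fault) {
    case DEMO_FLTBPT:   return "FLTBPT";
    case DEMO_FLTTRACE: return "FLTTRACE";
    case DEMO_FLTWATCH: return "FLTWATCH";
    default:            return NULL;
    }
}

/* Sum the COUNT column over all rows with the given fault number. */
unsigned total_for_fault(const struct fault_row *rows, size_t n, int fault)
{
    unsigned total = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (rows[i].fault == fault)
            total += rows[i].count;
    return total;
}

/* The FAULT and COUNT columns, row by row, from the DTrace output. */
const struct fault_row article_rows[] = {
    { 3, 1 }, { 3, 1 }, { 4, 1 }, { 4, 1 }, { 12, 1 }, { 12, 1 }, { 3, 2 }
};
```

Calling total_for_fault(article_rows, 7, DEMO_FLTWATCH) gives 2, one for each of the two watchpoint stops seen in dbx. The breakpoint and single-step rows are bookkeeping that dbx performs on its own.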

Two instruction addresses, 0x401030 and 0x401069, are listed in the PC (Program Counter) column. These two instructions contain a memory reference to the watched area (0xfffffd7fffdff7c8). Hence, the watchpoint trap is triggered for these instructions.

The next step is to figure out what these two instructions are. You can use dbx to disassemble the code and inspect the assembly code for 0x401030 and 0x401069 instruction addresses.

It is assumed that you are already familiar with dbx instruction-level debugging commands. Otherwise, the following article is recommended for reading before proceeding with the rest of this section: AMD64 Instruction-Level Debugging With dbx.

Below is the output of dbx. The dis command is used to disassemble the portions of code that correspond to the 0x401030 and 0x401069 instruction addresses. The regs command is used to display the contents of the general-purpose registers.


     (dbx) cont

     watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 8 in file "a.cc"
         8       *ip = 5;
     (dbx) dis main
     0x0000000000401040: main                pushq     %rbp
     0x0000000000401041: main+0x0001:        movq      %rsp,%rbp
     0x0000000000401044: main+0x0004:        subq      $0x0000000000000010,%rsp
     0x0000000000401048: main+0x0008:        movl      $0x0000000000000000,global
     0x0000000000401053: main+0x0013:        movl      $0x0000000000000000,stat
     0x000000000040105e: main+0x001e:        movl      $0x0000000000000000,__1fEmain1AGflocal_
     0x0000000000401069: main+0x0029:        movl      $0x0000000000000000,0xfffffffffffffff8(%rbp)
     0x0000000000401070: main+0x0030:        movq      $global,%rdi
     0x0000000000401077: main+0x0037:        movl      $0x0000000000000000,%eax
     0x000000000040107c: main+0x003c:        call      poker [ 0x401020, .-0x5c ]
     (dbx) dis poker
     0x0000000000401020: poker       :       pushq     %rbp
     0x0000000000401021: poker+0x0001:       movq      %rsp,%rbp
     0x0000000000401024: poker+0x0004:       subq      $0x0000000000000010,%rsp
     0x0000000000401028: poker+0x0008:       movq      %rdi,0xfffffffffffffff8(%rbp)
     0x000000000040102c: poker+0x000c:       movq      0xfffffffffffffff8(%rbp),%r8
     0x0000000000401030: poker+0x0010:       movl      $0x0000000000000005,0x0000000000000000(%r8)
     0x0000000000401038: poker+0x0018:       leave
     0x0000000000401039: poker+0x0019:       ret
     0x000000000040103a: poker+0x001a:       nop
     0x000000000040103c: _ex_deregister+0x01f4:         nop
     (dbx) regs
     current frame:   [1]
     r15     0x0000000000000000
     r14     0x0000000000000000
     r13     0x0000000000000000
     r12     0x0000000000000000
     r11     0xfffffffffbc01ec8
     r10     0x0000000048fe9d0a
     r9      0x00000000000015da
     r8      0xfffffd7fffdff7c8
     rdi     0xfffffd7fffdff7c8
     rsi     0xfffffd7fffdff7f8
     rbp     0xfffffd7fffdff7b0
     rbx     0xfffffd7fff3fac40
     rdx     0xfffffd7fffdff808
     rcx     0x0000000000093182
     rax     0x0000000000000000
     trapno  0x0000000000000001
     err     0x0000000000000000
     rip     0x0000000000401030:poker+0x10    movl      $0x0000000000000005,0x0000000000000000(%r8)
     cs      0x0000000000000053
     eflags  0x0000000000000286
     rsp     0xfffffd7fffdff7a0
     ss      0x000000000000004b
     fs      0x0000000000000000
     gs      0x0000000000000000
     es      0x000000000000004b
     ds      0x000000000000004b
     fsbase  0xfffffd7fff382000
     gsbase  0x0000000000000000
     (dbx)

As shown above, the watchpoint is triggered when the number 5 is assigned through the ip formal parameter (*ip = 5) inside the poker function at line 8 of the a.cc program.

Similarly, the same assignment operation can be observed at the assembly level. The movl instruction at the 0x401030 address dereferences the content of the %r8 register and assigns 5 to the variable whose address is 0xfffffd7fffdff7c8 (the local variable).
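The operand arithmetic in the disassembly can be checked by hand or with a short C sketch. The helper names below are invented for this illustration; the register values are copied from the regs output above. The point is that a displacement printed as 0xfffffffffffffff8 is -8 in two's complement, so 0xfffffffffffffff8(%rbp) means rbp - 8, while 0x0000000000000000(%r8) is simply the address held in %r8.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of a base-plus-displacement operand.
 * Unsigned 64-bit addition wraps modulo 2^64, so a displacement of
 * 0xfffffffffffffff8 behaves like -8.
 */
uint64_t effective_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}

/* The same displacement viewed as a signed offset. */
int64_t signed_disp(uint64_t disp)
{
    if (disp <= (uint64_t)INT64_MAX)
        return (int64_t)disp;
    return -(int64_t)(~disp) - 1;
}

/* Restate the values from the dbx output as checks. */
void check_article_values(void)
{
    const uint64_t rbp = 0xfffffd7fffdff7b0ULL; /* rbp from the regs output */
    const uint64_t r8  = 0xfffffd7fffdff7c8ULL; /* r8 from the regs output */

    /* poker+0x8 saves ip at 0xfffffffffffffff8(%rbp), that is, rbp - 8. */
    assert(signed_disp(0xfffffffffffffff8ULL) == -8);
    assert(effective_address(rbp, 0xfffffffffffffff8ULL) == 0xfffffd7fffdff7a8ULL);

    /* poker+0x10 writes 5 to 0(%r8), the watched address of local. */
    assert(effective_address(r8, 0) == 0xfffffd7fffdff7c8ULL);
}
```

So the write at poker+0x10 lands on 0xfffffd7fffdff7c8, which is exactly the address that dbx and the fault.d output report for the watchpoint.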

In Conclusion

The hardware-assisted watchpoints in dbx are fast and very useful for debugging extremely difficult software defects. A watchpoint, also known as a data change breakpoint, can be used in dbx to stop a program when the value of a variable or expression has changed.

The DTrace facility enables you to monitor the internal states of the Oracle Solaris kernel in ways that were not possible before. A simple D script, as shown in this article, can reveal how the Oracle Solaris kernel interacts with applications during execution.

Finally, using dbx and DTrace simultaneously creates the ultimate debugging environment to unravel the most obscure software defects in your applications and even the Oracle Solaris kernel itself.