By Nasser Nouri, May 2008, (updated April 2011, June 2016)
One of the most useful debugging features in the Oracle Developer Studio dbx debugger is the ability to set watchpoints during program execution. A watchpoint, also called a data change breakpoint, can be used in dbx to stop a program when the value of a variable or expression has changed. A watchpoint is similar to a breakpoint, except that a watchpoint stops execution when an address location is read or modified, whereas a breakpoint stops execution when an instruction is executed at a specified location.
This article shows how to use the watchpoint facility in the Oracle Developer Studio dbx debugger. The dbx debugger can be used for both source-level and instruction-level debugging.
Additionally, the Oracle Solaris Dynamic Tracing (DTrace) facility is used to show how the internal states of the Oracle Solaris kernel can be traced with a simple D script.
Historically, watchpoints were implemented in software, and to some extent they slowed down program execution. Newer microprocessors are equipped with debug registers that enable modern software debuggers such as dbx to create hardware watchpoints. Hardware watchpoints are extremely fast and do not slow down the execution of programs.
For example, Intel and AMD architectures have eight debug registers, DR0 through DR7. The DR0 through DR3 registers can be used for creating address breakpoints. Software can load a virtual (linear) address into any of the four registers, and enable breakpoints to occur when the address matches an instruction or data reference. The debug control register (DR7) is used to establish the breakpoint conditions for the address breakpoint registers (DR0 through DR3) and to enable debug exceptions for each address breakpoint register individually. DR6 is the debug status register. The microprocessor loads the debug status into DR6 when an enabled debug condition is encountered that causes a debug exception. This register is never cleared by the processor and must be cleared by software after the contents have been read. The DR4 and DR5 registers are reserved and cannot be used by software.
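As a rough illustration of what "establishing the breakpoint conditions" in DR7 involves, the following C++ sketch computes a DR7 value that enables one of the four address breakpoint slots. It relies on the architectural bit layout (a local-enable bit at position 2n, and a two-bit R/W field followed by a two-bit LEN field starting at bit 16 + 4n for slot n). The names enable_slot, DebugAccess, and DebugLength are hypothetical, and the sketch only does the arithmetic: it does not write any register, and it makes no claim about how dbx or the Oracle Solaris kernel programs the hardware.

```cpp
#include <cstdint>

// Values of the two-bit R/W field in DR7 (names are this sketch's own).
enum class DebugAccess : std::uint32_t {
    Execute   = 0x0,  // break on instruction execution
    Write     = 0x1,  // break on data writes
    ReadWrite = 0x3   // break on data reads or writes
};

// Values of the two-bit LEN field in DR7 (names are this sketch's own).
enum class DebugLength : std::uint32_t {
    One = 0x0, Two = 0x1, Eight = 0x2, Four = 0x3
};

// Returns dr7 with slot `slot` (0..3, that is DR0..DR3) locally enabled
// for the given access type and length. Pure arithmetic, no hardware access.
inline std::uint32_t enable_slot(std::uint32_t dr7, unsigned slot,
                                 DebugAccess access, DebugLength length)
{
    const unsigned field = 16 + 4 * slot;   // R/Wn starts here, LENn is two bits above
    dr7 &= ~(0xFu << field);                // clear the old R/W and LEN bits
    dr7 |= static_cast<std::uint32_t>(access) << field;
    dr7 |= static_cast<std::uint32_t>(length) << (field + 2);
    dr7 |= 1u << (2 * slot);                // Ln: local enable bit for this slot
    return dr7;
}
```

For instance, enable_slot(0, 0, DebugAccess::Write, DebugLength::Four) describes a four-byte write watchpoint on the address held in DR0.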
Fortunately, Oracle Solaris provides a well-defined interface called /proc that shields developers from the complexity of different microprocessor architectures. Using the /proc interface makes applications such as the dbx debugger extremely portable across Oracle Solaris platforms. The Oracle Solaris OS runs on SPARC, x86, and x64 architectures.
The /proc interface is a file system that provides access to the state of each process and lightweight process (LWP) in the system. Watchpoints are set and cleared through the /proc file system interface, by opening the control file for a process and then sending a PCWATCH command (see the proc(4) man page for more details).
The PCWATCH command is accompanied by a prwatch data structure, which contains the address, the length of the area to be affected, and the type of access to be watched for: read, write, execute, and stop before or after the access.
A watchpoint is triggered when an LWP in the traced process makes a memory reference that covers at least one byte of a watched area and the memory reference matches the access mode specified by the PCWATCH command. When an LWP triggers a watchpoint, it incurs a watchpoint trap (FLTWATCH), which is generated by the Oracle Solaris kernel. If FLTWATCH is being traced, the LWP stops; otherwise, it is sent a SIGTRAP signal. If SIGTRAP is being traced and is not blocked, then the LWP stops. At this point the dbx debugger takes control and you can issue other dbx commands to examine the state of the traced process.
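The trigger rule described above (a reference that covers at least one byte of the watched area, with a matching access mode) can be sketched as a small predicate. This is only a model of the rule as stated in the text: WatchedArea, AccessKind, and triggers are hypothetical names, not the kernel's prwatch structure or its flags, and address wraparound at the top of the address space is ignored.

```cpp
#include <cstddef>
#include <cstdint>

// Kinds of memory access a watchpoint can be armed for (illustrative values).
enum AccessKind : unsigned { kRead = 1u, kWrite = 2u, kExec = 4u };

// A watched area: start address, length in bytes, and the armed access kinds.
struct WatchedArea {
    std::uint64_t addr;
    std::size_t   size;
    unsigned      modes;  // bitwise OR of AccessKind values
};

// True when a reference of ref_size bytes at ref_addr shares at least one
// byte with the watched area and its kind is one the area is armed for.
inline bool triggers(const WatchedArea& w, std::uint64_t ref_addr,
                     std::size_t ref_size, AccessKind kind)
{
    if (w.size == 0 || ref_size == 0)
        return false;
    const bool overlaps = ref_addr < w.addr + w.size &&
                          w.addr < ref_addr + ref_size;
    return overlaps && (w.modes & kind) != 0;
}
```

With a four-byte write watchpoint at 0xfffffd7fffdff7c8, a one-byte write to 0xfffffd7fffdff7cb triggers, while a read of the same bytes or a write starting at 0xfffffd7fffdff7cc does not.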
dbx Debugger
To stop execution when a memory address has been accessed, type:
stop access mode address-expression [, byte-size-expression ]
mode specifies how the memory was accessed. It can be composed of one or all of the following letters:
r | The memory at the specified address has been read
w | The memory has been written to
x | The memory has been executed
mode can also contain either of the following:
a | Stops the process after the access (default)
b | Stops the process before the access
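To make the mode letters concrete, here is a small C++ sketch that decodes a mode string such as wb into its parts. It models only the letter table above. AccessMode and parse_mode are hypothetical names, and rejecting a mode that contains none of r, w, or x is this sketch's own choice, not documented dbx behavior.

```cpp
#include <stdexcept>
#include <string>

// Decoded form of a stop access mode string (type and field names are illustrative).
struct AccessMode {
    bool read = false;
    bool write = false;
    bool execute = false;
    bool stop_before = false;  // false means stop after the access, the default
};

// Decodes the letters r, w, x, a, and b; any other letter is an error.
inline AccessMode parse_mode(const std::string& mode)
{
    AccessMode m;
    for (char c : mode) {
        switch (c) {
        case 'r': m.read = true; break;
        case 'w': m.write = true; break;
        case 'x': m.execute = true; break;
        case 'a': m.stop_before = false; break;
        case 'b': m.stop_before = true; break;
        default:  throw std::invalid_argument("unknown mode letter");
        }
    }
    // Assumption of this sketch: at least one access letter is required.
    if (!m.read && !m.write && !m.execute)
        throw std::invalid_argument("mode needs at least one of r, w, x");
    return m;
}
```

Under these assumptions, parse_mode("wb") describes a write watchpoint that stops before the access, and parse_mode("r") describes a read watchpoint that stops after it.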
address-expression is any expression that can be evaluated to produce an address. If you give a symbolic expression, the size of the region to be watched is automatically deduced; you can override it by specifying byte-size-expression. You can also use nonsymbolic, typeless address expressions, in which case the size is mandatory.
If you typed the following command, execution would stop after the memory address 0xfffffd7fffdff7a8 had been read:
(dbx) stop access r 0xfffffd7fffdff7a8, 4
If you typed the following command, execution would stop before the variable local had been written to:
(dbx) stop access wb &local
Keep these points in mind when using the stop access command: by default, the process stops after the access; to stop before the access, include b in the mode.
The older stop modify command is still accepted for backward compatibility and maps to the appropriate stop access command:
stop modify address-expression [, byte-size-expression ]
In the following a.cc example, we would like to stop the process whenever the local variable is accessed for a write operation.
The a.cc Test Case
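#include <stdio.h>

int global = 0;
static int stat = 0;

void poker(int *ip)
{
    *ip = 5;            // line 8: writes through the pointer
}

int main()
{
    static int flocal;
    int local;

    global = 0;         // line 16
    stat = 0;
    flocal = 0;
    local = 0;          // line 19: first write to the watched variable

    poker(&global);
    poker(&stat);
    poker(&flocal);
    poker(&local);      // second write to local happens inside poker
}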
The a.cc test case is compiled as follows:
CC -g -m64 a.cc
By default, the C++ compiler generates the a.out executable.
Now let's run the dbx debugger on the a.out executable and set a data change breakpoint (watchpoint) for the local variable.
% dbx a.out
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
Reading a.out
Reading ld.so.1
Reading libCstd.so.1
Reading libCrun.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop in main
(2) stop in main
(dbx) run
Running: a.out
(process id 10452)
stopped in main at line 16 in file "a.cc"
   16       global = 0;
(dbx) stop access w &local
(3) stop access wa &local, 4
(dbx) cont
watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 19 in file "a.cc"
   19       local = 0;
(dbx) cont
watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 8 in file "a.cc"
    8       *ip = 5;
(dbx)
As shown, the stop access command with write access mode is used to set a watchpoint for the local variable. The &local syntax stands for the address of the local variable. The local variable is defined as a four-byte integer, hence the size of the region to be watched is automatically deduced and appended to the command syntax.
The watchpoint trap is triggered twice for the local variable. The first time is when the local variable is assigned a value of zero in the main function. The second time is when the local variable is assigned a value of five in the poker function.
This section describes how the watchpoint trap can be traced in the Oracle Solaris kernel using the Oracle Solaris Dynamic Tracing (DTrace) facility. It is assumed that you are already familiar with D script syntax, probes, and constructs. Otherwise, the following article is recommended for reading before you continue with this section: Using DTrace with Oracle Developer Studio Tools to Understand, Analyze, Debug, and Enhance Complex Applications.
The /usr/include/sys/fault.h header file contains the names of all hardware faults that can be traced in the process. However, for this particular subject, we need to pay attention only to the FLTWATCH fault. The FLTWATCH fault (number 12) is the watchpoint trap. The following D script shows how to use the fault probe in the proc provider to monitor the hardware faults.
The fault.d D Script
#pragma D option quiet

dtrace:::BEGIN
{
    printf("Tracing hardware faults. Enter <control-c> to end.\n");
}

/* Count faults per executable, fault code, fault address, and PC. */
proc:::fault
{
    @[execname, args[0], args[1]->__data.__fault.__addr,
        args[1]->__data.__fault.__pc] = count();
}

/* Print the collected counts when tracing ends. */
END
{
    printf("%10s %10s %16s %16s %10s\n",
        "EXECUTABLE", "FAULT", "ADDRESS", "PC", "COUNT");
    printa("%10s %10d %16p %16p %8@d\n", @);
}
The fault probe fires when a thread experiences a machine fault. The fault probe has two arguments: The fault code is in args[0]. The kernel siginfo structure corresponding to the fault is pointed to by args[1].
The kernel siginfo_t structure is defined in the /usr/include/sys/siginfo.h header file. The siginfo_t structure consists of a union of several structures. However, for this particular example, we are only interested in tracing the __addr and __pc fields of the __fault structure.
As shown in the fault.d D script, the aggregation is used in the proc:::fault clause to collect data based on the following expressions:
execname | The executable name
args[0] | The fault code
args[1]->__data.__fault.__addr | args[1] is a pointer to the siginfo_t structure. __addr is the address of a watched area in memory.
args[1]->__data.__fault.__pc | args[1] is a pointer to the siginfo_t structure. __pc is the address of the instruction that accesses the watched area in memory.
The count() function shows the number of times each fault is triggered in a process. The fault.d script needs to be run in a separate terminal window. The following dtrace command enables the fault probe in the Oracle Solaris kernel:
dtrace -s fault.d
At this point, the fault probe in the proc provider is enabled and waiting to collect data.
Now, in a separate terminal window, let's run dbx on the a.out executable and enter the same sequence of commands shown in the previous section to set (and trigger) a data change breakpoint for the local variable.
As instructed in the terminal window from which the dtrace command was invoked, pressing <control-c> ends the execution of the fault.d script. DTrace generates the following output:
% dtrace -s fault.d
Tracing hardware faults. Enter <control-c> to end.
^C
EXECUTABLE      FAULT          ADDRESS               PC      COUNT
     a.out          3           401048                0        1
     a.out          3 fffffd7fff3ce570                0        1
     a.out          4           401053                0        1
     a.out          4 fffffd7fff3ce571                0        1
     a.out         12 fffffd7fffdff7c8           401030        1
     a.out         12 fffffd7fffdff7c8           401069        1
     a.out          3 fffffd7fff3ce540                0        2
The FAULT column lists all hardware faults that are traced in the a.out process. The FLTBPT fault (number 3) is the breakpoint trap. The FLTTRACE fault (number 4) is the trace trap (single-step). However, as mentioned before, we only need to pay attention to the FLTWATCH fault (number 12).
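When the report is longer than a few rows, it can help to pick out the watchpoint rows programmatically. The following C++ sketch parses one whitespace-separated row of the report and names the three fault codes mentioned in this article. FaultRow, parse_row, and fault_name are hypothetical helpers, and the sketch assumes the ADDRESS and PC columns are printed as hexadecimal without a 0x prefix, as in the output shown above.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// One row of the fault.d report (field names are this sketch's own).
struct FaultRow {
    std::string   executable;
    int           fault = 0;
    std::uint64_t address = 0;
    std::uint64_t pc = 0;
    unsigned long count = 0;
};

// Names only the fault codes discussed in this article.
inline std::string fault_name(int code)
{
    switch (code) {
    case 3:  return "FLTBPT";
    case 4:  return "FLTTRACE";
    case 12: return "FLTWATCH";
    default: return "unknown";
    }
}

// Parses a row such as "a.out 12 fffffd7fffdff7c8 401030 1".
// Returns false when a field is missing or malformed.
inline bool parse_row(const std::string& line, FaultRow& row)
{
    std::istringstream in(line);
    in >> row.executable >> std::dec >> row.fault
       >> std::hex >> row.address >> row.pc   // hexadecimal columns
       >> std::dec >> row.count;
    return !in.fail();
}
```

A caller could then keep only the rows for which fault_name(row.fault) is "FLTWATCH" and look up each row.pc in the disassembly.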
Based on the output of the fault.d script, the a.out process incurred the watchpoint trap twice for the 0xfffffd7fffdff7c8 address. As you may have already guessed, 0xfffffd7fffdff7c8 is the address of the local variable in memory (see the output of dbx in the previous section).
Two instruction addresses, 0x401030 and 0x401069, are listed in the PC (Program Counter) column. These two instructions contain a memory reference to the watched area (0xfffffd7fffdff7c8). Hence, the watchpoint trap is triggered for these instructions.
The next step is to figure out what these two instructions are. You can use dbx to disassemble the code and inspect the assembly code for the 0x401030 and 0x401069 instruction addresses.
It is assumed that you are already familiar with dbx instruction-level debugging commands. Otherwise, the following article is recommended for reading before proceeding with the rest of this section: AMD64 Instruction-Level Debugging With dbx.
Below is the output of dbx. The dis command is used to disassemble the portion of code that corresponds to the 0x401030 and 0x401069 instruction addresses. The regs command is used to display the contents of the general purpose registers.
(dbx) cont
watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 8 in file "a.cc"
8 *ip = 5;
(dbx) dis main
0x0000000000401040: main pushq %rbp
0x0000000000401041: main+0x0001: movq %rsp,%rbp
0x0000000000401044: main+0x0004: subq $0x0000000000000010,%rsp
0x0000000000401048: main+0x0008: movl $0x0000000000000000,global
0x0000000000401053: main+0x0013: movl $0x0000000000000000,stat
0x000000000040105e: main+0x001e: movl $0x0000000000000000,__1fEmain1AGflocal_
0x0000000000401069: main+0x0029: movl $0x0000000000000000,0xfffffffffffffff8(%rbp)
0x0000000000401070: main+0x0030: movq $global,%rdi
0x0000000000401077: main+0x0037: movl $0x0000000000000000,%eax
0x000000000040107c: main+0x003c: call poker [ 0x401020, .-0x5c ]
(dbx) dis poker
0x0000000000401020: poker : pushq %rbp
0x0000000000401021: poker+0x0001: movq %rsp,%rbp
0x0000000000401024: poker+0x0004: subq $0x0000000000000010,%rsp
0x0000000000401028: poker+0x0008: movq %rdi,0xfffffffffffffff8(%rbp)
0x000000000040102c: poker+0x000c: movq 0xfffffffffffffff8(%rbp),%r8
0x0000000000401030: poker+0x0010: movl $0x0000000000000005,0x0000000000000000(%r8)
0x0000000000401038: poker+0x0018: leave
0x0000000000401039: poker+0x0019: ret
0x000000000040103a: poker+0x001a: nop
0x000000000040103c: _ex_deregister+0x01f4: nop
(dbx) regs
current frame: [1]
r15 0x0000000000000000
r14 0x0000000000000000
r13 0x0000000000000000
r12 0x0000000000000000
r11 0xfffffffffbc01ec8
r10 0x0000000048fe9d0a
r9 0x00000000000015da
r8 0xfffffd7fffdff7c8
rdi 0xfffffd7fffdff7c8
rsi 0xfffffd7fffdff7f8
rbp 0xfffffd7fffdff7b0
rbx 0xfffffd7fff3fac40
rdx 0xfffffd7fffdff808
rcx 0x0000000000093182
rax 0x0000000000000000
trapno 0x0000000000000001
err 0x0000000000000000
rip 0x0000000000401030:poker+0x10 movl $0x0000000000000005,0x0000000000000000(%r8)
cs 0x0000000000000053
eflags 0x0000000000000286
rsp 0xfffffd7fffdff7a0
ss 0x000000000000004b
fs 0x0000000000000000
gs 0x0000000000000000
es 0x000000000000004b
ds 0x000000000000004b
fsbase 0xfffffd7fff382000
gsbase 0x0000000000000000
(dbx)
As shown above, the watchpoint is triggered when the number 5 is assigned through the ip formal parameter (*ip) inside the poker function at line 8 of the a.cc program.
Similarly, the same assignment operation can be observed at the assembly level. The movl instruction at the 0x401030 address dereferences the content of the %r8 register and assigns 5 to the variable whose address is 0xfffffd7fffdff7c8 (the local variable).
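The other watchpoint PC, 0x401069 in main, stores through 0xfffffffffffffff8(%rbp), which is a displacement of -8 written as a 64-bit two's-complement value. The C++ sketch below works through that address arithmetic. The helper names are hypothetical, and the value of %rbp in main's frame (0xfffffd7fffdff7d0) is an inference, not something the regs output shows: it is poker's %rbp (0xfffffd7fffdff7b0) plus 0x20, assuming the 0x10 bytes reserved by main's subq, the 8-byte return address pushed by call, and the 8-byte saved %rbp pushed by poker.

```cpp
#include <cstdint>

// Interprets a raw 64-bit pattern as a signed displacement.
inline std::int64_t as_signed(std::uint64_t bits)
{
    return static_cast<std::int64_t>(bits);
}

// Adds a displacement, given as its raw 64-bit pattern, to a base register
// value. Unsigned wraparound gives the same result as signed addition.
inline std::uint64_t effective_address(std::uint64_t base,
                                       std::uint64_t displacement_bits)
{
    return base + displacement_bits;
}

// main's frame pointer as inferred from poker's frame (see the assumptions above).
inline std::uint64_t inferred_main_rbp(std::uint64_t poker_rbp)
{
    return poker_rbp + 0x10   // stack space reserved by main's subq
                     + 0x8    // return address pushed by call
                     + 0x8;   // saved %rbp pushed by poker's prologue
}
```

Under those assumptions, effective_address(inferred_main_rbp(0xfffffd7fffdff7b0), 0xfffffffffffffff8) evaluates to 0xfffffd7fffdff7c8, the same address that %r8 holds in poker and that dbx reports for &local.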
The hardware-assisted watchpoints in dbx are fast and very useful for debugging extremely difficult software defects. A watchpoint, also known as a data change breakpoint, can be used in dbx to stop a program when the value of a variable or expression has changed.
The DTrace facility enables you to monitor the internal states of the Oracle Solaris kernel in ways that were not possible before. A simple D script, as shown in this article, can reveal how the Oracle Solaris kernel interacts with applications during execution.
Finally, using dbx and DTrace together creates a powerful debugging environment for unraveling the most obscure software defects in your applications and even in the Oracle Solaris kernel itself.