Debugging a JVM Level Hang Using dbx

by Poonam Bajaj, March 2008 (updated by Chihin Ko, June 2016)

The Oracle Developer Studio tool dbx is used to debug a particularly thorny JVM hang on an Oracle Solaris x86 system.

Table of Contents

Problem

For troubleshooting any JVM issue such as a crash or hang on an Oracle Solaris system, I generally use the Oracle Developer Studio dbx tool. For most such problems, we receive a core file from customers that is generated with the product build of the JVM. Because the product build of the JVM does not contain any debugging information, dbx cannot show the objects details of the JVM, which makes debugging and deciphering addresses very difficult.

How to Work It Out

The dbx debugger has a very useful feature wherein one can 'load' and 'use' an alternate shared object than the one that was used in the process run.

(dbx) loadobject -load alternate-lib

(dbx) loadobject -use alternate-lib

This feature is really useful for debugging processes or cores that have no debugging information. It makes it possible to recompile the library we want to debug with the -g flag and then load the library into the dbx debugging session.

Debugging a JVM Hang

In this situation, the customer was facing a hard hang on an Oracle Solaris x86 system with JDK version 1.5.0_08, and the process was not even responding to kill -3. The customer sent us a core file of the hung process.

For dbx to have the debugging information, we rebuilt the product version of JDK version 1.5.0_08 libjvm.so with the -g flag and loaded it, and then used this .so with dbx. With that change, dbx was able to get all the debugging information for the core file that was generated with libjvm.so without any debugging information.

The pstack on the core file shows that VMThread is trying to acquire Threads_lock, the main thread is waiting on a condition on Threads_lock, and another thread, 3992, is also trying to grab Threads_lock. So this lock seems to be causing the deadlock. Here is the core file.


-----------------  lwp# 1 / thread# 1  --------------------
 fef4009b lwp_park (0, 0, 0)
 fef3a909 cond_wait_queue (807c540, 807c528, 0, 0) + 3b
 fef3ae02 _cond_wait (807c540, 807c528) + 66
 fef3ae44 cond_wait (807c540, 807c528) + 21
 fe92fc64 __1cHMonitorEwait6Mil_i_ (
                     807c4d0, 0, 0) + 2a4
 fecf526b __1cHThreadsKdestroy_vm6F_i_ (8073fb9, 0, 806dc14, fee4d33c, fee4a2a0, 10004) + 7b
 feb66ea3 jni_DestroyJavaVM (fee43560) + 123
 08051f8e main     (2, 807017c, 80432b4) + bfb
 080512fa ???????? (86, 8043468, 804346d, 8043494, 80434a8, 80434c6)
-----------------  lwp# 11 / thread# 11  --------------------
 fef4009b lwp_park (0, 0, 0)
 fef399bb slow_lock (fbc92000, 807c528, 0) + 3d
 fef39ab1 mutex_lock_impl (807c528, 0) + ec
 fef39bbd mutex_lock (807c528) + 1a
 fe8c9a00 __1cFMutexElock6MpnGThread__v_ (
                    807c4d0, 8261480) + c0
 fe8cc1fe __1cFMutexElock6M_v_ (
                     807c4d0) + 4e
 fe996295 __1cUSafepointSynchronizeFbegin6F_v_ (fee493fc, 8261480, fedee000, f8199b74, 3, f8199bbc) + 85
 fea659c4 __1cIVMThreadEloop6M_v_ (8261480) + 134
 fea66478 __1cIVMThreadDrun6M_v_ (8261480) + 88
 fecbe0f2 __1cG_start6Fpv_0_ (8261480) + d2
 fef3fd52 _thr_setup (fbc92000) + 4e
 fef40040 _lwp_start (fbc92000, 0, 0, f8199ff8, fef40040, fbc92000)
-----------------  lwp# 3992 / thread# 3992  --------------------
 fef4009b lwp_park (0, 0, 0)
 fef399bb slow_lock (6c173c00, 807c528, 0) + 3d
 fef39ab1 mutex_lock_impl (807c528, 0) + ec
 fef39bbd mutex_lock (807c528) + 1a
 fe8ea95e __1cFMutexbClock_without_safepoint_check6M_v_ (
                     807c4d0) + 7e
 fe95c56e __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (a27f780) + ee
 fecb6825 __1cGParkerEpark6Mix_v_ (a27f908, 0, d964b198, 45) + 1e5
 fed0963d Unsafe_Park (a27f840, 6807aba4, 0, d964b198, 45, 9480bcb8) + 1dd
 f876f2f9 ???????? (0, 0, a27c330, 0, 9ae69a8, 0)
 00000000 ???????? (fecf6330, fea09d40, fea03c80, fe964f40, fe91cc40, fe939c50)
 00000000 ???????? (fffffd2c, 3badd9be, f468fe16, e78b7254, fe1dbbcc, 0)
 00000000 ???????? ()

Now let's look at Threads_lock using dbx.


(dbx) print -fx Threads_lock
Threads_lock =  
                     0x807c4d0

(dbx) print -r *((Monitor*)0x807c4d0)
*((class Monitor *) 0x807c4d0) = {
    Monitor::Mutex::_lock_count      = 0
    Monitor::Mutex::_lock_event      = 0x807c500
    Monitor::Mutex::_suppress_signal = 0
     
                    Monitor::Mutex::_owner           = 0x97b0978   <----------------
    Monitor::Mutex::_name            = 0xfed97a0f "Threads_lock"
    Monitor::Mutex::INVALID_THREAD   = 0
    Monitor::_event   = (nil)
    Monitor::_counter = 0
    Monitor::_waiters = 1
    Monitor::_tickets = 0
}

(dbx) print -r *((Thread*)0x97b0978)
*((class Thread *) 0x97b0978) = {
    Thread::ThreadShadow::_pending_exception = (nil)
    Thread::_exception_file                       = (nil)
    Thread::_exception_line                       = 0
    Thread::_real_malloc_address                  = 0x97b0d28
    Thread::_rawmonitor_list                      = (nil)
    Thread::_SR_lock                              = 0x9b7b890
    Thread::_vm_suspend_depth                     = 0
    Thread::_suspend_flags                        = 0
    Thread::_active_handles                       = 0x97b0b48
    Thread::_free_handle_block                    = (nil)
    Thread::_last_handle_mark                     = 0x65c92618
    Thread::_oops_do_parity                       = 0
    Thread::_highest_lock                         = 0x65c92cb8 "x^I{^I\xe8^K{^I\xb0\xd6\x82\xf4\xa0H\xbeu"
    Thread::_unboosted_priority                   = NoPriority
    Thread::_tlab                                 = {
        Thread::ThreadLocalAllocBuffer::_start               = 0x762476d8
        Thread::ThreadLocalAllocBuffer::_top                 = 0x7624c750
        Thread::ThreadLocalAllocBuffer::_end                 = 0x76257308
        Thread::ThreadLocalAllocBuffer::_size                = 14078U
        Thread::ThreadLocalAllocBuffer::_refill_waste_limit  = 219U
        Thread::ThreadLocalAllocBuffer::_number_of_refills   = 6U
        Thread::ThreadLocalAllocBuffer::_fast_refill_waste   = 0
        Thread::ThreadLocalAllocBuffer::_slow_refill_waste   = 202U
        Thread::ThreadLocalAllocBuffer::_gc_waste            = 0
        Thread::ThreadLocalAllocBuffer::_slow_allocations    = 0
        Thread::ThreadLocalAllocBuffer::_allocation_fraction = {
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_average      = 0.006992618
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_sample_count = 1U
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_weight       = 35U
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_last_sample  = 0.006992618
        }
        Thread::ThreadLocalAllocBuffer::_target_refills      = 50
        Thread::ThreadLocalAllocBuffer::_global_stats        = 136222936
    }
    Thread::_vm_operation_started_count           = 1
    Thread::_vm_operation_completed_count         = 1
    Thread::_current_pending_monitor              = (nil)
    Thread::_current_pending_monitor_is_from_java = 1
    Thread::_current_waiting_monitor              = (nil)
     
                     Thread::_osthread                             = 0x97b0b00     <-------------
    Thread::_resource_area                        = 0x8c55f50
    Thread::_handle_area                          = 0x98dafc0
    Thread::_stack_base                           = 0x65c93000 "<bad>"
    Thread::_stack_size                           = 262144U
    Thread::_self_raw_id                          = 1731617792U
}

(dbx) print -r *((OSThread*)0x97b0b00)
*((class OSThread *) 0x97b0b00) = {
    OSThread::_start_proc                   = (nil)
    OSThread::_start_parm                   = (nil)
    OSThread::_state                        = RUNNABLE
    OSThread::_interrupted                  = 0
     
                     OSThread::_thread_id                    = 11982U
     
                     OSThread::_lwp_id                       = 11982U
    OSThread::_caller_sigmask               = {
        OSThread::__sigbits = (4U, 0, 0, 0)
    }
    OSThread::_vm_created_thread            = 1
    OSThread::_current_callback             = (nil)
    OSThread::_current_callback_lock        = (nil)
    OSThread::_interrupt_event              = 0x97b07d8
    OSThread::_saved_interrupt_thread_state = _thread_new
}

The preceding file shows that the owner of Threads_lock is the thread with the ID 11982. Now let's look at the following stack trace of this thread.


(dbx) thread t@11982
t@11982 (l@11982) stopped in __lwp_park at 0xfef4009b
0xfef4009b: __lwp_park+0x000b:  jae      __lwp_park+0x19        [ 0xfef400a9, .+0xe ]
(dbx) where
current thread: t@11982
=>[1] __lwp_park(0x0, 0x0), at 0xfef4009b
  [2] mutex_lock_queue(0x67366400, 0x0, 0x88a0598, 0x0), at 0xfef3911f
  [3] slow_lock(0x67366400, 0x88a0598, 0x0), at 0xfef399bb
  [4] mutex_lock_impl(0x88a0598, 0x0), at 0xfef39ab1
  [5] __mutex_lock(0x88a0598), at 0xfef39bbd
  [6] Mutex::wait_for_lock_blocking_implementation(0xa27f908, 0x97b0978), at 0xfe9d93ba
  [7] Mutex::lock(0xa27f908, 0x97b0978), at 0xfe8c99ea
  [8] Mutex::lock(0xa27f908), at 0xfe8cc1fe
  [9] Parker::unpark(0xa27f908), at 0xfecb69bb
  [10] os::interrupt(0xa27f780), at 0xfecbf6bb
  [11] Thread::interrupt(0xa27f780), at 0xfecf2ed9
  [12] JVM_Interrupt(0x97b0a38, 0x65c923fc, 0x65c923d0, 0x0, 0x65c923fc, 0xf4873020, 0x0, 0xf482f600, 0x65c923fc, 0x65c92424,
 0xf8202b6b, 0x0, 0xf8206399, 0x9593fdc0, 0x1, 0x0, 0x65c92400, 0xf482db6b, 0x65c92438, 0xf4873020), at 0xfeb92255
  [13] 0xf820843a(0x0, 0xf8206399, 0x9593fdc0, 0x1, 0x0, 0x65c92400), at 0xf820843a
  [14] 0xf8202b6b(0x0, 0x0, 0x97f59c18, 0x9593fdc0, 0x65c9243c, 0xf5746e95), at 0xf8202b6b
  [15] 0xf8202b6b(0x967a0c40, 0x9, 0x968f6eb8, 0x65c92464, 0xf57472be, 0x65c92498), at 0xf8202b6b
  [16] 0xf8202b6b(0x0, 0x968f6eb8, 0x967a0c40, 0x65c9249c, 0xf5845a0c, 0x65c924c4), at 0xf8202b6b
  [17] 0xf8202b6b(0x1b58, 0x967a0c40, 0x65c924c8, 0xf5845674, 0x65c92524, 0xf58a1f78), at 0xf8202b6b
  [18] 0xf8202b6b(0x7624c168, 0x7624c138, 0x7624c108, 0x7624c0d8, 0x7624bee8, 0x7624bd00), at 0xf8202b6b

The thread is trying to acquire Monitor 0xa27f908.

Now, let's look at this Monitor.


(dbx) print -r *((Monitor*)0xa27f908)
*((class Monitor *) 0xa27f908) = {
    Monitor::Mutex::_lock_count      = 0
     
                     Monitor::Mutex::_lock_event      = 0x88a0570  <-----------
    Monitor::Mutex::_suppress_signal = 0
    Monitor::Mutex::_owner           = (nil)
    Monitor::Mutex::_name            = 0xfed97c99 "Park Semaphore"
    Monitor::Mutex::INVALID_THREAD   = 0
    Monitor::_event   = (nil)
    Monitor::_counter = 0
    Monitor::_waiters = 0
    Monitor::_tickets = 0
}
(dbx) print -r *((Event*)0x88a0570)
More than one identifier 'Event'.
Select one of the following:
 0) Cancel
 1) class os::Solaris::Event
 2) class Solaris::Event
> 1
*((class os::Solaris::Event *) 0x88a0570) = {
    Event::_count   = 0
    Event::_nParked = 0
    Event::CachePad = (-6.7529205832009e-205, -6.6220952168053e-205, -1.2786650622506e-204, -1.9668306807887e-204)
    Event::_mutex   = (
    {
        Event::_lwp_mutex::flags = {
            Event::_lwp_mutex::flag1        = 4U
            Event::_lwp_mutex::flag2        = '\0'
            Event::_lwp_mutex::ceiling      = '\0'
            Event::_lwp_mutex::mbcp_type_un = {
                Event::_lwp_mutex::bcptype      = 0
                Event::_lwp_mutex::mtype_rcount = {
                    Event::_lwp_mutex::count_type1 = '\0'
                    Event::_lwp_mutex::count_type2 = '\0'
                }
            }
            Event::_lwp_mutex::magic        = 19800U
        }
        Event::_lwp_mutex::lock  = {
            Event::_lwp_mutex::lock64  = {
                Event::_lwp_mutex::pad = ""
            }
            Event::_lwp_mutex::lock32  = {
                Event::_lwp_mutex::ownerpid = 0
                Event::_lwp_mutex::lockword = 16842752U
            }
            Event::_lwp_mutex::owner64 = 72339069014638592ULL
        }
         
                     Event::_lwp_mutex::data  = 1813462016ULL
    }

)
    Event::_cond    = (
    {
        Event::_lwp_cond::flags = {
            Event::_lwp_cond::flag  = ""
            Event::_lwp_cond::type  = 0
            Event::_lwp_cond::magic = 17238U
        }
        Event::_lwp_cond::data  = 0
    }

)
    Event::FreeNext = (nil)
    Event::Immortal = 0
}

Here, _lwp_mutex->data, which is 0x6C173C00, is the owner of thread ulwp_t).

Let's look at this thread next.


(dbx) x 0x6C173C00/30X
0x6c173c00:      0x6c173c00 0x00000000 0x00000000 0x00000000
0x6c173c10:      0x00000000 0x00000000 0x00000000 0x00000000
0x6c173c20:      0x00000000 0x00000000 0x00000000 0xfef6b740
0x6c173c30:      0x00000000 0x00000000 0x6c174000 0x6c173800
0x6c173c40:      0x00000000 0x00000000 0x00000000 0x6803b000
0x6c173c50:      0x00040000 0x00000000 0x6807b000 0x00040000
0x6c173c60:      0x6803b000 0x00040000 0x00000000 0x00000398
0x6c173c70:       
                     0x00000f98 0x0000007f

Thread 0x00000f98 (which is 3992) is the thread ID holding this Event::mutex.

Now let's look at thread 3992.


(dbx) thread t@3992
t@3992 (l@3992) stopped in __lwp_park at 0xfef4009b
0xfef4009b: __lwp_park+0x000b:  jae      __lwp_park+0x19        [ 0xfef400a9, .+0xe ]
(dbx) where
current thread: t@3992
=>[1] __lwp_park(0x0, 0x0), at 0xfef4009b
  [2] mutex_lock_queue(0x6c173c00, 0x0, 0x807c528, 0x0), at 0xfef3911f
  [3] slow_lock(0x6c173c00, 0x807c528, 0x0), at 0xfef399bb
  [4] mutex_lock_impl(0x807c528, 0x0), at 0xfef39ab1
  [5] __mutex_lock(0x807c528), at 0xfef39bbd
  [6] Mutex::lock_without_safepoint_check(0x807c4d0), at 0xfe8ea95e
  [7] SafepointSynchronize::block(0xa27f780), at 0xfe95c56e
  [8] Parker::park(0xa27f908, 0x0, 0xd964b198, 0x45), at 0xfecb6825
  [9] Unsafe_Park(0xa27f840, 0x6807aba4, 0x0, 0xd964b198, 0x45, 0x9480bcb8), at 0xfed0963d

It is waiting to lock Threads_lock.

So threads 11982 and 3992 have a deadlock:

  • Thread 11982 holds Thread_lock and is waiting to acquire Monitor 0xa27f908.
  • Thread 3992 holds a lock on 0xa27f908 and is waiting to acquire Threads_lock.

This problem has been fixed in Java SE 5.0 update 15 and Java SE 6 with CR 6601725.

About the Author

Poonam Bajaj has worked at Sun since 2001 and then Oracle in the Java Sustaining group. This group is responsible for working on customer-escalated issues with Java related products and delivering update releases. Poonam loves to troubleshoot and resolve problems, and Poonam's area of particular expertise is Hotspot VM.