Understanding the Differences Between prstat and vmstat Output in the Solaris OS

Victor Feng, March 2008

As you know, prstat and vmstat are used to check system performance. Sometimes, however, the results returned by prstat show a huge difference from those returned by vmstat.

When this happens, which results should we trust? As a rule of thumb, we should trust the results from vmstat. This article provides the following examples, which show why:

Note: This information applies to systems that run the Solaris 9 or 10 Operating System.

Memory Usage Example

In a system with 8 GB memory, we have Oracle9i Database Enterprise Edition version 9.2.0.7, with maximum memory of about 1 GB and with Oracle shut down.

The vmstat output shows that the system has about 6.1 GB free memory:

# vmstat 3 3

kthr      memory            page            disk          faults
cpu
r b w   swap    free  re mf  pi po fr de sr m0 m4 m5 m1   in   sy
cs   us sy  id
0 0 0 7189112 5702984 32 46 133  1  1  0  0  0  0  0  0  585 2205
1589  0  1  98
0 0 0 8831928 6112816  0  2   0  0  0  0  0  0  0  0  0  390 1492
1399  0  0 100
0 0 0 8831928 6112816  0  0   0  0  0  0  0  0  0  0  0  418 1599
1469  0  0 100

The prstat output shows that user oracle uses 107 MB of memory:

# prstat -a

NPROC USERNAME  SIZE    RSS MEMORY      TIME  CPU
     7 feng       22M   13M   0.2%   0:00:00 0.1%
    73 root     1103M  371M   4.6%   0:24:21 0.0%
    16 oracle    489M  107M   1.3%   0:00:03 0.0%
    28 cpadmin  2172M  647M   8.2%   0:02:46 0.0%
     1 lp       2920K 1352K   0.0%   0:00:00 0.0%
     1 daemon   2512K 1792K   0.0%   0:00:00 0.0%

Now we'll start up Oracle:

SQL> startupORACLE instance started.
Total System Global Area 1075284744 bytes

Now the vmstat output shows that the system has about 5.3 GB of free memory:

# vmstat 3 3

kthr      memory            page            disk          faults
cpu
r b w   swap    free  re mf  pi po fr de sr m0 m4 m5 m1   in   sy
cs   us sy  id
0 0 0 7189160 5702992 32 46 133  1  1  0  0  0  0  0  0  585 2205
1589  0  1  98
0 0 0 6993104 5364640  0  2  0   0  0  0  0  0  0  0  0  387 1498
1407  0  0 100
0 0 0 6993104 5364640  0  0  0   0  0  0  0  0  0  0  0  379 1503
1406  0  0 100

And the prstat output shows that user oracle uses 7.5 GB of memory:

# prstat -a

NPROC USERNAME  SIZE    RSS MEMORY      TIME  CPU
     6 feng       20M   12M   0.1%   0:00:00 0.1%
    73 root     1103M  371M   4.2%   0:24:21 0.1%
    31 oracle     18G 7561M    88%   0:00:02 0.0%
     1 lp       2920K 1352K   0.0%   0:00:00 0.0%
     1 daemon   2512K 1792K   0.0%   0:00:00 0.0%
    27 cpadmin  2153M  638M   7.4%   0:02:46 0.0%

The prstat -a command simply adds up the memory of each process of a user.

# ps -eo user,rss,pid,args | grep '^ *oracle' | sort -r -n -k 2,2

  oracle 486248 20090 oracleORALSCOT (DESCRIPTION=(LOCAL=YES)
(ADDRESS=(PROTOCOL=beq)))
  oracle 483112 20078 ora_qmn0_ORALSCOT
  oracle 482336 20093 oracleORALSCOT (LOCAL=NO)
  oracle 481792 20095 oracleORALSCOT (LOCAL=NO)
  oracle 480800 20066 ora_dbw0_ORALSCOT
  oracle 479560 20072 ora_smon_ORALSCOT
  oracle 479168 20068 ora_lgwr_ORALSCOT
  oracle 479096 20074 ora_reco_ORALSCOT
  oracle 478632 20076 ora_cjq0_ORALSCOT
  oracle 478536 20064 ora_pmon_ORALSCOT
  oracle 478056 20070 ora_ckpt_ORALSCOT
  oracle 477208 20080 ora_s000_ORALSCOT
  oracle 476560 20082 ora_d000_ORALSCOT
  oracle 476040 20086 ora_arc1_ORALSCOT
  oracle 475984 20084 ora_arc0_ORALSCOT

I wrote a perl script that uses the same logic as prstat -a and shows that user oracle uses 7.2 GB of memory:

# ./psusedmemorybyuser.pl

User    Used Vir Mem     Used Phy Mem   Percent of Phy Mem
oracle      17478336          7277728         88.839453125
root         2185424           839112         10.24306640625
lp              2920             1352          0.01650390625
feng           22648            14112          0.172265625
daemon          2512             1792          0.021875
cpadmin      2223928           660024          8.05693359375

The prstat command should not simply add up the memory of each process of a user, because many processes share memory with each other. Let's take a look at process 20078 and see how much memory it really uses:

# pmap -x 20078

20078:  ora_qmn0_ORALSCOT
         Address  Kbytes     RSS  Anon  Locked Mode  Mapped File
0000000100000000   55704   23824     -      - r-x--  oracle
0000000103764000     904     504   128      - rwx--  oracle
0000000103846000     464     336   336      - rwx--  [ heap ]
..............
FFFFFFFF7F500000       8       8     8      - rwx--  [ anon ]
FFFFFFFF7F600000     192     192     -      - r-x--  ld.so.1
FFFFFFFF7F730000      16      16    16      - rwx--  ld.so.1
FFFFFFFF7FFDC000     144     144   144      - rw---  [ stack ]
---------------- ------- ------- -----   ----------
        total Kb 1135000  741936  1792          -

The additional memory needed for process 20078 is 1,792 Kbytes (Anon), and the rest of the memory is from the shared memory.

CPU Usage Example

Let's continue by looking at another example, this time for CPU usage on the same system with Oracle running. The consumecpu.sh script consumes one CPU 100%.

# vmstat 3 3

kthr      memory            page            disk          faults
cpu
r b w   swap    free  re mf  pi po fr de sr m0 m4 m5 m1   in   sy
cs  us sy  id
0 0 0 7191624 5702752 32 47 133  1  1  0  0  0  0  0  0  584 2204
1588 0  1  98
0 0 0 6991608 5363216  0  2   0  0  0  0  0  0  0  0  0  404 1551
1431 0  0 100
0 0 0 6991608 5363216  0  0   0  0  0  0  0  0  0  0  0  606 1526
1458 0  1  99
#
# ./consumecpu.sh &
[3] 23485
# vmstat 3 3
kthr      memory            page                 disk         faults
cpu
r b w   swap    free    re    mf  pi po fr de sr m0 m4 m5 m1  in    sy
cs   us sy  id
0 0 0 7191608 5702728   33    48 133  1  1  0  0  0  0  0  0 584  2206
1588  0  1  98
0 0 0 6991288 5362944 1887 15569   0  0  0  0  0  0  0  0  0 449 35719
2542 16 35  49
0 0 0 6991272 5362880 1895 15633   0  0  0  0  0  0  0  0  0 463 35972
2567 17 33  49
#
# prstat
  PID USERNAME SIZE   RSS STATE PRI NICE    TIME  CPU PROCESS/NLWP
23485 root    1136K 1000K run     0    0 0:00:08 4.3% consumecpu.sh/1
13182 root    1952K 1736K cpu0   59    0 0:00:00 0.1% prstat/1
25955 root      17M   12M sleep  53    2 0:00:11 0.0% ns-httpd/2

So, based on vmstat, consumecpu.sh consumes 50% of the CPU. But based on prstat, it consumes 4.3% of the CPU.

Here's what the consumecpu.sh script does:

# cat consumecpu.sh

#!/bin/sh
while true
do
        :
done

I ran truss to trace the number of system call forks caused by consumecpu.sh.

# truss -c -t fork ./consumecpu.sh

^Csyscall               seconds   calls  errors
fork                     .316     804
                       --------  ------  -----
sys totals:              .316     804       0
usr time:                .030
elapsed:                3.510

As you can see, there are many forks for the while loop in the Bourne shell. These processes are so short-lived that prstat is not able to capture them.

Let's run the while loop in the Bash shell:

# cat consumecpu.sh

#!/bin/bash
while true
do
        :
done

# truss -c -t fork ./consumecpu.sh
^Csyscall               seconds   calls  errors
                       --------  ------  -----
sys totals:              .000       0      0
usr time:                .000
elapsed:                3.670

There are no forks.

# vmstat 3 3kthr      memory            page            disk          faults
cpu
r b w   swap    free  re mf  pi po fr de sr m0 m4 m5 m1   in   sy
cs   us sy id
0 0 0 7191304 5702200 33 54 132  1  1  0  0  0  0  0  0  584 2218
1588  0  1 98
0 0 0 6994808 5366080  0  2   0  0  0  0  0  0  0  0  0  469 1567
1464  0  1 99
0 0 0 6994808 5366080  0  2   0  0  0  0  0  0  0  0  0  469 1567
1464  0  1 99
#
# ./consumecpu.sh &
[5] 4066
# vmstat 3 3
kthr      memory            page            disk          faults
cpu
r b w   swap    free  re  mf pi  po fr de sr m0 m4 m5 m1   in   sy
cs   us sy id
0 0 0 7191344 5702272 33  54 133  1  1  0  0  0  0  0  0  584 2219
1588  0  1 98
0 0 0 6991776 5363040  0   2   0  0  0  0  0  0  0  0  0  453 1628
1448 50  0 50
0 0 0 6991776 5363032  1   0   0  0  0  0  0  0  0  0  0  494 1559
1415 50  0 50
#
# prstat
PID USERNAME  SIZE   RSS STATE PRI NICE   TIME  CPU PROCESS/NLWP
4066     root  2520K 1872K cpu0  30    0 0:01:24  49% consumecpu.sh/1
4068     root  1952K 1728K cpu1  49    0 0:00:00 0.1% prstat/1
20078   oracle 1108M  472M sleep 59    0 0:00:01 0.0% oracle/1

Now both vmstat and prstat show that consumecpu.sh consumes 49% of the CPU.

Acknowledgements

Thanks to Darren Dunham for his contribution!

Note: This tech tip expands on ideas explored in discussion groups online.