Victor Feng, March 2008
As you know,
prstat
and
vmstat
are used to check system performance. Sometimes, however, the results returned by
prstat
show a huge difference from those returned by
vmstat
.
When this happens, which results should we trust? As a rule of thumb, we should trust the results from
vmstat
. This article provides the following examples, which show why:
Note: This information applies to systems that run the Solaris 9 or 10 Operating System.
In a system with 8 GB memory, we have Oracle9i Database Enterprise Edition version 9.2.0.7, with maximum memory of about 1 GB and with Oracle shut down.
The
vmstat
output shows that the system has about 6.1 GB free memory:
# vmstat 3 3
kthr memory page disk faults
cpu
r b w swap free re mf pi po fr de sr m0 m4 m5 m1 in sy
cs us sy id
0 0 0 7189112 5702984 32 46 133 1 1 0 0 0 0 0 0 585 2205
1589 0 1 98
0 0 0 8831928 6112816 0 2 0 0 0 0 0 0 0 0 0 390 1492
1399 0 0 100
0 0 0 8831928 6112816 0 0 0 0 0 0 0 0 0 0 0 418 1599
1469 0 0 100
The prstat
output shows that user
oracle
uses 107 MB of memory:
# prstat -a
NPROC USERNAME SIZE RSS MEMORY TIME CPU
7 feng 22M 13M 0.2% 0:00:00 0.1%
73 root 1103M 371M 4.6% 0:24:21 0.0%
16 oracle 489M 107M 1.3% 0:00:03 0.0%
28 cpadmin 2172M 647M 8.2% 0:02:46 0.0%
1 lp 2920K 1352K 0.0% 0:00:00 0.0%
1 daemon 2512K 1792K 0.0% 0:00:00 0.0%
Now we'll start up Oracle:
SQL> startupORACLE instance started.
Total System Global Area 1075284744 bytes
Now the vmstat
output shows that the system has about 5.3 GB of free memory:
# vmstat 3 3
kthr memory page disk faults
cpu
r b w swap free re mf pi po fr de sr m0 m4 m5 m1 in sy
cs us sy id
0 0 0 7189160 5702992 32 46 133 1 1 0 0 0 0 0 0 585 2205
1589 0 1 98
0 0 0 6993104 5364640 0 2 0 0 0 0 0 0 0 0 0 387 1498
1407 0 0 100
0 0 0 6993104 5364640 0 0 0 0 0 0 0 0 0 0 0 379 1503
1406 0 0 100
And the prstat
output shows that user
oracle
uses 7.5 GB of memory:
# prstat -a
NPROC USERNAME SIZE RSS MEMORY TIME CPU
6 feng 20M 12M 0.1% 0:00:00 0.1%
73 root 1103M 371M 4.2% 0:24:21 0.1%
31 oracle 18G 7561M 88% 0:00:02 0.0%
1 lp 2920K 1352K 0.0% 0:00:00 0.0%
1 daemon 2512K 1792K 0.0% 0:00:00 0.0%
27 cpadmin 2153M 638M 7.4% 0:02:46 0.0%
The prstat -a
command simply adds up the memory of each process of a user.
# ps -eo user,rss,pid,args | grep '^ *oracle' | sort -r -n -k 2,2
oracle 486248 20090 oracleORALSCOT (DESCRIPTION=(LOCAL=YES)
(ADDRESS=(PROTOCOL=beq)))
oracle 483112 20078 ora_qmn0_ORALSCOT
oracle 482336 20093 oracleORALSCOT (LOCAL=NO)
oracle 481792 20095 oracleORALSCOT (LOCAL=NO)
oracle 480800 20066 ora_dbw0_ORALSCOT
oracle 479560 20072 ora_smon_ORALSCOT
oracle 479168 20068 ora_lgwr_ORALSCOT
oracle 479096 20074 ora_reco_ORALSCOT
oracle 478632 20076 ora_cjq0_ORALSCOT
oracle 478536 20064 ora_pmon_ORALSCOT
oracle 478056 20070 ora_ckpt_ORALSCOT
oracle 477208 20080 ora_s000_ORALSCOT
oracle 476560 20082 ora_d000_ORALSCOT
oracle 476040 20086 ora_arc1_ORALSCOT
oracle 475984 20084 ora_arc0_ORALSCOT
I wrote a perl script that uses the same logic as prstat -a
and shows that user
oracle
uses 7.2 GB of memory:
# ./psusedmemorybyuser.pl
User Used Vir Mem Used Phy Mem Percent of Phy Mem
oracle 17478336 7277728 88.839453125
root 2185424 839112 10.24306640625
lp 2920 1352 0.01650390625
feng 22648 14112 0.172265625
daemon 2512 1792 0.021875
cpadmin 2223928 660024 8.05693359375
The prstat
command should not simply add up the memory of each process of a user, because many processes share memory with each other. Let's take a look at process 20078 and see how much memory it really uses:
# pmap -x 20078
20078: ora_qmn0_ORALSCOT
Address Kbytes RSS Anon Locked Mode Mapped File
0000000100000000 55704 23824 - - r-x-- oracle
0000000103764000 904 504 128 - rwx-- oracle
0000000103846000 464 336 336 - rwx-- [ heap ]
..............
FFFFFFFF7F500000 8 8 8 - rwx-- [ anon ]
FFFFFFFF7F600000 192 192 - - r-x-- ld.so.1
FFFFFFFF7F730000 16 16 16 - rwx-- ld.so.1
FFFFFFFF7FFDC000 144 144 144 - rw--- [ stack ]
---------------- ------- ------- ----- ----------
total Kb 1135000 741936 1792 -
The additional memory needed for process 20078 is 1,792 Kbytes (Anon), and the rest of the memory is from the shared memory.
Let's continue by looking at another example, this time for CPU usage on the same system with Oracle running. The
consumecpu.sh
script consumes one CPU 100%.
# vmstat 3 3
kthr memory page disk faults
cpu
r b w swap free re mf pi po fr de sr m0 m4 m5 m1 in sy
cs us sy id
0 0 0 7191624 5702752 32 47 133 1 1 0 0 0 0 0 0 584 2204
1588 0 1 98
0 0 0 6991608 5363216 0 2 0 0 0 0 0 0 0 0 0 404 1551
1431 0 0 100
0 0 0 6991608 5363216 0 0 0 0 0 0 0 0 0 0 0 606 1526
1458 0 1 99
#
# ./consumecpu.sh &
[3] 23485
# vmstat 3 3
kthr memory page disk faults
cpu
r b w swap free re mf pi po fr de sr m0 m4 m5 m1 in sy
cs us sy id
0 0 0 7191608 5702728 33 48 133 1 1 0 0 0 0 0 0 584 2206
1588 0 1 98
0 0 0 6991288 5362944 1887 15569 0 0 0 0 0 0 0 0 0 449 35719
2542 16 35 49
0 0 0 6991272 5362880 1895 15633 0 0 0 0 0 0 0 0 0 463 35972
2567 17 33 49
#
# prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
23485 root 1136K 1000K run 0 0 0:00:08 4.3% consumecpu.sh/1
13182 root 1952K 1736K cpu0 59 0 0:00:00 0.1% prstat/1
25955 root 17M 12M sleep 53 2 0:00:11 0.0% ns-httpd/2
So, based on vmstat
,
consumecpu.sh
consumes 50% of the CPU. But based on
prstat
, it consumes 4.3% of the CPU.
Here's what the
consumecpu.sh
script does:
# cat consumecpu.sh
#!/bin/sh
while true
do
:
done
I ran truss
to trace the number of system call forks caused by
consumecpu.sh
.
# truss -c -t fork ./consumecpu.sh
^Csyscall seconds calls errors
fork .316 804
-------- ------ -----
sys totals: .316 804 0
usr time: .030
elapsed: 3.510
As you can see, there are many forks for the while
loop in the Bourne shell. These processes are so short-lived that
prstat
is not able to capture them.
Let's run the
while
loop in the Bash shell:
# cat consumecpu.sh
#!/bin/bash
while true
do
:
done
# truss -c -t fork ./consumecpu.sh
^Csyscall seconds calls errors
-------- ------ -----
sys totals: .000 0 0
usr time: .000
elapsed: 3.670
There are no forks.
# vmstat 3 3kthr memory page disk faults
cpu
r b w swap free re mf pi po fr de sr m0 m4 m5 m1 in sy
cs us sy id
0 0 0 7191304 5702200 33 54 132 1 1 0 0 0 0 0 0 584 2218
1588 0 1 98
0 0 0 6994808 5366080 0 2 0 0 0 0 0 0 0 0 0 469 1567
1464 0 1 99
0 0 0 6994808 5366080 0 2 0 0 0 0 0 0 0 0 0 469 1567
1464 0 1 99
#
# ./consumecpu.sh &
[5] 4066
# vmstat 3 3
kthr memory page disk faults
cpu
r b w swap free re mf pi po fr de sr m0 m4 m5 m1 in sy
cs us sy id
0 0 0 7191344 5702272 33 54 133 1 1 0 0 0 0 0 0 584 2219
1588 0 1 98
0 0 0 6991776 5363040 0 2 0 0 0 0 0 0 0 0 0 453 1628
1448 50 0 50
0 0 0 6991776 5363032 1 0 0 0 0 0 0 0 0 0 0 494 1559
1415 50 0 50
#
# prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
4066 root 2520K 1872K cpu0 30 0 0:01:24 49% consumecpu.sh/1
4068 root 1952K 1728K cpu1 49 0 0:00:00 0.1% prstat/1
20078 oracle 1108M 472M sleep 59 0 0:00:01 0.0% oracle/1
Now both vmstat
and
prstat
show that
consumecpu.sh
consumes 49% of the CPU.
Thanks to Darren Dunham for his contribution!
Note: This tech tip expands on ideas explored in discussion groups online.