by Arup Nanda Oracle ACE
Published February 2007
In Part 1 of the series, you learned some useful commands not so widely known and some of the often used commands but not-so-well-known parameters to do your job more efficiently. Continuing on the series, now you will learn some slightly more advanced Linux commands useful for Oracle users, whether developers or DBAs.
Suppose you want to check the ORACLE_SID environment variable set in your shell. You will have to type:
echo $ORACLE_HOME
As a DBA or a developer, you frequently use this command and will quickly become tired of typing the entire 16 characters. Is there is a simpler way?
There is: the
alias
command. With this approach you can create a short alias, such as "os", to represent the entire command:
alias os='echo $ORACLE_HOME'
Now whenever you want to check the ORACLE_SID, you just type "os" (without the quotes) and Linux executes the aliased command.
However, if you log out and log back in, the alias is gone and you have to enter the alias command again. To eliminate this step, all you have to do is to put the command in your shell's profile file. For bash, the file is .bash_profile (note the period before the file name, that's part of the file's name) in your home directory. For bourne and korn shells, it's .profile, and for c-shell, .chsrc.
You can create an alias in any name. For instance, I always create an alias for the command
rm
as
rm -i
, which makes the
rm
command interactive.
alias rm=’rm -i’
Whenever I issue an
rm
command, Linux prompts me for confirmation, and unless I provide "y", it doesn't remove the file—thus I am protected form accidentally removing an important file. I use the same for
mv
(for moving the file to a new name), which prevents accidental overwriting of existing files, and
cp
(for copying the file).
Here is a list of some very useful aliases I like to define:
alias bdump='cd $ORACLE_BASE/admin/$ORACLE_SID/bdump'
alias l='ls -d .* --color=tty'
alias ll='ls -l --color=tty'
alias mv='mv -i'
alias oh='cd $ORACLE_HOME'
alias os='echo $ORACLE_SID'
alias rm='rm -i'
alias tns='cd $ORACLE_HOME/network/admin'
To see what aliases have been defined in your shell, use
alias
without any parameters.
However, there is a small problem. I have defined an alias,
rm
, that executes
rm -i
. This command will prompt for my confirmation every time I try to delete a file. But what if I want to remove a lot of files and am confident they can be deleted without my confirmation?
The solution is simple: To suppress the alias and use the command only, I will need to enter two single quotes:
$ ''rm *
Note, these are two single quotes (') before the rm command, not two double quotes. This will suppress the alias
rm
. Another approach is to use a backslash (\):
$ \rm *
To remove an alias previously defined, just use the
unalias
command:
$ unalias rm
The humble
ls
command is frequently used but rarely to its full potential. Without any options,
ls
merely displays all files and directories in tabular format.
$ ls
admin has mesg precomp
apex hs mgw racg
assistants install network rdbms
... output snipped ...
To show them in a list, use the -1 (this is the number 1, not the letter "l") option.
$ ls -1
admin
apex
assistants
... output snipped ...
This option is useful in shell scripts where the files names need to be fed into another program or command for manipulation.
You have most definitely used the -l (the letter "l", not the number "1") that displays all the attributes of the files and directories. Let's see it once again:
$ ls -l
total 272
drwxr-xr-x 3 oracle oinstall 4096 Sep 3 03:27 admin
drwxr-x--- 7 oracle oinstall 4096 Sep 3 02:32 apex
drwxr-x--- 7 oracle oinstall 4096 Sep 3 02:29 assistants
The first column shows the type of file and the permissions on it: "d" means directory, "-" means a regular file, "c" means a character device, "b" means a block device, "p" means named pipe, and "l" (that's a lowercase letter L, not I) means symbolic link.
One very useful option is --color, which shows the files in many different colors based on the type of file. Here is an example screenshot:
Note that files file1 and file2 are regular files. link1 is a symbolic link, shown in red; dir1 is a directory, shown in yellow; and pipe1 is a named pipe, shown in different colors for easier identification.
In some distros, the
ls
command comes pre-installed with an alias (described in the previous section) as
ls --color;
so you can see the files in color when you type "ls". This approach may be undesirable, however, especially if you have an output like that above. You can change the colors, but a quicker way may be just to turn off the alias:
$ alias ls="''ls"
Another useful option is the -F option, which appends a symbol after each file to show the type of the file - a "/" after directories, "@" after symbolic links, and "|" after named pipes.
$ ls -F
dir1/ file1 file2 link1@ pipe1|
If you have a subdirectory under a directory and you want to list only that directory,
ls -l
will show you the contents of the subdirectory as well. For instance, suppose the directory structure is like the following:
/dir1
+-->/subdir1
+--> subfile1
+--> subfile2
The directory dir1 has a subdirectory subdir1 and two files: subfile1 and subfile2. If you just want to see the attributes of the directory dir1, you issue:
$ ls -l dir1
total 4
drwxr-xr-x 2 oracle oinstall 4096 Oct 14 16:52 subdir1
-rw-r--r-- 1 oracle oinstall 0 Oct 14 16:48 subfile1
-rw-r--r-- 1 oracle oinstall 0 Oct 14 16:48 subfile2
Note that the directory dir1 is not listed in the output. Rather, the contents of the directory are displayed. This is expected behavior when processing directories. To show the directory dir1 only, you will have to use the
-d
command.
$ ls -dl dir1
drwxr-xr-x 3 oracle oinstall 4096 Oct 14 16:52 dir1
If you notice the output of the following
ls -l
output:
-rwxr-x--x 1 oracle oinstall 10457761 Apr 6 2006 rmanO
-rwxr-x--x 1 oracle oinstall 10457761 Sep 23 23:48 rman
-rwsr-s--x 1 oracle oinstall 93300507 Apr 6 2006 oracleO
-rwx------ 1 oracle oinstall 93300507 Sep 23 23:49 oracle
You will notice that the sizes of the files are shown in bytes. This may be easy in small files but when file sizes are pretty large, a long number may not be very easy to read. The option "-h" comes handy then, to display the size in a human readable form.
$ ls -lh
-rwxr-x--x 1 oracle oinstall 10M Apr 6 2006 rmanO
-rwxr-x--x 1 oracle oinstall 10M Sep 23 23:48 rman
-rwsr-s--x 1 oracle oinstall 89M Apr 6 2006 oracleO
-rwx------ 1 oracle oinstall 89M Sep 23 23:49 oracle
Note how the size has been shown in M (for megabytes), K (for kilobytes), and so on.
$ ls -lr
The parameter -r shows the output in the reverse order. In this command, the files will be shown in the reverse alphabetical order.
$ ls -lR
The -R operator makes the
ls
command execute recursively—that is, go under to the subdirectories and show those files too.
What if you want to show the largest to the smallest files? This can be done with the -S parameter.
$ ls -lS
total 308
-rw-r----- 1 oracle oinstall 52903 Oct 11 18:31 sqlnet.log
-rwxr-xr-x 1 oracle oinstall 9530 Apr 6 2006 root.sh
drwxr-xr-x 2 oracle oinstall 8192 Oct 11 18:14 bin
drwxr-x--- 3 oracle oinstall 8192 Sep 23 23:49 lib
Most Linux commands are about getting an output: a list of files, a list of strings, and so on. But what if you want to use some other command with the output of the previous one as a parameter? For example, the
file
command shows the type of the file (executable, ascii text, and so on); you can manipulate the output to show only the filenames and now you want to pass these names to the
ls -l
command to see the timestamp. The command
xargs
does exactly that. It allows you to execute some other commands on the output. Remember this syntax from Part 1:
file -Lz * | grep ASCII | cut -d":" -f1 | xargs ls -ltr
Let's dissect this command string. The first,
file -Lz *
, finds files that are symbolic links or compressed. It passes the output to the next command,
grep ASCII
, which searches for the string "ASCII" in them and produces the output similar to this:
alert_DBA102.log: ASCII English text
alert_DBA102.log.Z: ASCII text (compress'd data 16 bits)
dba102_asmb_12307.trc.Z: ASCII English text (compress'd data 16 bits)
dba102_asmb_20653.trc.Z: ASCII English text (compress'd data 16 bits)
Since we are interested in the file names only, we applied the next command,
cut -d":" -f1
, to show the first field only:
alert_DBA102.log
alert_DBA102.log.Z
dba102_asmb_12307.trc.Z
dba102_asmb_20653.trc.Z
Now, we want to use the
ls -l
command and pass the above list as parameters, one at a time. The
xargs
command allowed you to to that. The last part,
xargs ls -ltr
, takes the output and executes the command
ls -ltr
against them, as if executing:
ls -ltr alert_DBA102.log
ls -ltr alert_DBA102.log.Z
ls -ltr dba102_asmb_12307.trc.Z
ls -ltr dba102_asmb_20653.trc.Z
Thus
xargs
is not useful by itself, but is quite powerful when combined with other commands.
Here is another example, where we want to count the number of lines in those files:
$ file * | grep ASCII | cut -d":" -f1 | xargs wc -l
47853 alert_DBA102.log
19 dba102_cjq0_14493.trc
29053 dba102_mmnl_14497.trc
154 dba102_reco_14491.trc
43 dba102_rvwr_14518.trc
77122 total
(Note: the above task can also be accomplished with the following command:)
$ wc -l ‘file * | grep ASCII | cut -d":" -f1 | grep ASCII | cut -d":" -f1‘
The
xargs
version is given to illustrate the concept. Linux has several ways to achieve the same task; use the one that suits your situation best.
Using this approach you can quickly rename files in a directory.
$ ls | xargs -t -i mv {} {}.bak
The -i option tells
xargs
to replace {} with the name of each item. The -t option instructs
xargs
to print the command before executing it.
Another very useful operation is when you want to open the files for editing using vi:
$ file * | grep ASCII | cut -d":" -f1 | xargs vi
This command opens the files one by one using vi. When you want to search for many files and open them for editing, this comes in very handy.
It also has several options. Perhaps the most useful is the -p option, which makes the operation interactive:
$ file * | grep ASCII | cut -d":" -f1 | xargs -p vi
vi alert_DBA102.log dba102_cjq0_14493.trc dba102_mmnl_14497.trc
dba102_reco_14491.trc dba102_rvwr_14518.trc ?...
Here
xarg
asks you to confirm before running each command. If you press "y", it executes the command. You will find it immensely useful when you take some potentially damaging and irreversible operations on the file—such as removing or overwriting it.
The -t option uses a verbose mode; it displays the command it is about to run, which is a very helpful option during debugging.
What if the output passed to the
xargs
is blank? Consider:
$ file * | grep SSSSSS | cut -d":" -f1 | xargs -t wc -l
wc -l
0
$
Here searching for "SSSSSS" produces no match; so the input to xargs is all blanks, as shown in the second line (produced since we used the -t, or the verbose option). Although this may be useful, In some cases you may want to stop
xargs
if there is nothing to process; if so, you can use the -r option:
$ file * | grep SSSSSS | cut -d":" -f1 | xargs -t -r wc -l
$
The command exits if there is nothing to run.
Suppose you want to remove the files using the
rm
command, which should be the argument to the
xargs
command. However,
rm
can accept a limited number of arguments. What if your argument list exceeds that limit? The -n option to xargs limits the number of arguments in a single command line.
Here is how you can limit only two arguments per command line: Even if five files are passed to
xargs ls -ltr
, only two files are passed to
ls -ltr
at a time.
$ file * | grep ASCII | cut -d":" -f1 | xargs -t -n2 ls -ltr
ls -ltr alert_DBA102.log dba102_cjq0_14493.trc
-rw-r----- 1 oracle dba 738 Aug 10 19:18 dba102_cjq0_14493.trc
-rw-r--r-- 1 oracle dba 2410225 Aug 13 05:31 alert_DBA102.log
ls -ltr dba102_mmnl_14497.trc dba102_reco_14491.trc
-rw-r----- 1 oracle dba 5386163 Aug 10 17:55 dba102_mmnl_14497.trc
-rw-r----- 1 oracle dba 6808 Aug 13 05:21 dba102_reco_14491.trc
ls -ltr dba102_rvwr_14518.trc
-rw-r----- 1 oracle dba 2087 Aug 10 04:30 dba102_rvwr_14518.trc
Using this approach you can quickly rename files in a directory.
$ ls | xargs -t -i mv {} {}.bak
The -i option tells
xargs
to replace {} with the name of each item.
As you know, the
mv
command renames files. For example,
$ mv oldname newname
renames the file oldname to newname. However, what if you don't know the filenames yet? The
rename
command comes in really handy here.
rename .log .log.‘date +%F-%H:%M:%S‘ *
replaces all files with the extension .log with .log.<dateformat>. So sqlnet.log becomes sqlnet.log.2006-09-12-23:26:28.
Among the most popular for Oracle users is the
find
command. By now you know about using
find
to find files on a given directory. Here is an example to find files starting with the word "file" in the current directory:
$ find . -name "file*"
./file2
./file1
./file3
./file4
However, what if you want to search for names like FILE1, FILE2, and so on? The -name "file*" will not match them. For a case-insensitive search, use the -iname option:
$ find . -iname "file*"
./file2
./file1
./file3
./file4
./FILE1
./FILE2
You can limit your search to a specific type of files only. For instance, the above command will get the files of all types: regular files, directories, symbolic links, and so on. To search for only regular files, you can use the -type f parameter.
$ find . -name "orapw*" -type f
./orapw+ASM
./orapwDBA102
./orapwRMANTEST
./orapwRMANDUP
./orapwTESTAUX
The -type can take the modifiers f (for regular files), l (for symbolic links), d (directories), b (block devices), p (named pipes), c (character devices), s (sockets).
A slight twist to the above command is to combine it with the
file
command you learned about in Part 1. The command
file
tells you what type of file it is. You can pass it as a post processor for the output from
find
command. The -exec parameter executes the command following the parameter. In this case, the command to execute after the
find
is
file
:
$ find . -name "*oraenv*" -type f -exec file {} \;
./coraenv: Bourne shell script text executable
./oraenv: Bourne shell script text executable
This is useful when you want to find out if the ASCII text file could be some type of shell script.
If you substitute -exec with -ok, the command is executed but it asks for your confirmation first. Here's an example:
$ find . -name "sqlplus*" -ok {} \;
< {} ... ./sqlplus > ? y
SQL*Plus: Release 9.2.0.5.0 - Production on Sun Aug 6 11:28:15 2006
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
< È* ... ./sqlplusO > ? n
$
Here we have asked the shell to find all programs starting with "sqlplus", and execute them. Note there is nothing between -ok and {}, so it will just execute the files it finds. It finds two files—sqlplus and sqlplusO—and asks in each case if you want to execute it. We answered "y" to the prompt against "sqlplus" and it executed. After exiting, it prompted the second file it found (sqlplusO) and for confirmation again again, to which we answered "n"—thus, it was not executed.
Oracle produces many extraneous files: trace files, log files, dump files, and so on. Unless they are cleaned periodically, they can fill up the filesystem and bring the database to a halt.
To ensure that doesn't happen, simply search for the files with extension "trc" and remove them if they are more than three days old. A simple command does the trick:
find . -name "*.trc" -ctime +3 -exec rm {} \;
To forcibly remove them prior to the three-day limit, use the -f option.
find . -name "*.trc" -ctime +3 -exec rm -f {} \;
If you just want to list the files:
find . -name "*.trc" -ctime +3 -exec ls -l {} \;
This command takes an input file and substitutes strings inside it with the parameters passed, similar to substituting for variables. For example, here is an input file:
$ cat temp
The COLOR fox jumped over the TYPE fence.
Were you to substitute the strings "COLOR" by "brown" and "TYPE" by "broken", you could use:
$ m4 -DCOLOR=brown -DTYPE=broken temp
The brown fox jumped over the broken fence.
Else, if you want to substitute "white" and "high" for the same:
$ m4 -DCOLOR=white -DTYPE=high temp
The white fox jumped over the high fence.
These commands are used to find out the where the executables mentioned are stored in the PATH of the user. When the executable is found in the path, they behave pretty much the same way and display the path:
$ which sqlplus
/u02/app/oracle/products/10.2.0.1/db1/bin/sqlplus
$ whence sqlplus
/u02/app/oracle/products/10.2.0.1/db1/bin/sqlplus
The output is identical. However, if the executable is not found in the path, the behavior is different. The
which
command produces an explicit message:
$ which sqlplus1
/usr/bin/which: no sqlplus1 in (/u02/app/oracle/products/10.2.0.1/db1/bin:/usr
/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin)
whereas the
whence
command produces no message:
$ whence sqlplus1]
and returns to shell prompt. This is useful in cases where the executable is not found in the path (instead of displaying the message):
$ whence invalid_command
$ which invalid_command
which: no invalid_command in (/usr/kerberos/sbin:/usr/kerberos/bin:/bin:/sbin:
/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:
/usr/bin/X11:/usr/X11R6/bin:/root/bin)
When
whence
does not find an executable in the path, it returns without any message but the return code is not zero. This fact can be exploited in shell scripts; for example:
RC=‘whence myexec‘
If [ $RC -ne "0" ]; then
echo "myexec is not in the $PATH"
fi
A very useful option to which is the -i option, which displays the alias as well as the executable, if present. For example, you saw the use of the alias at the beginning of this article. The
rm
command is actually an alias in my shell, and there is an
rm
command elsewhere in the system as well.
$ which ls
/bin/ls
$ which -i ls
alias ls='ls --color=tty'
/bin/ls
The default behavior of which is to show the first occurrence of the executable in the path. If the executable exists in different directories in the path, the subsequent occurrences are ignored. You can see all the occurrences of the executable via the -a option.
$ which java
/usr/bin/java
$ which -a java
/usr/bin/java
/home/oracle/oracle/product/11.1/db_1/jdk/jre/bin/java
The
top
command is probably the most useful one for an Oracle DBA managing a database on Linux. Say the system is slow and you want to find out who is gobbling up all the CPU and/or memory. To display the top processes, you use the command
top
.
Note that unlike other commands,
top
does not produce an output and sits still. It refreshes the screen to display new information. So, if you just issue
top
and leave the screen up, the most current information is always up. To stop and exit to shell, you can press Control-C.
$ top
18:46:13 up 11 days, 21:50, 5 users, load average: 0.11, 0.19, 0.18
151 processes: 147 sleeping, 4 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 12.5% 0.0% 6.7% 0.0% 0.0% 5.3% 75.2%
Mem: 1026912k av, 999548k used, 27364k free, 0k shrd, 116104k buff
758312k actv, 145904k in_d, 16192k in_c
Swap: 2041192k av, 122224k used, 1918968k free 590140k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
451 oracle 15 0 6044 4928 4216 S 0.1 0.4 0:20 0 tnslsnr
8991 oracle 15 0 1248 1248 896 R 0.1 0.1 0:00 0 top
1 root 19 0 440 400 372 S 0.0 0.0 0:04 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kapmd
4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
7 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 bdflush
5 root 15 0 0 0 0 SW 0.0 0.0 0:33 0 kswapd
6 root 15 0 0 0 0 SW 0.0 0.0 0:14 0 kscand
8 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
9 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
... output snipped ...
Let's examine the different types of information produced. The first line:
18:46:13 up 11 days, 21:50, 5 users, load average: 0.11, 0.19, 0.18
shows the current time (18:46:13), that system has been up for 11 days; that the system has been working for 21 hours 50 seconds. The load average of the system is shown (0.11, 0.19, 0.18) for the last 1, 5 and 15 minutes respectively. (By the way, you can also get this information by issuing the
uptime
command.)
If the load average is not required, press the letter "l" (lowercase L); it will turn it off. To turn it back on press l again. The second line:
151 processes: 147 sleeping, 4 running, 0 zombie, 0 stopped
shows the number of processes, running, sleeping, etc. The third and fourth lines:
CPU states: cpu user nice system irq softirq iowait idle
total 12.5% 0.0% 6.7% 0.0% 0.0% 5.3% 75.2%
show the CPU utilization details. The above line shows that user processes consume 12.5% and system consumes 6.7%. The user processes include the Oracle processes. Press "t" to turn these three lines off and on. If there are more than one CPU, you will see one line per CPU.
The next two lines:
Mem: 1026912k av, 1000688k used, 26224k free, 0k shrd, 113624k buff
758668k actv, 146872k in_d, 14460k in_c
Swap: 2041192k av, 122476k used, 1918716k free 591776k cached
show the memory available and utilized. Total memory is "1026912k av", approximately 1GB, of which only 26224k or 26MB is free. The swap space is 2GB; but it's almost not used. To turn it off and on, press "m".
The rest of the display shows the processes in a tabular format. Here is the explanation of the columns:
Column | Description |
---|---|
PID | The process ID of the process |
USER | The user running the process |
PRI | The priority of the process |
NI | The nice value: The higher the value, the lower the priority of the task |
SIZE | Memory used by this process (code+data+stack) |
RSS | The physical memory used by this process |
SHARE | The shared memory used by this process |
STAT |
|
%CPU | The percentage of CPU used by this process |
%MEM | The percentage of memory used by this process |
TIME | The total CPU time used by this process |
CPU | If this is a multi-processor system, this column indicates the ID of the CPU this process is running on. |
COMMAND | The command issued by this process |
While the
top
is being displayed, you can press a few keys to format the display as you like. Pressing the uppercase M key sorts the output by memory usage. (Note that using lowercase m will turn the memory summary lines on or off at the top of the display.) This is very useful when you want to find out who is consuming the memory. Here is sample output:
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
31903 oracle 15 0 75760 72M 72508 S 0.0 7.2 0:01 0 ora_smon_PRODB2
31909 oracle 15 0 68944 66M 64572 S 0.0 6.6 0:03 0 ora_mmon_PRODB2
31897 oracle 15 0 53788 49M 48652 S 0.0 4.9 0:00 0 ora_dbw0_PRODB2
Now that you learned how to interpret the output, let's see how to use command line parameters.
The most useful is -d, which indicates the delay between the screen refreshes. To refresh every second, use
top -d 1
.
The other useful option is -p. If you want to monitor only a few processes, not all, you can specify only those after the -p option. To monitor processes 13609, 13608 and 13554, issue:
top -p 13609 -p 13608 -p 13554
This will show results in the same format as the
top
command, but only those specific processes.
It's probably needless to say that the
top
utility comes in very handy for analyzing the performance of database servers. Here is a partial
top
output.
20:51:14 up 11 days, 23:55, 4 users, load average: 0.88, 0.39, 0.27
113 processes: 110 sleeping, 2 running, 1 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 1.0% 0.0% 5.6% 2.2% 0.0% 91.2% 0.0%
Mem: 1026912k av, 1008832k used, 18080k free, 0k shrd, 30064k buff
771512k actv, 141348k in_d, 13308k in_c
Swap: 2041192k av, 66776k used, 1974416k free 812652k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16143 oracle 15 0 39280 32M 26608 D 4.0 3.2 0:02 0 oraclePRODB2...
5 root 15 0 0 0 0 SW 1.6 0.0 0:33 0 kswapd
... output snipped ...
Let's analyze the output carefully. The first thing you should notice is the "idle" column under CPU states; it's 0.0%—meaning, the CPU is completely occupied doing something. The question is, doing what? Move your attention to the column "system", just slightly left; it shows 5.6%. So the system itself is not doing much. Go even more left to the column marked "user", which shows 1.0%. Since user processes include Oracle as well, Oracle is not consuming the CPU cycles. So, what's eating up all the CPU?
The answer lies in the same line, just to the right under the column "iowait", which indicates 91.2%. This explains it all: the CPU is waiting for IO 91.2% of the time.
So why so much IO wait? The answer lies in the display. Note the PID of the highest consuming process: 16143. You can use the following query to determine what the process is doing:
select s.sid, s.username, s.program
from v$session s, v$process p
where spid = 16143
and p.addr = s.paddr
/
SID USERNAME PROGRAM
------------------- -----------------------------
159 SYS rman@prolin2 (TNS V1-V3)
The rman process is taking up the IO waits related CPU cycles. This information helps you determine the next course of action.
From the previous discussion you learned how to identify a CPU consuming resource. What if you find that a process is consuming a lot of CPU and memory, but you don't want to kill it? Consider the
top
output below:
$ top -c -p 16514
23:00:44 up 12 days, 2:04, 4 users, load average: 0.47, 0.35, 0.31
1 processes: 1 sleeping, 0 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0.0% 0.6% 8.7% 2.2% 0.0% 88.3% 0.0%
Mem: 1026912k av, 1010476k used, 16436k free, 0k shrd, 52128k buff
766724k actv, 143128k in_d, 14264k in_c
Swap: 2041192k av, 83160k used, 1958032k free 799432k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16514 oracle 19 4 28796 26M 20252 D N 7.0 2.5 0:03 0 oraclePRODB2...
Now that you confirmed the process 16514 is consuming a lot of memory, you can "freeze" it—but not kill it—using the
skill
command.
$ skill -STOP 1
After this, check the
top
output:
23:01:11 up 12 days, 2:05, 4 users, load average: 1.20, 0.54, 0.38
1 processes: 0 sleeping, 0 running, 0 zombie, 1 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 2.3% 0.0% 0.3% 0.0% 0.0% 2.3% 94.8%
Mem: 1026912k av, 1008756k used, 18156k free, 0k shrd, 3976k buff
770024k actv, 143496k in_d, 12876k in_c
Swap: 2041192k av, 83152k used, 1958040k free 851200k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16514 oracle 19 4 28796 26M 20252 T N 0.0 2.5 0:04 0 oraclePRODB2...
The CPU is now 94% idle from 0%. The process is effectively frozen. After some time, you may want to revive the process from coma:
$ skill -CONT 16514
This approach is immensely useful for temporarily freezing processes to make room for more important processes to complete.
The command is very versatile. If you want to stop all processes of the user "oracle", only one command does it all:
$ skill -STOP oracle
You can use a user, a PID, a command or terminal id as argument. The following stops all rman commands.
$ skill -STOP rman
As you can see,
skill
decides that argument you entered—a process ID, userid, or command—and acts appropriately. This may cause an issue in some cases, where you may have a user and a command in the same name. The best example is the "oracle" process, which is typically run by the user "oracle". So, when you want to stop the process called "oracle" and you issue:
$ skill -STOP oracle
all the processes of user "oracle" stop, including the session you may be on. To be completely unambiguous you can optionally give a new parameter to specify the type of the parameter. To stop a command called oracle, you can give:
$ skill -STOP -c oracle
The command snice is similar. Instead of stopping a process it makes its priority a lower one. First, check the top output:
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
3 root 15 0 0 0 0 RW 0.0 0.0 0:00 0 kapmd
13680 oracle 15 0 11336 10M 8820 T 0.0 1.0 0:00 0 oracle
13683 oracle 15 0 9972 9608 7788 T 0.0 0.9 0:00 0 oracle
13686 oracle 15 0 9860 9496 7676 T 0.0 0.9 0:00 0 oracle
13689 oracle 15 0 10004 9640 7820 T 0.0 0.9 0:00 0 oracle
13695 oracle 15 0 9984 9620 7800 T 0.0 0.9 0:00 0 oracle
13698 oracle 15 0 10064 9700 7884 T 0.0 0.9 0:00 0 oracle
13701 oracle 15 0 22204 21M 16940 T 0.0 2.1 0:00 0 oracle
Now, drop the priority of the processes of "oracle" by four points. Note that the higher the number, the lower the priority.
$ snice +4 -u oracle
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16894 oracle 20 4 38904 32M 26248 D N 5.5 3.2 0:01 0 oracle
Note how the NI column (for nice values) is now 4 and the priority is now set to 20, instead of 15. This is quite useful in reducing priorities.
Arup Nanda ( arup@proligence.com ) has been an Oracle DBA for more than 12 years, handling all aspects of database administration—from performance tuning to security and disaster recovery. He is a coauthor of PL/SQL for DBAs (O'Reilly Media, 2005), was Oracle Magazine's DBA of the Year in 2003, and is an Oracle ACE.