How to Configure a Failover Guest Domain as a Single-Node Oracle Solaris Cluster

by Venkat Chennuru
Published November 2014

Using Oracle Solaris Cluster 4.2 on Oracle Solaris 11.2

This article provides a step-by-step example of how to deploy an Oracle VM Server for SPARC guest domain in a highly available failover setup and configure the guest domain as a single-node cluster by using Oracle Solaris Cluster 4.2 on Oracle Solaris 11.2. This configuration enables the protection of guest domains from planned and unplanned downtime by automating the failover of a guest domain through restart on an alternate cluster node. Automated failover provides protection in case there is a component outage or the guest domain needs to be migrated for preventive maintenance. The single-node cluster in the guest domain can help keep applications up through process monitoring and restart facilities that are available in the single-node cluster.

About Oracle Solaris Cluster

Oracle Solaris Cluster delivers two different solutions for protecting Oracle VM Server for SPARC deployments (also known as Logical Domains or LDoms).

  • First, it is possible to use Oracle VM Server for SPARC domains as cluster nodes. This configuration is similar to the traditional "physical" server clusters, but Oracle Solaris Cluster is installed in a Logical Domain (control, guest, or I/O domain). Applications running in that domain can be monitored through built-in or custom cluster agents, and applications are restarted on another domain upon demand or when the domain or the server fails.
  • The second possibility is to protect a domain by using a specific agent called the Oracle Solaris Cluster HA for Oracle VM Server for SPARC (HA for Oracle VM Server) data service. In this case, Oracle Solaris Cluster is installed in the server control domain and can manage not only applications but also guest domains as Oracle Solaris Cluster resources, thanks to HA for Oracle VM Server. This high availability (HA) agent controls and manages a guest domain as a "black box." It can fail over the guest domain in case of failure, but it can also use domain migration procedures (live migration or warm migration) to operate a managed switchover.

This article discusses a third possibility that is similar to the second, where the guest domain is configured as single-node cluster that manages application monitoring and application restarts inside the guest domain.
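
As a hedged illustration of that approach, the single-node cluster inside the guest domain could use the Generic Data Service (SUNW.gds) resource type to monitor and restart an application. The /opt/myapp paths, resource names, and property choices below are hypothetical placeholders, not a prescribed configuration; they only sketch the idea:

    ldg1# /usr/cluster/bin/clrt register SUNW.gds
    ldg1# /usr/cluster/bin/clrg create app-rg
    ldg1# /usr/cluster/bin/clrs create -g app-rg -t SUNW.gds \
          -p Start_command="/opt/myapp/bin/start-myapp" \
          -p Probe_command="/opt/myapp/bin/probe-myapp" \
          -p network_aware=false app-rs
    ldg1# /usr/cluster/bin/clrg online -eM app-rg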

Instructions in this article provide details on how to set up a guest domain under Oracle Solaris Cluster control. As a prerequisite, you must install a two-node cluster using two control domains. For more information about this installation, see the article "How to Install and Configure a Two-Node Cluster" and the Oracle Solaris Cluster Software Installation Guide.

About Oracle VM Server for SPARC and the HA for Oracle VM Server Data Service

Oracle VM Server for SPARC provides the ability to split a single physical system into multiple, independent virtual systems. This is achieved by an additional software layer, called the hypervisor, that resides in the firmware and is interposed between the operating system and the hardware platform. The hypervisor abstracts the hardware and can expose or hide various resources, allowing for the creation of resource partitions that can operate as discrete systems, complete with virtual CPU, memory, and I/O devices. The administrative operations to create and manage a domain are performed in the control domain via the Logical Domains Manager (ldm) interface.
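
For illustration, domains can be listed and inspected from the control domain through the ldm command; the domain name ldg1 below is only a placeholder:

    primary# ldm list                  # list all domains with their state, vCPUs, and memory
    primary# ldm list-bindings ldg1    # show the resources bound to one domain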

Control domains must be configured as Oracle Solaris Cluster nodes in order to host a failover guest domain service, and the virtual services configuration must be identical on all potential primary nodes. The guest domain that will be put under Oracle Solaris Cluster control can be created on any one of the cluster nodes. Once the guest domain is created, its configuration is retrieved by running the ldm list-constraints -x <ldom> command and stored in the Cluster Configuration Repository (CCR), which is accessible from all cluster nodes. The Oracle Solaris Cluster HA for Oracle VM Server for SPARC data service uses this globally accessible information to create or destroy the domain on the node where the resource group is brought online or offline, respectively.
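
For example, you can view (or save a copy of) the XML constraints that are stored in the CCR for a given domain; ldg1 is a placeholder domain name here:

    primary# ldm list-constraints -x ldg1
    primary# ldm list-constraints -x ldg1 > /var/tmp/ldg1-constraints.xml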

The data service provides a mechanism for orderly startup, shutdown, fault monitoring, and automatic failover of the Oracle VM Server for SPARC guest domain. When the guest domain needs to be relocated to another cluster node while under Oracle Solaris Cluster control, the data service first attempts live migration of the guest domain; if that fails for any reason, it resorts to normal migration. Live migration requires that the boot disk be accessible from the current primary node and the new primary node simultaneously.
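
As a quick check of that requirement, confirm that the boot disk back end is visible from both control domains at the same time. A minimal sketch, assuming the NFS-backed layout (mount point /disks) that is used later in this article:

    phys-schost-1# df -h /disks
    phys-schost-2# df -h /disks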

Configuration Assumptions

This article assumes the following configuration is used:

  • Oracle Solaris 11.2 and Oracle Solaris Cluster 4.2 are installed on both control domains of two SPARC T-Series systems from Oracle. It is also possible to use Oracle Solaris 11.1 with Oracle Solaris Cluster 4.2.
  • A two-node cluster is configured with the two control domains of the servers.
  • The Image Packaging System repositories for Oracle Solaris and Oracle Solaris Cluster are already configured on the cluster nodes.
  • The cluster hardware is a supported configuration for Oracle Solaris Cluster 4.2 software as well as for the Oracle VM Server for SPARC release. For more information, see the Oracle Solaris Cluster 4.x Compatibility Guide.
  • Each node has two spare network interfaces to be used as private interconnects, also known as transports, and at least one network interface that is connected to the public network.
  • An Oracle ZFS Storage Appliance is configured on the cluster. For more information, see "How to Install an Oracle ZFS Storage Appliance in a Cluster."

In addition, it is recommended that you enable Jumbo Frames on the cluster interconnects to improve live migration performance, which makes the Oracle Solaris Cluster switchover faster. Console access to the nodes during administration is also recommended, but not required.
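
The following is a minimal sketch of checking and raising the MTU on a private-interconnect link with dladm. It assumes net2 is one of the interconnect interfaces and that the interconnect switches support an MTU of 9000; repeat for each interconnect interface on each node, and consult the Oracle Solaris Cluster documentation for the supported procedure before changing interconnect properties.

    # dladm show-linkprop -p mtu net2        # display the current MTU of the link
    # dladm set-linkprop -p mtu=9000 net2    # enable jumbo frames on the link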

Your setup should look similar to Figure 1. You might have fewer or more devices, depending on your system or network configuration.

Figure 1. Oracle Solaris Cluster hardware configuration

For more information about the various topologies that are supported, see the Oracle Solaris Cluster Concepts Guide.

Requirements

Before you proceed, be aware of the following requirements:

  • The boot disk for the failover LDom configuration must reside on a global file system or a network file system (NFS), or it must be a raw shared device.
  • The services provided by I/O domains must be configured identically on both the nodes.
  • The LDom must be configured on only one node. It is active on only one node at a time.

Enable a Logical Domain to Run in a Failover Configuration Using a Global File System

  1. Prepare the file system:

    In a failover configuration, the logical domain's boot disk must be on a global file system, a network file system (NFS), or a raw shared disk. The boot disk must be accessible from all potential primaries simultaneously for live migration to work.

    The example in this article uses an NFS location to host the boot disk for the failover LDom. Oracle Solaris Cluster provides the SUNW.ScalMountPoint resource type to manage NFS file system mounts; use it to manage the NFS mounts in this configuration.

    1. Register the SUNW.ScalMountPoint resource type.

      phys-schost-1# /usr/cluster/bin/clrt register SUNW.ScalMountPoint

    2. Create the scalable resource group to host the scalable resource.
      phys-schost-1# /usr/cluster/bin/clrg create -S -p Maximum_primaries=2 \
      -p Desired_primaries=2 ldom-scalrg
    3. Create the scalable resource to mount the NFS file system on all nodes.
      phys-schost-1# /usr/cluster/bin/clrs create -g ldom-scalrg \
      -t SUNW.ScalMountPoint -x MountPointDir=/disks -x FileSystemType=nas \
      -x TargetFileSystem=nfs-server:/export/disks ldom-scalrs
    4. Bring the resource group online to mount the NFS location on both nodes.

      phys-schost-1# /usr/cluster/bin/clrg online -eM ldom-scalrg

    5. Create the HA LDom resource group.
      phys-schost-1# /usr/cluster/bin/clrg create \
      -p rg_affinities=++ldom-scalrg ldom-rg
      phys-schost-1# /usr/cluster/bin/clrg online -eM ldom-rg
  2. Prepare the domain configurations:

    1. Set the failure policy for the primary domain on both nodes. The master domain's failure policy is controlled by setting the failure-policy property. It must be set to reset.
      # ldm set-domain failure-policy=reset primary
      # ldm list -o domain primary
    2. Create the virtual services on both of the cluster nodes. The virtual service names must be exactly the same on both cluster nodes, because they are later referenced in the guest domain configuration.
      # ldm add-vds primary-vds0 primary
      # ldm add-vconscon port-range=5000-5100 primary-vcc0 primary
      # ldm add-vsw net-dev=net0 primary-vsw0 primary
      # ldm add-vdsdev <bootdisk-path> ldg1-boot@primary-vds0

      The boot disk path depends on whether the boot disk is a raw disk or a file-backed virtual disk on a global file system or a network file system. If it is a raw disk, it must be specified as /dev/global/dsk/dXs2. This example uses a network file system and, therefore, a file-backed virtual disk.

      # mkfile 20g /disks/ldg1-boot
      # ldm add-vdsdev /disks/ldg1-boot ldg1-boot@primary-vds0

      List the services to make sure they are identical on both of the cluster nodes. In the ldm list-services command output, the ldg1-boot and dvd disk services should match, because they are used by the guest domain when brought online.

    3. Check the services on phys-schost-1.
      phys-schost-1# ldm list-services primary

      VCC
          NAME             LDOM             PORT-RANGE
          primary-vcc0     primary          5000-5100

      VSW
          NAME             LDOM       MAC               NET-DEV  ID  DEVICE    LINKPROP  DEFAULT-VLAN-ID  PVID  VID  MTU   MODE  INTER-VNET-LINK
          primary-vsw0     primary    00:14:4f:f9:5c:1a net0     0   switch@0            1                1          1500        on

      VDS
          NAME             LDOM       VOLUME      OPTIONS   MPGROUP   DEVICE
          primary-vds0     primary    ldg1-boot                       /disks/ldg1-boot
                                      dvd                             /var/tmp/sol-11_1-20-text-sparc.iso
      phys-schost-1#
    4. Check the services on phys-schost-2.
      phys-schost-2# ldm list-services primary

      VCC
          NAME             LDOM             PORT-RANGE
          primary-vcc0     primary          5000-5100

      VSW
          NAME             LDOM       MAC               NET-DEV  ID  DEVICE    LINKPROP  DEFAULT-VLAN-ID  PVID  VID  MTU   MODE  INTER-VNET-LINK
          primary-vsw0     primary    00:14:4f:fb:02:5c net0     0   switch@0            1                1          1500        on

      VDS
          NAME             LDOM       VOLUME      OPTIONS   MPGROUP   DEVICE
          primary-vds0     primary    ldg1-boot                       /disks/ldg1-boot
                                      dvd                             /var/tmp/sol-11_1-20-text-sparc.iso
      phys-schost-2#
    5. Create the logical domain on only one node. The guest LDom in the failover configuration must be configured on only one node; when the HA LDom resource is created, the domain configuration is stored in the CCR. When the LDom resource comes online, it creates the LDom on the node where it comes online, then starts and boots it.
      phys-schost-1# ldm add-domain ldg1
      phys-schost-1# ldm set-vcpu 32 ldg1
      phys-schost-1# ldm set-mem 8g ldg1
      phys-schost-1# ldm add-vdisk bootdisk ldg1-boot@primary-vds0 ldg1
      phys-schost-1# ldm add-vdisk dvd dvd@primary-vds0 ldg1

      If there is a mix of architectures in the cluster setup, change cpu-arch to generic for the guest domain.

      phys-schost-1# ldm set-domain cpu-arch=generic ldg1


      The guest domain ldg1 should be installed before placing the domain under Oracle Solaris Cluster control.

      phys-schost-1# ldm bind ldg1
      phys-schost-1# ldm start ldg1
    6. Enable the vntsd service if it is not already online. Then connect to the console and boot through the DVD.
      # svcadm enable vntsd
      # telnet 0 5000
      ok boot dvd
    7. Perform the installation procedure to install Oracle Solaris in the guest domain ldg1.
      phys-schost-2# ldm ls -l ldg1
      
      NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
      ldg1             active     -n----  5000    32    8G       0.0%  4d 17h 17m
      
      SOFTSTATE
      Solaris running
      
      UUID
          9fbee96f-3896-c224-e384-cb24ed9650e1
      MAC
          00:14:4f:fb:4d:49
      
      HOSTID
          0x84fb4d49
      
      CONTROL
          failure-policy=ignore
          extended-mapin-space=off
          cpu-arch=generic
      
      DEPENDENCY
          master=primary
      
      CORE
          CID    CPUSET
          4      (32, 33, 34, 35, 36, 37, 38, 39)
          5      (40, 41, 42, 43, 44, 45, 46, 47)
          6      (48, 49, 50, 51, 52, 53, 54, 55)
          7      (56, 57, 58, 59, 60, 61, 62, 63)
      
      VCPU
          VID    PID    CID    UTIL STRAND
          0      32     4      0.3%   100%
          1      33     4      0.0%   100%
          2      34     4      0.0%   100%
          3      35     4      0.0%   100%
          4      36     4      0.0%   100%
          5      37     4      0.0%   100%
          6      38     4      0.0%   100%
          7      39     4      0.0%   100%
          8      40     5      0.0%   100%
          9      41     5      1.2%   100%
          10     42     5      0.0%   100%
          11     43     5      0.0%   100%
          12     44     5      0.0%   100%
          13     45     5      0.0%   100%
          14     46     5      0.1%   100%
          15     47     5      0.0%   100%
          16     48     6      0.0%   100%
          17     49     6      0.0%   100%
          18     50     6      0.0%   100%
          19     51     6      0.0%   100%
          20     52     6      0.0%   100%
          21     53     6      0.0%   100%
          22     54     6      0.0%   100%
          23     55     6      0.0%   100%
          24     56     7      0.0%   100%
          25     57     7      0.0%   100%
          26     58     7      0.0%   100%
          27     59     7      0.0%   100%
          28     60     7      0.0%   100%
          29     61     7      0.0%   100%
          30     62     7      0.0%   100%
          31     63     7      0.0%   100%
      
      MEMORY
          RA               PA               SIZE
          0x10000000       0x200000000      256M
          0x400000000      0x220000000      7680M
          0x800000000      0x840000000      256M
      
      CONSTRAINT
          threading=max-throughput
      
      VARIABLES
          auto-boot?=false
      
       NETWORK
           NAME             SERVICE                     ID   DEVICE     MAC                MODE   PVID  VID   MTU    LINKPROP
           vnet0            primary-vsw0@primary        0    network@0  00:14:4f:fa:31:6c         1           1500

       DISK
           NAME             VOLUME                      TOUT ID   DEVICE  SERVER    MPGROUP
           bootdisk         ldg1-boot@primary-vds0           0    disk@0  primary
           dvd              dvd@primary-vds0                 1    disk@1  primary
      
      VCONS
          NAME             SERVICE                     PORT   LOGGING
          ldg1             primary-vcc0@primary        5000   on
      
      phys-schost-2#
      
      phys-schost-2# ls -ld /var/tmp/passwd
      -r--------   1 root     root           7 Jul 26 13:36 /var/tmp/passwd
    8. Set the master property for the guest domain. The master property must be set to primary, so that if the primary node panics or reboots, the guest LDom will be rebooted. Each slave domain can specify up to four master domains by setting the master property.
      phys-schost-1# ldm set-domain master=primary ldg1
      phys-schost-1# ldm list -o domain ldg1

      Each master domain can specify what happens to its slave domains in the event that the master domain fails. For instance, if a master domain fails, it might require its slave domains to panic. If a slave domain has more than one master domain, the first master domain to fail triggers its defined failure policy on all of its slave domains.

  3. Place the guest domain under Oracle Solaris Cluster control:
    1. Register the SUNW.ldom resource type.

      phys-schost-1# /usr/cluster/bin/clrt register SUNW.ldom

    2. Create the encrypted password file on both of the cluster nodes.
      phys-schost-1# dd if=/dev/urandom of=/var/cluster/ldom_key bs=16 count=1
      phys-schost-1# chmod 400 /var/cluster/ldom_key
      phys-schost-1# echo <root-password> | /usr/sfw/bin/openssl enc -aes128 \
      -e -pass file:/var/cluster/ldom_key -out /opt/SUNWscxvm/.ldg1_passwd
      phys-schost-1# chmod 400 /opt/SUNWscxvm/.ldg1_passwd
      phys-schost-1# echo "encrypted" > /disks/passwd
    3. Verify that the encrypted password can be decrypted.
      phys-schost-1# /usr/sfw/bin/openssl enc -aes128 -d -pass \
      file:/var/cluster/ldom_key -in /opt/SUNWscxvm/.ldg1_passwd
    4. Place the ldg1 domain under the control of the data service.
      phys-schost-1# /usr/cluster/bin/clrs create -g ldom-rg -t SUNW.ldom \
      -p Domain_name=ldg1 -p Password_file=/disks/passwd \
      -p Plugin_probe="/opt/SUNWscxvm/bin/ppkssh \
      -P user1:/home/user1/.ssh/id_dsa:ldg1:multi-user-server:online" \
      -p resource_dependencies_offline_restart=ldom-scalrs ldom-rs
    5. Run the following commands to validate the ssh setup on both nodes. In the commands below, ldg1 is the host name assigned to the guest domain ldg1.
      phys-schost-1# ssh -i /home/user1/.ssh/id_dsa -l user1 ldg1 svcs -H -o state multi-user-server:default
      phys-schost-2# ssh -i /home/user1/.ssh/id_dsa -l user1 ldg1 svcs -H -o state multi-user-server:default
    6. Check the status of the resources and resource groups:
      phys-schost-2# /usr/cluster/bin/clrg status ldom-rg

      === Cluster Resource Groups ===

      Group Name       Node Name        Suspended      Status
      ----------       ---------        ---------      ------
      ldom-rg          phys-schost-1    No             Offline
                       phys-schost-2    No             Online

      phys-schost-2# /usr/cluster/bin/clrs status ldom-rs

      === Cluster Resources ===

      Resource Name    Node Name        State        Status Message
      -------------    ---------        -----        --------------
      ldom-rs          phys-schost-1    Offline      Offline - Successfully stopped ldg1
                       phys-schost-2    Online       Online - ldg1 is active (normal)

Verify Guest Domain Failover

  1. Connect to the guest domain over the network by using ssh, rsh, or telnet, as appropriate for your configuration, and run the w command to check the uptime. You will use this session later to verify that the guest domain did not reboot but was live migrated:

    # ssh -l username <host-name-of-failover-guest-domain> w

  2. Switch the resource group to the other node to make sure the services are configured correctly on both nodes. The LDom agent performs live migration, which ensures that the ssh, rsh, or telnet connection survives the switchover.

    phys-schost-1# clrg switch -n phys-schost-2 ldom-rg

  3. Switch the resource group back to the primary node.

    phys-schost-2# clrg switch -n phys-schost-1 ldom-rg

  4. Run w on the ssh session to the failover domain's host name to verify that the guest domain is alive.
  5. Now that the guest domain has been configured for HA using the HA agent, the domain can be installed with the Oracle Solaris Cluster bits and configured as a single-node cluster, as described in the next section.

    To prepare for the installation, log in to the domain console. You can use the ldm command shown below to determine the port number for the domain console (5001, in this example). It is better to log in from an ssh session to phys-schost-2 than from the console of phys-schost-2.

    root@phys-schost-2# ldm ls ldg1
    NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
    ldg1             active     -n----  5001    8     8G       0.1%  0.1%  18h 49m
    root@phys-schost-2# telnet 0 5001

Install, Configure, and Verify the Single-Node Cluster

  1. Install the cluster.

    root@ldg1-hostname:~# pkg set-publisher -k /var/pkg/ssl/Oracle_Solaris_Cluster_4.key.pem \
    -c /var/pkg/ssl/Oracle_Solaris_Cluster_4.certificate.pem \
    -O https://pkg.oracle.com/ha-cluster/release ha-cluster
    
    root@ldg1-hostname:~# pkg publisher
    PUBLISHER                   TYPE     STATUS P LOCATION
    solaris                     origin   online F http://pkg.oracle.com/solaris/release/
    ha-cluster                  origin   online F https://pkg.oracle.com/ha-cluster/release/
    root@ldg1-hostname:~#
    
    root@ldg1-hostname:~# pkg install --accept ha-cluster-full
               Packages to install: 124
               Mediators to change:   1
                Services to change:  10
           Create boot environment:  No
    Create backup boot environment: Yes
    DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
    Completed                            124/124   17794/17794  438.3/438.3  674k/s
    
    PHASE                                          ITEMS
    Installing new actions                   23248/23248
    Updating package state database                 Done
    Updating package cache                           0/0
    Updating image state                            Done
    Creating fast lookup database                   Done
    Updating package cache                           2/2
    
    root@ldg1-hostname:~# /usr/cluster/bin/scinstall -iFo -C testcluster
    
    Initializing cluster name to "testcluster" ... done
    Initializing authentication options ... done
    /usr/cluster/bin/scinstall[13]: test: argument expected
    
    
    Setting the node ID for "ldg1-hostname" ... done (id=1)
    
    Updating nsswitch.conf ... done
    
    Adding cluster node entries to /etc/inet/hosts ... done
    
    
    Configuring IP multipathing groups ...done
    
    Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done
    
    Ensure network routing is disabled ... done
    Network routing has been disabled on this node by creating /etc/notrouter.
    Having a cluster node act as a router is not supported by Oracle Solaris Cluster.
    Please do not re-enable network routing.
    Please reboot this machine.
    
    
    Log file - /var/cluster/logs/install/scinstall.log.945
    
    root@ldg1-hostname:~# reboot
  2. Log in to the cluster node console to verify the status.

    root@ldg1-hostname:~# /usr/cluster/bin/clnode status
    
    
    === Cluster Nodes ===
    
    --- Node Status ---
    
    Node Name                           Status
    ---------                           ------
    ldg1-hostname                       Online

    This failover LDom can be used like any other physical single-node cluster to host resource groups and resources. Add the host name and IP address mapping for lh-hostname to the /etc/hosts file before creating the LogicalHostname resource (a sample entry is sketched after the commands below).

    root@ldg1-hostname:~# /usr/cluster/bin/clrg create lh-rg
    root@ldg1-hostname:~# /usr/cluster/bin/clrslh create -g lh-rg lh-hostname
    root@ldg1-hostname:~# /usr/cluster/bin/clrg online -emM +
    root@ldg1-hostname:~# /usr/cluster/bin/clrg status
    
    === Cluster Resource Groups ===
    
    Group Name       Node Name        Suspended      Status
    ----------       ---------        ---------      ------
    lh-rg            ldg1-hostname    No             Online
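
As noted earlier, the logical host name must resolve inside the guest domain before the LogicalHostname resource is created. A minimal sketch of the /etc/hosts entry, using a placeholder address from the documentation range (substitute the real public-network IP address assigned to lh-hostname):

    root@ldg1-hostname:~# echo "192.0.2.50   lh-hostname" >> /etc/hosts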

Summary

This article described how to configure a failover LDom guest domain by using a two-node cluster with a network file system. It explained how to verify that the cluster is behaving correctly by switching the failover guest domain from the primary node to the secondary node and back. It also described how to configure the guest domain as a single-node cluster.

See Also

For more information about configuring Oracle Solaris Cluster components, see the Oracle Solaris Cluster Software Installation Guide, the Oracle Solaris Cluster Concepts Guide, and the Oracle Solaris Cluster 4.x Compatibility Guide.

About the Author

Venkat Chennuru has been working as quality lead in the Oracle Solaris Cluster group for the last 14 years.

Revision 1.0, 11/25/2014