
Tuesday, January 7, 2020

OCFS2 (Oracle Cluster File System) #2: Add / Remove node from cluster

This note describes the process of adding a server to and removing a server from an OCFS2 cluster configuration.


ADD NODE

==================================================
1. Preparation checks:
==================================================

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2

---> Check cluster status

$ /etc/init.d/o2cb status
++++++++++
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "testCluster": Online
  Heartbeat dead threshold: 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
  Heartbeat mode: Local
Checking O2CB heartbeat: Active
Debug file system at /sys/kernel/debug: mounted
++++++++++

---> Get the ocfs2 volume partitions

$ mounted.ocfs2 -d
++++++++++
Device     Stack  Cluster  F  UUID                              Label
/dev/sdb1  o2cb               4DFB615236DC45DD8E52B2395DC7C110
++++++++++

---> Check "Max Node Slots" of the current ocfs2 volume

Note: The maximum number of node slots specifies how many nodes can concurrently mount the volume. This number is set at format time and can be increased later using tunefs.ocfs2.

$ echo 'stats -h' | debugfs.ocfs2 /dev/sdb1
++++++++++
debugfs.ocfs2 1.8.6
debugfs: stats -h
        Revision: 0.90
        Mount Count: 0   Max Mount Count: 20
        State: 0   Errors: 0
        Check Interval: 0   Last Check: Sun Jan  5 23:52:21 2020
        Creator OS: 0
        Feature Compat: 3 backup-super strict-journal-super
        Feature Incompat: 14160 sparse extended-slotmap inline-data xattr indexed-dirs refcount discontig-bg
        Tunefs Incomplete: 0
        Feature RO compat: 1 unwritten
        Root Blknum: 5   System Dir Blknum: 6
        First Cluster Group Blknum: 3
        Block Size Bits: 12   Cluster Size Bits: 12
        Max Node Slots: 4 <==== Okay
        Extended Attributes Inline Size: 256
        Label:
        UUID: 4DFB615236DC45DD8E52B2395DC7C110
        Hash: 3754404690 (0xdfc7ab52)
        DX Seeds: 2330031485 1113073678 3429088798 (0x8ae1757d 0x4258280e 0xcc63be1e)
        Cluster stack: classic o2cb
        Cluster flags: 0
debugfs:
++++++++++

If this parameter needs to be changed, follow the document: How to Change and Check Number of Slots for OCFS2 (Doc ID 602861.1).
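
For reference, a minimal sketch of what such a change looks like with tunefs.ocfs2; the exact procedure (including whether the volume must be unmounted first) is in Doc ID 602861.1, and the target value of 5 below is only an example:

$ # hedged example only: raise "Max Node Slots" on the shared volume
$ tunefs.ocfs2 -N 5 /dev/sdb1
$ echo 'stats -h' | debugfs.ocfs2 /dev/sdb1 | grep 'Max Node Slots'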

---> Check /etc/ocfs2/cluster.conf on the online nodes

Note: Make sure that only the online nodes are listed in the configuration file /etc/ocfs2/cluster.conf.
The newly added node "srv-ocfs2-node3" must not already be present in /etc/ocfs2/cluster.conf,
because the o2cb_ctl utility adds the entry without checking whether it already exists.

$ cat /etc/ocfs2/cluster.conf
++++++++++
cluster:
        node_count = 2
        name = testCluster

node:
        ip_port = 7777
        ip_address = 192.168.197.165
        number = 1
        name = srv-ocfs2-node1.oracle.com
        cluster = testCluster

node:
        ip_port = 7777
        ip_address = 192.168.197.166
        number = 2
        name = srv-ocfs2-node2.oracle.com
        cluster = testCluster
++++++++++
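
A quick hedged check that the new node is not already listed (expected to print nothing on both online nodes):

$ grep srv-ocfs2-node3 /etc/ocfs2/cluster.conf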

---> Check the in-memory configfs filesystem

Note: configfs is mounted at /sys/kernel/config (on EL5 it was mounted at /config)

$ ls -l /sys/kernel/config/cluster/*/node
++++++++++
drwxr-xr-x. 2 root root 0 Jan  7 13:14 srv-ocfs2-node1.oracle.com
drwxr-xr-x. 2 root root 0 Jan  7 13:14 srv-ocfs2-node2.oracle.com
++++++++++
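
If in doubt about the mount point, the configfs mount itself can be confirmed with a plain mount query (a small sketch, not part of the original session); it should show configfs on /sys/kernel/config:

$ mount -t configfs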

===========================================
2. Prepare new node:
===========================================

>>> root@srv-ocfs2-node3

---> OCFS2 packages check/install

$ yum install -y ocfs2*
$ yum list installed | grep -i ocfs2
++++++++++
ocfs2-tools.x86_64                 1.8.6-11.el6               @public_ol6_latest
ocfs2-tools-devel.x86_64           1.8.6-11.el6               @public_ol6_latest
++++++++++

$ chkconfig ocfs2 on
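
Depending on how the existing nodes are set up, you may also want the o2cb init script (shipped with ocfs2-tools) to start at boot; a hedged addition:

$ chkconfig o2cb on
$ chkconfig --list | grep -E 'o2cb|ocfs2'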

---> Storage steps

!!!
!!! Since I am testing on VMware Workstation 11.1, the VM's .vmx configuration file needs to be adjusted to simulate a shared device.
!!!

File: D:\VM\srv-ocfs2-node3\srv-ocfs2-node3.vmx
  
Append to the end of the file:
++++++++++
disk.locking = "FALSE"
diskLib.dataCacheMaxSize = "0"
diskLib.dataCacheMaxReadAheadSize = "0" 
diskLib.dataCacheMinReadAheadSize = "0" 
diskLib.dataCachePageSize = "4096" 
diskLib.maxUnsyncedWrites = "0" 
scsi1.sharedBus = "virtual"
++++++++++

At the VMware level, I attach the disk to the virtual machine as an ALREADY EXISTING disk.

After that, rescan the SCSI hosts to detect the new device on each virtual machine:
$ {
echo "- - -" > /sys/class/scsi_host/host0/scan
echo "- - -" > /sys/class/scsi_host/host1/scan
echo "- - -" > /sys/class/scsi_host/host2/scan
}

$ mkdir -p /u01
$ fdisk /dev/sdb
p    <==== print the existing partition table (the shared disk is already partitioned on node1/node2)
w    <==== write and quit so the kernel re-reads the partition table
$ lsblk
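
At this point it is worth a hedged check that the new node really sees the same shared OCFS2 volume; the UUID reported here should match the one from step 1 (4DFB615236DC45DD8E52B2395DC7C110):

$ mounted.ocfs2 -d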

---> Create OCFS2 directory

$ mkdir -p /etc/ocfs2/

======================================================================
3. Add the new node to the online ocfs2 "testCluster" cluster:
======================================================================

---> Run command o2cb_ctl to add the node srv-ocfs2-node3 to the online ocfs2 cluster

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2

$ o2cb_ctl -C -i -n srv-ocfs2-node3.oracle.com -t node -a number=3 -a ip_address=192.168.197.167 -a ip_port=7777 -a cluster=testCluster
++++++++++
Node srv-ocfs2-node3 created
++++++++++

---> Check the in-memory configfs filesystem to see if node srv-ocfs2-node3 has been added

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2

$ ls -l /sys/kernel/config/cluster/*/node
++++++++++
drwxr-xr-x. 2 root root 0 Jan  7 13:13 srv-ocfs2-node1.oracle.com
drwxr-xr-x. 2 root root 0 Jan  7 13:04 srv-ocfs2-node2.oracle.com
drwxr-xr-x. 2 root root 0 Jan  7 13:20 srv-ocfs2-node3
++++++++++
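
Since o2cb_ctl -C also updates /etc/ocfs2/cluster.conf on the node where it is run (-i additionally registers the node in the live cluster), a quick hedged check of the file on node1/node2:

$ grep -A 5 srv-ocfs2-node3 /etc/ocfs2/cluster.conf
$ grep node_count /etc/ocfs2/cluster.conf    # should now reflect the added node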

---> Copy config files to new node

>>> root@srv-ocfs2-node1

$ scp /etc/ocfs2/cluster.conf root@srv-ocfs2-node3:/root/cluster.conf
$ scp /etc/sysconfig/o2cb root@srv-ocfs2-node3:/root/o2cb

>>> root@srv-ocfs2-node3

$ cp /etc/ocfs2/cluster.conf /etc/ocfs2/cluster.conf.back
$ cp /root/cluster.conf /etc/ocfs2/cluster.conf
$ cp /etc/sysconfig/o2cb /etc/sysconfig/o2cb.conf.back
$ cp /root/o2cb /etc/sysconfig/o2cb
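
A hedged sanity check that the copied files on the new node are byte-identical to those on an existing node (root ssh between the hosts is assumed here):

$ md5sum /etc/ocfs2/cluster.conf /etc/sysconfig/o2cb
$ ssh root@srv-ocfs2-node1 'md5sum /etc/ocfs2/cluster.conf /etc/sysconfig/o2cb'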

---> Start o2cb service on new node

>>> root@srv-ocfs2-node3

$ /etc/init.d/o2cb start
++++++++++
checking debugfs...
Loading stack plugin "o2cb": OK
Loading filesystem "ocfs2_dlmfs": OK
Creating directory '/dlm': OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Setting cluster stack "o2cb": OK
Registering O2CB cluster "testCluster": OK
Setting O2CB cluster timeouts : OK
++++++++++

$ /etc/init.d/o2cb status
++++++++++
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster "testCluster": Online
  Heartbeat dead threshold: 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
  Heartbeat mode: Local
Checking O2CB heartbeat: Not active
Debug file system at /sys/kernel/debug: mounted
++++++++++

$ lsmod | grep -i ocfs2
++++++++++
ocfs2_dlmfs            28672  1
ocfs2_stack_o2cb       16384  0
ocfs2_dlm             241664  1 ocfs2_stack_o2cb
ocfs2_nodemanager     245760  11 ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue        20480  2 ocfs2_dlmfs,ocfs2_stack_o2cb
configfs               36864  3 ocfs2_nodemanager,target_core_mod
++++++++++

---> Mount and correct /etc/fstab

>>> root@srv-ocfs2-node3

$ mkdir -p /u01
$ mount -t ocfs2 /dev/sdb1 /u01
$ cat /etc/fstab
++++++++++
/dev/sdb1 /u01 ocfs2 _netdev,defaults 0 0
++++++++++
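
To confirm the new node now actually holds a slot on the shared volume, mounted.ocfs2 can be run in full-detect mode, and the O2CB heartbeat should switch to Active once the volume is mounted (a hedged check, output not captured here):

$ mounted.ocfs2 -f
$ /etc/init.d/o2cb status | grep -i heartbeat    # "Checking O2CB heartbeat" should now report Active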


REMOVE NODE:

==================================================
0. Try to remove online:
==================================================

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2

$ o2cb_ctl -D -n srv-ocfs2-node3 -u
++++++++++
o2cb_ctl: Not yet supported
++++++++++

Note: it is not possible to remove an ocfs2 node from an online ocfs2 cluster with the current version of ocfs2 and the o2cb_ctl utility included in ocfs2-tools.

======================================================================
1. Stop all applications that use the OCFS2 volume on all ocfs2 nodes:
======================================================================
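
A hedged way to see which processes still hold the OCFS2 mount open before stopping them (fuser is part of the psmisc package):

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2 / srv-ocfs2-node3

$ fuser -vm /u01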


==================================================
2. Correct /etc/ocfs2/cluster.conf
==================================================

Delete the entry for the node being removed from /etc/ocfs2/cluster.conf on all ocfs2 nodes:

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2

$ cat /etc/ocfs2/cluster.conf
++++++++++
cluster:
        heartbeat_mode = local
        node_count = 2      <============ Don't forget to correct it too
        name = testCluster

node:
        number = 1
        cluster = testCluster
        ip_port = 7777
        ip_address = 192.168.197.165
        name = srv-ocfs2-node1.oracle.com

node:
        number = 2
        cluster = testCluster
        ip_port = 7777
        ip_address = 192.168.197.166
        name = srv-ocfs2-node2.oracle.com
++++++++++

Note: You also need to adjust "number =" for the remaining nodes and "node_count =" in the cluster section appropriately. /etc/ocfs2/cluster.conf must be identical on all ocfs2 nodes; no differences are allowed.
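
One hedged way to confirm the edited file really is identical everywhere (root ssh between the nodes is assumed):

$ ssh root@srv-ocfs2-node2 'cat /etc/ocfs2/cluster.conf' | diff - /etc/ocfs2/cluster.conf    # no output means the files match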

=====================================================
3. Unmount the ocfs2 device and stop o2cb on ALL NODES:
=====================================================

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2 / srv-ocfs2-node3

$ umount /u01
$ /etc/init.d/o2cb offline testCluster
++++++++++
Clean userdlm domains: OK
Stopping O2CB cluster testCluster: Unregistering O2CB cluster "testCluster": OK
Unloading module "ocfs2": OK
++++++++++

$ /etc/init.d/o2cb unload
++++++++++
Clean userdlm domains: OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unloading module "ocfs2_stack_o2cb": OK
/etc/init.d/o2cb: line 1151: read: read error: 0: No such device
++++++++++
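
After the unload, a hedged check that no ocfs2 modules remain loaded (mirrors the lsmod check used earlier; expected to print nothing):

$ lsmod | grep -i ocfs2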

===================================================================
4. Start o2cb and mount ocfs2 on all nodes EXCEPT THE NODE BEING REMOVED:
===================================================================

>>> root@srv-ocfs2-node1 / srv-ocfs2-node2

$ /etc/init.d/o2cb load
++++++++++
checking debugfs...
Loading stack plugin "o2cb": OK
Loading filesystem "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
++++++++++

$ /etc/init.d/o2cb online testCluster
++++++++++
checking debugfs...
Setting cluster stack "o2cb": OK
Registering O2CB cluster "testCluster": OK
Setting O2CB cluster timeouts : OK
++++++++++

$ mount /u01

$ ls -l /sys/kernel/config/cluster/*/node
++++++++++
drwxr-xr-x. 2 root root 0 Jan  8 18:16 srv-ocfs2-node1.oracle.com
drwxr-xr-x. 2 root root 0 Jan  8 18:16 srv-ocfs2-node2.oracle.com
++++++++++

======================================================================
* Sources:
======================================================================

How to Add a New OCFS2 Node to an Online Cluster (Doc ID 761020.1)
How to Change and Check Number of Slots for OCFS2 (Doc ID 602861.1)
How to remove an ocfs2 node from a cluster (Doc ID 852002.1)