Fencing failing with No fence device
This document (000021466) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 15
Situation
- Site A : Node1 + SBD1
- Site B : Node2 + SBD2
- Site C : SBD3
The log contains messages similar to the following:
May 07 14:24:35 aaha02 sbd[9252]: warning: open_device: Opening device /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd failed.
May 07 14:24:35 aaha02 external/sbd(stonith-sbd)[9455]: ERROR: sbd list failed: == disk /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd unreadable!
May 07 14:24:36 aaha02 stonith[9091]: external_status: 'sbd status' failed with rc 1
May 07 14:24:36 aaha02 stonith[9091]: external/sbd device not accessible.
May 07 14:24:36 aaha02 pacemaker-fenced[5887]: warning: fence_legacy[9058] stderr: [ == disk /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd unreadable! ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]: warning: fence_legacy[9058] stderr: [ ==Header on disk /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd NOT dumped ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]: warning: fence_legacy[9058] stderr: [ sbd failed; please check the logs. ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]: warning: fence_legacy[9058] stderr: [ logd is not running ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]: notice: Couldn't find anyone to fence (reboot) aaha01 using any device
May 07 14:24:36 aaha02 pacemaker-fenced[5887]: error: Operation 'reboot' targeting aaha01 by unknown node for pacemaker-controld.5891@aaha02: Error occurred (No fence device)
May 07 14:24:36 aaha02 pacemaker-controld[5891]: warning: Fence operation 3 for aaha01 failed: No fence device (aborting transition and giving up for now)
May 07 14:24:36 aaha02 pacemaker-controld[5891]: notice: Transition 3 aborted: Stonith failed
May 07 14:24:36 aaha02 pacemaker-controld[5891]: notice: Peer aaha01 was not terminated (reboot) by the cluster on behalf of pacemaker-controld.5891@aaha02: No fence device
Resolution
If the SBD devices are provided over iSCSI, the stonith-timeout value needs to be larger than:
SCSI timeout + sbd msgwait + pcmk_delay_max + 20% wiggle room
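The arithmetic behind this lower bound can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the 20% margin is assumed to apply to the sum of the three timeouts, and the msgwait and pcmk_delay_max values in the example are placeholders rather than values from this document. Only the 120 second iSCSI default is taken from the Cause section.

```python
def min_stonith_timeout(scsi_timeout, msgwait, pcmk_delay_max, margin_percent=20):
    """Return the lower bound in seconds that stonith-timeout must exceed.

    Assumption: the 20% wiggle room is applied to the sum of the three
    timeouts. The document does not spell this out.
    """
    base = scsi_timeout + msgwait + pcmk_delay_max
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-base * (100 + margin_percent) // 100)


# 120 s is the iSCSI default named in this document.
# msgwait=10 and pcmk_delay_max=30 are hypothetical example values.
lower_bound = min_stonith_timeout(scsi_timeout=120, msgwait=10, pcmk_delay_max=30)
print(lower_bound)
```

With these example inputs the sum is 160 seconds, so stonith-timeout would have to be set above 192 seconds. Replace the placeholder values with the msgwait and pcmk_delay_max actually configured in the cluster.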
Cause
For example, if the SBD devices come from iSCSI storage, the default timeout is 120 seconds.
The 120 seconds result from a login timeout of 15 seconds multiplied by 8 login attempts (15 x 8 = 120):
node.conn[0].timeo.login_timeout = 15
node.session.initial_login_retry_max = 8
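As a rough cross-check, the effective iSCSI login window can be derived from these two settings. The sketch below is illustrative only: the function name is hypothetical, and it assumes the settings are available as plain "key = value" text, as shown above.

```python
LOGIN_TIMEOUT = "node.conn[0].timeo.login_timeout"
RETRY_MAX = "node.session.initial_login_retry_max"


def iscsi_login_window(settings_text):
    """Return login_timeout * initial_login_retry_max in seconds."""
    values = {}
    for line in settings_text.splitlines():
        key, sep, value = line.partition("=")
        # Only the two settings of interest are parsed; other lines are ignored.
        if sep and key.strip() in (LOGIN_TIMEOUT, RETRY_MAX):
            values[key.strip()] = int(value.strip())
    return values[LOGIN_TIMEOUT] * values[RETRY_MAX]


settings = """\
node.conn[0].timeo.login_timeout = 15
node.session.initial_login_retry_max = 8
"""
print(iscsi_login_window(settings))
```

If either value has been changed from its default, the result gives the timeout to use in the stonith-timeout calculation instead of 120 seconds.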
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021466
- Creation Date: 13-Jun-2024
- Modified Date: 21-Jun-2024
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com