Confluent configuration and troubleshooting notes for Lenovo hardware

SN2010 rack view

The SN2010* Ethernet switch is half-wide and 1U tall, but also doesn’t install into an enclosure. Due to this, it doesn’t fit into the normal way of setting up location information for the confluent nodeattributes for it so that it would be displayed in the rack view in the confluent GUI. In order to get it to be displayed properly in the confluent GUI rack view (when there are two installed side-by-side in the same rack unit), a dummy enclosure node has to be setup, and the two switches in that rack unit have to have their enclosure node attributes set to that enclosure. The enclosure.bay nodeattribute should be set to 1 for the switch installed to the left (as seen from the front of the rack) and to 2 for the switch installed in the right. The following is an example of the confluent nodeattributes to set for this:

mn10:/opt/exerciser # nodeattrib switch90
switch90: discovery.policy: open
switch90: dns.domain: cluster1.e1350
switch90: enclosure.bay: 1
switch90: enclosure.manager: switch9091
switch90: groups: everything

mn10:/opt/exerciser # nodeattrib switch91
switch91: discovery.policy: open
switch91: dns.domain: cluster1.e1350
switch91: enclosure.bay: 2
switch91: enclosure.manager: switch9091
switch91: groups: everything

mn10:/opt/exerciser # nodeattrib switch9091
switch9091: discovery.policy: open
switch9091: dns.domain: cluster1.e1350
switch9091: groups: everything
switch9091: location.height: 1
switch9091: location.rack: CR021.2
switch9091: location.row: 2
switch9091: location.u: 41

And here is the result of this in the confluent rack view:

Confluent rack view

nodeconfig may take a few seconds to reflect submitted changes

When using nodeconfig to submit a system configuration change, it exits when the target device has accepted the change. However, an endpoint may take some time to activate the change so that it will be visible when showing the configuration.

IPv6 and Domain name resolution may not work for Redfish-managed nodes

For SR655 nodes configured to be managed with Redfish, IPv6 and domain name resolution may not work for node attribute values. For example, it is known to not work in the case of the hardwaremanagement.manager attribute. In the example below:

# nodeattrib node1 hardwaremanagement.method
node1:  hardwaremanagement.method:  redfish

# nodeattrib node1 hardwaremanagement.manager
node1:  hardwaremanagement.manager: node1-mgt

Or:

# nodeattrib node1 hardwaremanagement.manager
node1:  hardwaremanagement.manager: fe80::3ee1:a1ff:fec7:e627%eno1

The domain name node1-mgt1 will trigger an error on the target TSM, causing the following error:

# nodefirmware node1

Traceback (most recent call last):
  File "/opt/confluent/bin/nodefirmware", line 166, in <module>
    show_firmware(session)
  File "/opt/confluent/bin/nodefirmware", line 152, in show_firmware
    exitcode |= client.printerror(res['databynode'][node], node)
  File "/opt/confluent/lib/python/confluent/client.py", line 116, in printerror
    sys.stderr.write('{0}: {1}\n'.format(node, res['error']))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 175-254: ordinal not in range(128)

Or IPv6 will trigger this error:

 # nodefirmware node1
 node1: Unexpected Error: /redfish/v1/Managers/Self:<pre style="font-size:12px; font-family:monospace; color:#8B0000;">[web.lua] Error in RequestHandler, thread: 0x758476e8 is dead.

There are two options to address the problem:

Use a literal IP address for the node attribute in question:

    nodeattrib node1 hardwaremanagement.manager=10.19.67.83
    node1:  10.19.67.13

Switch to IPMI management:

    nodeattrib node1 hardwaremanagement.method=ipmi
    node1:  ipmi

Redfish management method not supported on FPC, SMM or SMM2

The Lenovo FPC, SMM and SMM2 chassis management modules do not support redfish–use ipmi for their hardwaremanagement methods.

xCAT genesis does not include support for Intel E810 or Mellanox ConnectX-6 Lx Ethernet adapters

xCAT genesis does not currently include support for Intel E810 or Mellanox ConnectX-6 Lx Ethernet adapters–confluent genesis should be used instead.

Occasional privilege errors from confluent commands

Occasionally confluent out-of-band management commands may report “Error: Insufficient privilege level or firmware firewall”. If this occurs, the hardwaremanagement.manager node attribute on the node that the problem occurred on to something other than the actual IP or hostname of the hardwaremanagement manager for that node, and then set it back to the previous (correct) value. After that the problem should no longer occur.

XCC web interfaces opened from the confluent web GUI may spontaneously log out when entering the firwmare update page

When opening the XCC web interface for more than one XCC from the confluent GUI web interface, so that the URLs for those XCC web interfaces are forwarded ports with the same IP (the IP of the confluent server), when entering the firmware update page on the XCC web interface, the XCC web interface on one or more of the XCC web interfaces opened this way may spontaneously log out. To work around this issue, either open just one XCC web interface through the confluent GUI or setup a socks proxy to access the XCC web interfaces (or access them directly if the network configuration allows for this) so that each XCC web interface is accessed through a separate IP address.

Lenovo ThinkSystem generation “V3” systems may spontaneously reboot when using the nodefirmware command for certain firmware updates

Certain firmware update packages for Lenovo V3 systems are performed in a system built-in Linux environment that the system has to reboot into to do the update, even though the update is initiated out-of-band. In these cases the node will spontaneously and without warning reboot into that environment. To avoid improper shutdown of a running OS on the system, the system can be turned off before running the nodefirmware command for these types of updates.

XCC3/V4 Systems

When using Lenovo Confluent for managing Lenovo Version 4 (V4) systems that have the XClarity Controller 3 (XCC3) version of the baseboard management controller (BMC), it is recommended that you set the hardwaremanagement.method node attribute to redfish. Not all functionality is implemented using IPMI for V4 systems, hence using V4 systems with hardwaremanagement method set to IPMI might result in failures or incomplete data.

SR670 V2

When using the Lenovo Confluent nodeinventory command to read the inventory data on an SR670 v2, it is recommended that you set the hardwaremanagement.method node attribute to ipmi.

Http boot will fail for profile names with 84 or more characters

When attempting to http boot and the profile name is too long the attempt will not work. When using IPv6 addresses there is no limit to the number of characters in the profile name, whereas for IPv4 addresses the profile name length is limited to 84 characters.