There is an update to the HPC software stack, bringing xCAT to 2.14.6.lenovo1, based on 2.14.6, and confluent is brought to 2.3.0.

Here are some of the highlighted changes:

redfish plugin

2.3.0 contains the initial release of the redfish plugin. This provides more capability on an industry standard system. Note that ipmi remains the default and generally recommended plugin to use with Lenovo systems (on Lenovo systems, the ipmi plugin will use redfish or other protocols as appropriate already). The following scenarios would be reasons to use redfish plugin on Lenovo systems

  • Wanting to use an authentication method such as LDAP, which is incompatible with ipmi
  • Requirement not to use UDP port 623
  • Faster nodeconfig access to select UEFI settings (not all settings are available).

At this point, redfish implementations are generally slower than their IPMI counterparts. Additionally system firmware redfish support has been maturing only recently, and as such firmware from 2019 is strongly recommended regardless of vendor.

To use, set the hardwaremanagement.method attribute to redfish.

nodeshell and noderun commands have -n argument

This will suppress the nodename portion of the output. For example, one could do an ad-hoc creation of /etc/hosts entries through a command such as:

$ noderun -n rackmount echo 172.20.{n1}.{n2} {node}.domain
172.20.8.32 r8u32.domain
172.20.8.33 r8u33.domain

New stats command

The stats command has been added to help analyze numerical output. By default it gives a few frequently useful statistics:

$ grep 2R2 node*.log | stats
Samples: 54 Min: 1680.54 Median: 1682.485 Mean: 1682.51018519 Max: 1688.49 StandardDeviation: 1.2434755515 Sum: 90855.55

It can optionally produce a textual histogram.

$ grep 2R2 node*.log | stats -t
1680.5|=======================
1681.3|==========================================================
1682.1|=========================================================================
1682.9|===================================
1683.7|============
1684.5|====
1685.3|
1686.1|
1686.9|
1687.7|====
Samples: 54 Min: 1680.54 Median: 1682.485 Mean: 1682.51018519 Max: 1688.49 StandardDeviation: 1.2434755515 Sum: 90855.55

If you have a sixel enabled terminal, a graphical chart may be requested:

sixel histogram

Confluent now supports user groups

The /usergroups/ api is now available to create groups. This is mapped by default to system group names and memberships, which may be /etc/group or LDAP groups.

To confer access to any member of the wheel group to confluent:

confetty create /usergroups/wheel

Confluent now has user roles

Confluent users and usergroups may be limited in privilege to one of the roles: Administrator, Operator, or Monitor. Administrator is the usual full access, Operator is blocked from modifying users/passwords. Monitor is severely limited to select reead-only access appropriate foruse in monitoring scripts. This may be done to users or groups.

For example, to limit wheel group to not change any passwords or users:

confetty set /usergroups/wheel role=Operator

Improved discovery for XCC and SMM

Where possible, discovery for XCC and SMM will induce fewer failed login attempts and also will execute more quickly. Additionally, systems that are configured to not have IPMI available are now able to be discovered.

Web forwarder now works if confluent gui served from custom port

The functionality to open endpoint management consoles in web interface now supports custom https ports for the main confluent gui.

Storage configuration now supports custom strip sizes and modifying disk state

nodestorage has the -s argument added to specify strip size on array creation. Additionally, it has added the setdisk subcommand to manage jbod/hotspare state.

Fix confluent environment file for zsh users

The bash completion enhancements are now only attempted if the shell is bash

nodelicense can now save and delete licenses from Lenovo XCCs

For example, to backup a noderange of license keys to one directory per node:

nodelicense rack1 save ./licenses/{node}/

Fix “login process died” errors

In certain specific scenarios, “login process died” errors would come from ipmi plugin. A class of those scenarios has been fixed and an autorecovery mechanism has been put into place to catch any remaining causes.

A node may now be more easily forced through repeating discovery

nodediscover now has a reassign subcommand. To have discovered node d1 repeat discovery procedure:

nodediscover reassign -n d1

New credentials are encrypted and validated using AES-GCM instead of AES-CBC and SHA-256

As a result, 2.3.0 attribute databases and backups cannot be used with older versions of confluent.

Improve detail in the event of “Unexpected Error” situations

Error messages now include some potentially useful information for resolving the issue.

IPMI now supports HMAC-SHA256/HMAC-SHA256-128

xCAT and confluent both try to use HMAC-SHA256 when supported now for IPMI communication.

GUI consoles now show health summary in title bar

When using many consoles, it will obscure the normal view. This brings the health status front and center to be usable in that scenario

health in console title