There is an update to the HPC software stack, bringing xCAT to 2.14.3.lenovo3, based on 2.14.3, and confluent is brought to 2.1.0.

Here are some of the highlighted changes:

Greatly improved collective recovery behavior

The collective functionality in 2.0 was unable to automatically continue running well under various circumstances and would require manual intervention. 2.1 greatly improves this to automatically recover more reliably and quickly.

GUI now provides auto completion in the noderange field of the overview page

When typing into noderange, it will now offer completion options:

gui completion

nodeconfig can now provide access to advanced UEFI settings

nodeconfig has a new option, -a to work with advanced UEFI configuration settings.

nodeconfig can now request restoring UEFI to factory default configuration

nodeconfig has a new option, -r uefi to reset the node to UEFI factory default settings

nodeconfig and nodeattrib commands can now accept wildcard expressinos

Commands can now target attribute or settings using wildcards. For example to use this to configure all Mellanox ConnectX5 adapters as Ethernet:

$ nodeconfig d3,d4 MellanoxNetworkAdapter-*.NetworkLinkType=Ethernet
$ nodeconfig d3,d4 MellanoxNetworkAdapter-*.NetworkLinkType
d3: MellanoxNetworkAdapter-0600.NetworkLinkType: Ethernet
d3: MellanoxNetworkAdapter-0601.NetworkLinkType: Ethernet
d4: MellanoxNetworkAdapter-0600.NetworkLinkType: Ethernet
d4: MellanoxNetworkAdapter-0601.NetworkLinkType: Ethernet

nodegroupattrib and nodeattrib can now prompt for attribute values interactively

This can be used to enter credentials interactively rather than passing them on command line or via environment variable. For example to globally declare a particular user and password.

$ nodegroupattrib everything -p bmcuser bmcpass
Enter value for bmcuser: 
Confirm value for bmcuser: 
Enter value for bmcpass: 
Confirm value for bmcpass: 
everything: secret.hardwaremanagementpassword: ********
everything: secret.hardwaremanagementuser: ********

Discovery process can now apply relaxed password policy rules if desired

If the default policies on the XCC are not desireable, they may be customised as part of discovery. For example to disable password expiration and the account lockout on too many incorrect attempts:

$ nodegroupattrib everything discovery.passwordrules=expiration=no,loginfailures=no

nodeinventory has had significant improvements

The structure of nodeinventory output has greatly improved for tracking multiple of the same part in a system. Additionally it will try to resolve genericly named adapters to more descriptive names:

 # nodeinventory n771 |grep GPU
 n771: GV100GL [Tesla V100 PCIe 32GB] Type: GPU
 n771: GV100GL [Tesla V100 PCIe 32GB] 2 Type: GPU
 n771: GV100GL [Tesla V100 PCIe 32GB] 3 Type: GPU
 n771: GV100GL [Tesla V100 PCIe 32GB] 4 Type: GPU
 # nodeinventory d5 |grep Fabric
 d5: Omni-Path HFI Silicon 100 Series [discrete] Type: Fabric Controller

Command line tab completion for confluent commands for bash users

Most confluent commands have been augmented with appropriate bash completion capability, expanding suggested noderange completions as well as arguments following the noderange.

collate can now create per-node output files

In addition to grouping data, collate can now organize node: data output into distinct files:

$ nodeeventlog d1-d8 clear | collate -l {node}.log
$ ls
d1.log  d2.log  d3.log  d4.log  d5.log  d6.log  d7.log  d8.log

Add interface for retrieving service data

See the documentation for the nodesupport command.

Improve consistency of the /networking/macs API

The /networking/macs API would clear during a rescan. Now it presents the previous data until the scan completes.

The /networking/macs/rescan and /discovery/rescan APIs can now be read to show current scanning state

Developers wanting to track the completion of scan activities may now do so via the same resources that are used to initiate rescans.

Various bug fixes

For a complete list of changes, see our git revisions