An update to the HPC software stack is available, bringing xCAT to 2.16.1.lenovo1 and confluent to 3.1.0.

Here are some of the highlighted changes:

Collective autofailover and restricted deployment support

A new attribute, collective.managercandidates, specifies a noderange of valid managers for a node. If it is defined, deployment is not served from other collective members even when that would otherwise be possible. If a member of that range fails, the nodes it manages are migrated to another member of the same noderange.

Implement ssh.trustnodes to limit node-to-node ssh trust

By default, all members of a cluster deployed by confluent trust each other. A node may now define a noderange of trusted nodes to narrow that trust so that, for example, storage nodes can decline to allow compute nodes to log in.
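The trust rule described above can be modeled in a few lines of shell. This is a toy sketch, not confluent's implementation: the function name is_trusted is hypothetical, and the trusted range is a plain comma-separated list of node names rather than a full confluent noderange expression.

```shell
#!/bin/sh
# Toy model of ssh.trustnodes semantics (hypothetical helper, not part of confluent).
# Usage: is_trusted <connecting_node> [<comma-separated list of trusted nodes>]
# Noderange expressions are NOT expanded here; only literal node names are matched.
is_trusted() {
    node=$1
    trustnodes=${2:-}
    # No restriction configured: default behavior, every cluster member is trusted
    if [ -z "$trustnodes" ]; then
        return 0
    fi
    # Restriction configured: trust only nodes named in the list
    case ",$trustnodes," in
        *",$node,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: a storage node that trusts only the other storage nodes
if is_trusted compute1 "storage1,storage2"; then
    echo "compute1 trusted"
else
    echo "compute1 rejected"
fi
```

With the list above, compute1 is rejected while storage1 and storage2 are accepted; with an empty list every node is accepted, mirroring the default.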

New OS deployment profiles

Support has been added for CentOS 8.3, Red Hat Enterprise Linux 8.3, Oracle Linux 7 and 8, and CentOS Stream 8.

Most stock OS images now have ‘post.d’ and ‘firstboot.d’ directories

Scripts placed in post.d or firstboot.d are executed automatically at the end of installation or on first boot, respectively.
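post.d and firstboot.d follow the familiar drop-in directory pattern. The sketch below illustrates that general pattern in shell under stated assumptions; it is not confluent's actual runner. The function name run_dropins is hypothetical, and running only executable files, in lexical order, while continuing past a failing script are all assumptions.

```shell
#!/bin/sh
# Hypothetical drop-in runner illustrating the post.d/firstboot.d pattern.
# Assumptions (not confirmed by confluent): only executable regular files run,
# they run in glob (lexical) order, and a failing script does not stop the rest.
run_dropins() {
    dir=$1
    [ -d "$dir" ] || return 0
    for script in "$dir"/*; do
        # Skip non-files (including an unmatched glob) and non-executable files
        [ -f "$script" ] && [ -x "$script" ] || continue
        echo "running $script"
        # $? here is the exit status of the script that just failed
        "$script" || echo "warning: $script exited with status $?" >&2
    done
}
```

Under these assumptions, files named 10-network and 20-packages would run in that order, so a numeric prefix is a simple way to control sequencing.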

OS images now have a ‘fetch_remote’ function in /etc/confluent/functions

Scripts in OS images can now retrieve arbitrary payloads from their profile directory on the confluent server. For example:

. /etc/confluent/functions
fetch_remote infiniband/mofed.tgz

Used in a script, this downloads mofed.tgz from /var/lib/confluent/public/os/(profilename)/scripts/infiniband/mofed.tgz.

OS image profile.yaml files now support ‘installedargs’

The existing kernelargs setting controls how the installer is booted. The new installedargs setting controls how the installed system boots, independently of the installer.

configbmc now triggers remote authentication configuration

When BMC configuration is done in-band rather than remotely, configbmc now asks the manager to configure the username and password remotely. This is consistent with out-of-band configuration and keeps the credentials withheld from the OS being installed.

Support for using TPM2 to persist node keys across reboots

Genesis can now be rebooted without rearming the node token grant. This should also facilitate a more secure stateless strategy, in which node trust is persisted through the TPM2.

The web interface works better when used in a domain shared with other web services

Other servers in a company's domain may set domain-wide cookies that can interfere with the confluent web GUI. Such invalid cookies are now discarded so that the web interface keeps working.

Improved error messages on some commands

A few commands now provide more specific and useful feedback on failure.

Various fixes for ESXi deployment

Several limitations in ESXi deployment have been addressed.

New ‘memory’ option for console.logging

If console.logging is set to memory, the replay buffer is maintained in memory but not written to disk. This can improve performance when /var/log/confluent is on a slow filesystem.

Better support for Cisco Ethernet switches in the /networking/ API

More complex VLAN configurations on Cisco equipment no longer hide addresses from confluent.

More attributes in the discovery API for Lenovo equipment

Some end-user-curated information is made available when detected.

Genesis now starts ssh on port 3389 if booted without a detected confluent server

This allows a Genesis instance booted through the BMC to be logged into remotely through BMC port forwarding, where available. This can be used to diagnose or reconfigure networking when a network problem would otherwise block remote deployment.

Performance improvements

Memory and processor usage have been optimized in several areas, particularly PXE request handling, discovery scanning, and nodeconfig.

Various bugfixes

  • Removed cursor key mode preservation, which could conflict with firmware setup menu operation
  • Shell sessions no longer leave phantom ssh sessions
  • Fixes to automatic console behavior
  • Improved TSM discovery as used in SR635 and SR655 servers
  • Relaxed expectations in nodeconfig batch files: quotes are no longer required around values containing spaces
  • UUID handling is now case-insensitive, which works better when id.uuid is set by scripts
  • Fix file descriptor leak in the web forwarder
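Case-insensitive UUID handling amounts to normalizing letter case before comparing. Different tools can report the same UUID in uppercase or lowercase hex, which is the kind of mismatch that can arise when id.uuid is set by a script. A minimal shell sketch of the idea follows; the helper name uuid_eq is hypothetical and this is not confluent's code.

```shell
#!/bin/sh
# Hypothetical helper: compare two UUIDs while ignoring letter case.
uuid_eq() {
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
}

# The same UUID written in two different cases compares equal
uuid_eq "123E4567-E89B-12D3-A456-426614174000" \
        "123e4567-e89b-12d3-a456-426614174000" && echo "same system"
```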