1: This document is based on the confluent image deployment model: an image is created on the cluster head node and pushed to the compute nodes to deploy the cluster. The head node therefore needs to run both httpd and Nginx; because httpd must keep the default port 443, the HTTPS port of the Nginx service must be changed to another port.
2: In this scenario, the login node and the management node are the same node.
echo '* soft memlock unlimited' >> /etc/security/limits.conf
echo '* hard memlock unlimited' >> /etc/security/limits.conf
reboot
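After the node comes back up, you can confirm that the memlock limits took effect (a quick sanity check in a new login shell):
ulimit -l
The command should print "unlimited".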
Log in to the management node.
Create a new lico_env.local file according to section 2 of the LiCO installation documentation.
Set permissions on the file and load it:
chmod 600 lico_env.local
source lico_env.local
mkdir -p ${iso_path}
Download Rocky-8.6-x86_64-dvd1.iso and the CHECKSUM file from: https://rockylinux.org/download
Copy the files to ${iso_path}.
Verify that the checksum of the ISO file matches the value listed in CHECKSUM:
cd ${iso_path}
sha256sum Rocky-8.6-x86_64-dvd1.iso
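Alternatively, assuming the downloaded CHECKSUM file is in a format sha256sum understands, you can let it do the comparison itself (--ignore-missing skips entries for files you did not download):
sha256sum -c CHECKSUM --ignore-missing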
cd ~
mkdir -p ${os_repo_dir}
mount -o loop ${iso_path}/Rocky-8.6-x86_64-dvd1.iso ${os_repo_dir}
cat << eof > ${iso_path}/EL8-OS.repo
[AppStream]
name=appstream
baseurl=file://${os_repo_dir}/AppStream/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rockyofficial
[BaseOS]
name=baseos
baseurl=file://${os_repo_dir}/BaseOS/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rockyofficial
eof
cp -a ${iso_path}/EL8-OS.repo /etc/yum.repos.d/
mkdir -p ${repo_backup_dir}
mv /etc/yum.repos.d/Rocky* ${repo_backup_dir}
dnf clean all
dnf makecache
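As an optional sanity check, confirm that the two local repositories defined above are visible to dnf:
dnf repolist | grep -Ei 'appstream|baseos'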
dnf module reset -y nginx
dnf module enable -y nginx:1.20
Download the following package: https://hpc.lenovo.com/downloads/22b/confluent-3.5.0-2-el8.tar.xz
Upload the package to the /root directory.
Create the confluent local repository:
dnf install -y bzip2 tar
mkdir -p $confluent_repo_dir
cd /root
tar -xvf confluent-3.5.0-2-el8.tar.xz -C $confluent_repo_dir
cd $confluent_repo_dir/lenovo-hpc-el8
./mklocalrepo.sh
cd ~
dnf install -y lenovo-confluent tftp-server
systemctl enable confluent --now
systemctl enable tftp.socket --now
systemctl disable firewalld --now
systemctl enable httpd --now
source /etc/profile.d/confluent_env.sh
confetty create /users/<CONFLUENT_USERNAME> password=<CONFLUENT_PASSWORD> role=admin
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
Please ensure that the BMC user name and password are consistent for every node.
nodegroupattrib everything deployment.useinsecureprotocols=firmware \
console.method=ipmi dns.servers=$dns_server dns.domain=$domain_name \
net.ipv4_gateway=$ipv4_gateway net.ipv4_method="static"
The deployment.useinsecureprotocols=firmware attribute enables PXE support (by default, HTTPS-only mode is the only allowed mode). console.method=ipmi may be skipped, but if specified it instructs confluent to use IPMI to access the text console, which enables the nodeconsole command.
While passwords and similar secrets may be specified the same way, it is recommended to use the -p argument to prompt for values, keeping them out of your command history. Note that if left unspecified, the default root password behavior is to disable password-based login:
nodegroupattrib everything -p bmcuser bmcpass crypted.rootpassword
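To confirm the attributes were applied, you can read them back; nodeattrib prints the current values when no assignment is given:
nodeattrib everything dns.servers net.ipv4_gateway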
nodegroupdefine all
nodegroupdefine compute
nodedefine $sms_name
nodeattrib $sms_name net.hwaddr=$sms_mac
nodeattrib $sms_name net.ipv4_address=$sms_ip
nodeattrib $sms_name hardwaremanagement.manager=$sms_bmc
for ((i=0; i<$num_computes; i++)); do
nodedefine ${c_name[$i]};
nodeattrib ${c_name[$i]} net.hwaddr=${c_mac[$i]};
nodeattrib ${c_name[$i]} net.ipv4_address=${c_ip[$i]};
nodeattrib ${c_name[$i]} hardwaremanagement.manager=${c_bmc[$i]};
nodedefine ${c_name[$i]} groups=all,compute;
done
Set the nodes to boot from the network (PXE) by default:
for ((i=0; i<$num_computes; i++)); do
nodeconfig ${c_name[$i]} bootorder.bootorder=Network
done
for node_name in $(nodelist); do
noderun -n $node_name echo {net.ipv4_address} {node} {node}.{dns.domain} >> /etc/hosts
done
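A quick check that the hosts file was populated correctly, resolving the first compute node through /etc/hosts:
getent hosts ${c_name[0]}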
dnf install -y dnsmasq
systemctl enable dnsmasq --now
You can set up the requirements for operating system deployment through the initialize subcommand of the osdeploy command. The -i parameter interactively prompts for the available options:
ssh-keygen -t ed25519
chown confluent /var/lib/confluent
osdeploy initialize -i
systemctl restart sshd
osdeploy import ${iso_path}/Rocky-8.6-x86_64-dvd1.iso
If you don't have any GPU nodes in the cluster, build a single image and skip the GPU-related commands below.
imgutil build -s rocky-8.6-x86_64 /tmp/scratchdir
If both GPU and non-GPU nodes are present, you will need to build two separate images.
imgutil build -s rocky-8.6-x86_64 /tmp/scratchdir
imgutil build -s rocky-8.6-x86_64 /tmp/scratchdir-gpu
dnf install -y nfs-utils
systemctl enable nfs-server --now
Configure httpd to serve the /install directory:
cat << eof > /etc/httpd/conf.d/installer.conf
Alias /install /install
<Directory /install>
AllowOverride None
Require all granted
Options +Indexes +FollowSymLinks
</Directory>
eof
systemctl restart httpd
Download the following package: https://hpc.lenovo.com/lico/downloads/7.0/Lenovo-OpenHPC-2.5.EL8.x86_64.tar
Upload the package to the /root directory.
Configure the local Lenovo OpenHPC repository:
mkdir -p $ohpc_repo_dir
cd /root
tar xvf Lenovo-OpenHPC-2.5.EL8.x86_64.tar -C $ohpc_repo_dir
rm -rf $link_ohpc_repo_dir
ln -s $ohpc_repo_dir $link_ohpc_repo_dir
$link_ohpc_repo_dir/make_repo.sh
Download the following package: https://hpc.lenovo.com/lico/downloads/7.0/lico-dep-7.0.0.el8.x86_64.tgz
Upload the package to the /root directory.
Configure the repository for the management node:
mkdir -p $lico_dep_repo_dir
cd /root
tar -xvf lico-dep-7.0.0.el8.x86_64.tgz -C $lico_dep_repo_dir
rm -rf $link_lico_dep_repo_dir
ln -s $lico_dep_repo_dir $link_lico_dep_repo_dir
$link_lico_dep_repo_dir/mklocalrepo.sh
Obtain the LiCO 7.0.0 release package for EL8 (lico-release-7.0.0.el8.x86_64.tar.gz) and the LiCO license file from: https://commercial.lenovo.com/cn/en/signin
Upload the release package to the management node.
Configure the local repository for the management node:
mkdir -p $lico_repo_dir
tar zxvf lico-release-7.0.0.el8.x86_64.tar.gz -C $lico_repo_dir --strip-components 1
rm -rf $link_lico_repo_dir
ln -s $lico_repo_dir $link_lico_repo_dir
$link_lico_repo_dir/mklocalrepo.sh
Install the base package:
dnf install -y lenovo-ohpc-base
Install Slurm:
dnf install -y ohpc-slurm-server
Configure the user shared directory. The following steps describe how to create the user shared directory, taking /home as an example.
Share /home from the management node:
echo "/home *(rw,async,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
Share /opt/ohpc/pub from the management node for OpenHPC:
echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports
exportfs -a
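To confirm that both directories are exported as intended, list the active exports:
exportfs -v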
Install Chrony:
dnf install -y chrony
Configure Chrony as described in the documentation at: https://chrony.tuxfamily.org/documentation.html
Enable chronyd:
systemctl enable chronyd --now
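You can verify that chrony has connected to its configured time sources with:
chronyc sources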
Download slurm.conf from the following web site: https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/
Upload slurm.conf to /etc/slurm/ and modify the file according to the installation guide.
Download cgroup.conf from the following web site: https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/
Upload cgroup.conf to /etc/slurm.
Create /etc/slurm/gres.conf and define the GPU resources of all nodes in the following format:
NodeName=c1 Name=gpu File=/dev/nvidia[0-1]
NodeName=c2 Name=gpu File=/dev/nvidia[0-2]
Start the services:
systemctl enable munge
systemctl enable slurmctld
systemctl restart munge
systemctl restart slurmctld
dnf install -y icinga2
dnf install -y nagios-plugins-ping
icinga2 api setup
icinga2 node setup --master --disable-confd
echo -e "LANG=en_US.UTF-8" >> /etc/sysconfig/icinga2
systemctl restart icinga2
Install three MPI modules (OpenMPI, MPICH, and MVAPICH) on the system:
dnf install -y openmpi4-gnu9-ohpc mpich-ofi-gnu9-ohpc mvapich2-gnu9-ohpc ucx-ib-ohpc
Set the default module.
Set OpenMPI module as the default:
dnf install -y lmod-defaults-gnu9-openmpi4-ohpc
Set the MPICH module as the default:
dnf install -y lmod-defaults-gnu9-mpich-ofi-ohpc
Set the MVAPICH module as the default:
dnf install -y lmod-defaults-gnu9-mvapich2-ohpc
Install Singularity:
dnf install -y singularity-ohpc
Edit the file /opt/ohpc/pub/modulefiles/ohpc by adding the following content to the end of the module try-add block:
module try-add singularity
In the module del block, add the following content as the first line:
module del singularity
Run the following command:
source /etc/profile.d/lmod.sh
Install RabbitMQ:
dnf install -y rabbitmq-server
Start RabbitMQ service:
systemctl enable rabbitmq-server --now
Install MariaDB:
dnf install -y mariadb-server mariadb-devel
systemctl enable mariadb --now
Configure MariaDB for LiCO:
mysql
create database lico character set utf8 collate utf8_bin;
create user '<USERNAME>'@'%' identified by '<PASSWORD>';
grant ALL on lico.* to '<USERNAME>'@'%';
exit
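As a quick verification that the new account works, log in with its credentials (you will be prompted for the password) and list the databases; the lico database should appear:
mysql -u<USERNAME> -p -e 'show databases;'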
Configure the MariaDB limits:
sed -i "/\[mysqld\]/a\max-connections=1024" /etc/my.cnf.d/mariadb-server.cnf
mkdir /usr/lib/systemd/system/mariadb.service.d
cat << eof > /usr/lib/systemd/system/mariadb.service.d/limits.conf
[Service]
LimitNOFILE=10000
eof
systemctl daemon-reload
systemctl restart mariadb
dnf install -y influxdb
systemctl enable influxdb --now
influx
create database lico
use lico
create user <INFLUX_USERNAME> with password '<INFLUX_PASSWORD>' with all privileges
exit
sed -i '/# auth-enabled = false/a\ auth-enabled = true' /etc/influxdb/config.toml
systemctl restart influxdb
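After authentication is enabled, unauthenticated access should fail; verify that the new credentials work (flags as in the InfluxDB 1.x CLI):
influx -username <INFLUX_USERNAME> -password '<INFLUX_PASSWORD>' -execute 'show databases'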
Install and configure OpenLDAP
Download the openldap-servers package from the Rocky official website:
https://download.rockylinux.org/pub/rocky/8/PowerTools/x86_64/os/Packages/o/openldap-servers-2.4.46-18.el8.x86_64.rpm
Upload the package to the LiCO management node.
Install openldap-servers:
dnf install -y openldap-servers-2.4.46-18.el8.x86_64.rpm
dnf install -y slapd-ssl-config
Modify the configuration file:
sed -i "s/dc=hpc,dc=com/${lico_ldap_domain_name}/" /usr/share/openldap-servers/lico.ldif
sed -i "/dc:/s/hpc/${lico_ldap_domain_component}/" /usr/share/openldap-servers/lico.ldif
sed -i "s/dc=hpc,dc=com/${lico_ldap_domain_name}/" /etc/openldap/slapd.conf
slapadd -v -l /usr/share/openldap-servers/lico.ldif -f /etc/openldap/slapd.conf -b \
${lico_ldap_domain_name}
Generate the encrypted OpenLDAP password:
slappasswd
Edit /etc/openldap/slapd.conf to set the root password to the hash that was generated:
rootpw <ENCRYPT_LDAP_PASSWORD>
Change the owner of the configuration file:
chown -R ldap:ldap /var/lib/ldap
chown ldap:ldap /etc/openldap/slapd.conf
Start the OpenLDAP service:
systemctl enable slapd --now
Verify that the service has been started:
systemctl status slapd
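If the openldap-clients package is installed, you can also confirm that the directory answers queries against the configured base DN:
ldapsearch -x -b ${lico_ldap_domain_name}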
The libuser module is a recommended toolkit for OpenLDAP. The installation of this module is optional.
Install libuser:
dnf install -y libuser python3-libuser
Download libuser.conf from https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/ to /etc on the management node, and modify the file by referring to the installation guide.
echo "TLS_REQCERT never" >> /etc/openldap/ldap.conf
Install nss-pam-ldapd:
dnf install -y nss-pam-ldapd
Download nslcd.conf from: https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/
Upload the file to /etc and modify the configuration according to the installation guide.
Modify the file permissions:
chmod 600 /etc/nslcd.conf
Start the nslcd service:
systemctl enable nslcd --now
Create the path for the configuration file:
mkdir -p /usr/share/authselect/vendor/nslcd
Download configuration files from: https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/authselect/authselect.tar.gz
Upload the configuration files to /root
Extract the archive:
tar -xzvf /root/authselect.tar.gz -C /usr/share/authselect/vendor/nslcd/
Enable the configuration:
authselect select nslcd with-mkhomedir --force
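Once nslcd and authselect are in place, LDAP accounts should resolve through NSS. A quick check against any user defined in the directory (the username here is only a placeholder):
getent passwd <LDAP_USERNAME>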
Install the LiCO modules:
dnf install -y python3-cffi
dnf install -y lico-core lico-file-manager lico-confluent-proxy \
lico-vnc-proxy lico-icinga-mond lico-async-task lico-service-tool
Configure the shared directory for LiCO:
mkdir -p /opt/lico/pub
touch /opt/lico/pub/DO_NOT_DELETE
echo "The file is required by lico monitor." >> /opt/lico/pub/DO_NOT_DELETE
echo "/opt/lico/pub *(ro,sync,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
Install the portal:
dnf install -y lico-workspace-skeleton lico-portal
Install the AI component:
dnf install -y lico-ai-scripts
(Optional) Provide e-mail, SMS, and WeChat services:
dnf install -y lico-mail-agent
dnf install -y lico-sms-agent
dnf install -y lico-wechat-agent
(Optional) Install Icinga2 monitoring components
dnf install -y lico-icinga-plugin-slurm
mkdir -p /etc/icinga2/zones.d/global-templates
echo -e "object CheckCommand \"lico_monitor\" {\n command = [ \"/opt/lico/pub/monitor/lico_icinga_plugin/\
lico-icinga-plugin\" ]\n}" > /etc/icinga2/zones.d/global-templates/commands.conf
echo -e "object CheckCommand \"lico_job_monitor\" {\n command = [ \"/opt/lico/pub/monitor/lico_icinga_plugin/\
lico-job-icinga-plugin\" ]\n}" >> /etc/icinga2/zones.d/global-templates/commands.conf
echo -e "object CheckCommand \"lico_check_procs\" {\n command =[ \"/opt/lico/pub/monitor/lico_icinga_plugin/\
lico-process-icinga-plugin\" ]\n}" >>/etc/icinga2/zones.d/global-templates/commands.conf
echo -e "object CheckCommand \"lico_vnc_monitor\" {\n command =[ \"/opt/lico/pub/monitor/lico_icinga_plugin/\
lico-vnc-icinga-plugin\" ]\n}" >>/etc/icinga2/zones.d/global-templates/commands.conf
mkdir -p /etc/icinga2/zones.d/master
echo -e "object Host \"${sms_name}\" {\n check_command = \"hostalive\"\n \
address = \"${sms_ip}\"\n vars.agent_endpoint = name\n}\n" >> \
/etc/icinga2/zones.d/master/hosts.conf
for ((i=0;i<$num_computes;i++));do
echo -e "object Endpoint \"${c_name[${i}]}\" {\n host = \"${c_name[${i}]}\"\n \
port = \"${icinga_api_port}\"\n log_duration = 0\n}\nobject \
Zone \"${c_name[${i}]}\" {\n endpoints = [ \"${c_name[${i}]}\" ]\n \
parent = \"master\"\n}\n" >> /etc/icinga2/zones.d/master/agent.conf
echo -e "object Host \"${c_name[${i}]}\" {\n check_command = \"hostalive\"\n \
address = \"${c_ip[${i}]}\"\n vars.agent_endpoint = name\n}\n" >> \
/etc/icinga2/zones.d/master/hosts.conf
done
echo -e "apply Service \"lico\" {\n check_command = \"lico_monitor\"\n \
max_check_attempts = 5\n check_interval = 1m\n retry_interval = 30s\n assign \
where host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" > \
/etc/icinga2/zones.d/master/service.conf
echo -e "apply Service \"lico-procs-service\" {\n check_command = \"lico_\
check_procs\"\n enable_active_checks = false\n assign where \
host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" >> \
/etc/icinga2/zones.d/master/service.conf
echo -e "apply Service \"lico-job-service\" {\n check_command = \"lico_job_monitor\"\n \
max_check_attempts = 5\n check_interval = 1m\n retry_interval = 30s\n assign \
where host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" >> \
/etc/icinga2/zones.d/master/service.conf
echo -e "apply Service \"lico-vnc-service\" {\n check_command = \"lico_vnc_monitor\"\n \
max_check_attempts = 5\n check_interval = 15s\n retry_interval = 30s\n assign \
where host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" >> \
/etc/icinga2/zones.d/master/service.conf
chown -R icinga:icinga /etc/icinga2/zones.d/master
systemctl restart icinga2
modprobe ipmi_devintf
systemctl enable icinga2
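Before relying on the new zone and service definitions, it is worth validating the Icinga2 configuration; the daemon's built-in config check reports any syntax errors:
icinga2 daemon -C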
Restart the confluent service:
systemctl restart confluent
Note: The Icinga2 username and password can be viewed and changed in /etc/icinga2/conf.d/api-users.conf.
cd /etc/lico
\cp gres.csv.example gres.csv
\cp nodes.csv.example nodes.csv
vim nodes.csv
lico-password-tool
mkdir -p /tmp/scratchdir/var/lib/lico/tool
cp /var/lib/lico/tool/.db /tmp/scratchdir/var/lib/lico/tool/
cd lico.ini.d/
sed -i s/false/true/ user.ini
lico init
sed -i s/80/8080/g /etc/nginx/nginx.conf
sed -i s/443/444/ /etc/nginx/conf.d/https.conf
luseradd hpcadmin -P Passw0rd@123
lico import_user -u hpcadmin -r admin
lico-service-tool start
lico-service-tool enable
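As a basic reachability check, confirm the portal responds on the Nginx HTTPS port configured above (444 in this document); a 200 or a redirect code indicates the service is up:
curl -sk -o /dev/null -w '%{http_code}\n' https://localhost:444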
Prepare the repositories for the compute node image:
share_installer_dir="/install/installer"
mkdir -p $share_installer_dir
echo "/install/installer *(rw,async,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
cp /etc/hosts $share_installer_dir
cp /etc/security/limits.conf $share_installer_dir
cp /etc/yum.repos.d/EL8-OS.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/EL8-OS.repo
sed -i "/name=appstream/a\baseurl=http://${sms_name}${os_repo_dir}/AppStream/" \
$share_installer_dir/EL8-OS.repo
sed -i "/name=baseos/a\baseurl=http://${sms_name}${os_repo_dir}/BaseOS/" \
$share_installer_dir/EL8-OS.repo
cp /etc/yum.repos.d/lenovo-hpc.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/lenovo-hpc.repo
sed -i '/^gpgkey=/d' $share_installer_dir/lenovo-hpc.repo
echo "baseurl=http://${sms_name}${confluent_repo_dir}/lenovo-hpc-el8" \
>> $share_installer_dir/lenovo-hpc.repo
echo "gpgkey=http://${sms_name}${confluent_repo_dir}/lenovo-hpc-el8\
/lenovohpckey.pub" >> $share_installer_dir/lenovo-hpc.repo
cp /etc/yum.repos.d/Lenovo.OpenHPC.local.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/Lenovo.OpenHPC.local.repo
sed -i '/^gpgkey=/d' $share_installer_dir/Lenovo.OpenHPC.local.repo
echo "baseurl=http://${sms_name}${link_ohpc_repo_dir}/EL_8" \
>> $share_installer_dir/Lenovo.OpenHPC.local.repo
echo "gpgkey=http://${sms_name}${link_ohpc_repo_dir}/EL_8\
/repodata/repomd.xml.key" >> $share_installer_dir/Lenovo.OpenHPC.local.repo
cp /etc/yum.repos.d/lico-dep.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/lico-dep.repo
sed -i '/^gpgkey=/d' $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-library/a\baseurl=http://${sms_name}\
${link_lico_dep_repo_dir}/library/" $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-library/a\gpgkey=http://${sms_name}\
${link_lico_dep_repo_dir}/RPM-GPG-KEY-LICO-DEP-EL8" $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-standalone/a\baseurl=http://${sms_name}\
${link_lico_dep_repo_dir}/standalone/" $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-standalone/a\gpgkey=http://${sms_name}\
${link_lico_dep_repo_dir}/RPM-GPG-KEY-LICO-DEP-EL8" $share_installer_dir/lico-dep.repo
cp /etc/yum.repos.d/lico-release.repo $share_installer_dir
sed -i '/baseurl=/d' $share_installer_dir/lico-release.repo
sed -i "/name=lico-release-host/a\baseurl=http://${sms_name}\
${link_lico_repo_dir}/host/" $share_installer_dir/lico-release.repo
sed -i "/name=lico-release-public/a\baseurl=http://${sms_name}\
${link_lico_repo_dir}/public/" $share_installer_dir/lico-release.repo
Configure automatic start for the GPU driver (for the GPU image).
Download NVIDIA-Linux-x86_64-520.61.07.run from https://us.download.nvidia.com/tesla/520.61.07/NVIDIA-Linux-x86_64-520.61.07.run and copy it to the shared directory $share_installer_dir.
cat << eof > $share_installer_dir/nvidia-persistenced.service
[Unit]
Description=NVIDIA Persistence Daemon
After=syslog.target
[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced/*
TimeoutSec=300
[Install]
WantedBy=multi-user.target
eof
cat << eof > $share_installer_dir/nvidia-modprobe-loader.service
[Unit]
Description=NVIDIA ModProbe Service
After=syslog.target
Before=slurmd.service
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-modprobe -u -c=0
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
eof
cat << eof > $share_installer_dir/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
eof
Copy the Slurm configuration files:
cp /etc/slurm/slurm.conf $share_installer_dir/slurm.conf
cp /etc/slurm/cgroup.conf $share_installer_dir/cgroup.conf
cp /etc/slurm/gres.conf $share_installer_dir/gres.conf
cp /etc/munge/munge.key $share_installer_dir
Copy the LDAP configuration files:
cp /etc/openldap/ldap.conf $share_installer_dir
cp /etc/nslcd.conf $share_installer_dir/nslcd.conf
Copy the authselect archive:
cp /root/authselect.tar.gz $share_installer_dir
Synchronize the files into the image and clean up the original repository files:
\cp ~/lico_env.local /tmp/scratchdir/root/
\cp $share_installer_dir/hosts /tmp/scratchdir/etc/hosts
\cp $share_installer_dir/limits.conf /tmp/scratchdir/etc/security/limits.conf
\cp $share_installer_dir/EL8-OS.repo /tmp/scratchdir/etc/yum.repos.d/
\cp $share_installer_dir/Lenovo.OpenHPC.local.repo /tmp/scratchdir/etc/yum.repos.d/
echo -e %_excludedocs 1 >> /tmp/scratchdir/root/.rpmmacros
\cp $share_installer_dir/lico-dep.repo /tmp/scratchdir/etc/yum.repos.d/
\cp $share_installer_dir/lico-release.repo /tmp/scratchdir/etc/yum.repos.d/
cd /tmp/scratchdir/etc/yum.repos.d
mkdir rocky
mv Rocky* rocky/
For the GPU image, do the following:
\cp ~/lico_env.local /tmp/scratchdir-gpu/root/
\cp $share_installer_dir/hosts /tmp/scratchdir-gpu/etc/hosts
\cp $share_installer_dir/limits.conf /tmp/scratchdir-gpu/etc/security/limits.conf
\cp $share_installer_dir/EL8-OS.repo /tmp/scratchdir-gpu/etc/yum.repos.d/
\cp $share_installer_dir/Lenovo.OpenHPC.local.repo /tmp/scratchdir-gpu/etc/yum.repos.d/
echo -e %_excludedocs 1 >> /tmp/scratchdir-gpu/root/.rpmmacros
\cp $share_installer_dir/lico-dep.repo /tmp/scratchdir-gpu/etc/yum.repos.d/
\cp $share_installer_dir/lico-release.repo /tmp/scratchdir-gpu/etc/yum.repos.d/
\cp $share_installer_dir/nvidia-* /tmp/scratchdir-gpu/usr/lib/systemd/system/
\cp $share_installer_dir/blacklist-nouveau.conf /tmp/scratchdir-gpu/usr/lib/modprobe.d/blacklist-nouveau.conf
cd /tmp/scratchdir-gpu/etc/yum.repos.d
mkdir rocky
mv Rocky* rocky/
On the management node, install the packages required to repackage the NVIDIA driver for the current kernel:
dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils \
elfutils-libelf-devel libglvnd-devel
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
chmod +x $share_installer_dir/NVIDIA-Linux-x86_64-520.61.07.run
cd $share_installer_dir
$share_installer_dir/NVIDIA-Linux-x86_64-520.61.07.run --add-this-kernel -s
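The --add-this-kernel option produces a new installer with a -custom suffix; confirm it was created in the shared directory (this file is used by the onboot script later in this document):
ls -lh $share_installer_dir/NVIDIA-Linux-x86_64-520.61.07-custom.run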
For the non-GPU image, enter the image:
imgutil exec -v /install/installer:- /tmp/scratchdir
If you are building the GPU image:
imgutil exec -v /install/installer:- /tmp/scratchdir-gpu
All commands from here until the exit below run inside the image shell:
source /root/lico_env.local
share_installer_dir="/install/installer"
dnf module reset -y nginx
dnf module enable -y nginx:1.20
dnf install -y chrony
Edit /etc/chrony.conf to configure chrony.
Enable chronyd to start at boot:
systemctl enable chronyd
echo "${sms_ip}:/home /home nfs nfsvers=4.0,nodev,nosuid,noatime 0 0" >> /etc/fstab
mkdir -p /home
mkdir -p $share_installer_dir
echo "${sms_ip}:/install/installer /install/installer nfs nfsvers=4.0,nodev,nosuid,noatime 0 0" >> /etc/fstab
mount -a
cp $share_installer_dir/ldap.conf /etc/openldap/ldap.conf
dnf install -y nss-pam-ldapd
cp $share_installer_dir/nslcd.conf /etc/nslcd.conf
chmod 600 /etc/nslcd.conf
systemctl enable nslcd
mkdir -p /usr/share/authselect/vendor/nslcd
tar -xzvf $share_installer_dir/authselect.tar.gz -C /usr/share/authselect/vendor/nslcd/
dnf install -y authselect
authselect select nslcd with-mkhomedir --force
dnf install -y icinga2
icinga2 node setup --master --disable-confd
echo -e "LANG=en_US.UTF-8" >> /etc/sysconfig/icinga2
dnf install -y ohpc-base-compute ohpc-slurm-client lmod-ohpc
Optionally, restrict SSH logins to users with active Slurm jobs on the node:
echo 'account required pam_slurm.so' >> /etc/pam.d/sshd
cp $share_installer_dir/munge.key /etc/munge/munge.key
cp $share_installer_dir/cgroup.conf /etc/slurm/cgroup.conf
cp $share_installer_dir/slurm.conf /etc/slurm/slurm.conf
cp $share_installer_dir/gres.conf /etc/slurm/gres.conf
systemctl enable munge
systemctl enable slurmd
dnf install -y tar bzip2 make automake gcc gcc-c++ pciutils \
elfutils-libelf-devel libglvnd-devel
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
echo "${sms_ip}:/opt/lico/pub /opt/lico/pub nfs nfsvers=4.0,nodev,noatime 0 0" >> /etc/fstab
mkdir -p /opt/lico/pub
mount -a
echo "${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs nfsvers=4.0,nodev,noatime 0 0" >> /etc/fstab
mkdir -p /opt/ohpc/pub
mount -a
Exit the image shell and pack the image. For the non-GPU image:
exit
imgutil pack /tmp/scratchdir/ rocky-8.6-diskless-slurm
For the GPU image:
exit
imgutil pack /tmp/scratchdir-gpu/ rocky-8.6-diskless-slurm-gpu
Create and edit the following script: /var/lib/confluent/public/os/rocky-8.6-diskless-slurm/scripts/onboot.d/icinga.sh
or, for the GPU image:
/var/lib/confluent/public/os/rocky-8.6-diskless-slurm-gpu/scripts/onboot.d/icinga.sh
sms_name=head
icinga_api_port=5665
icinga2 pki save-cert --trustedcert /var/lib/icinga2/certs/trusted-parent.crt --host ${sms_name}
nodename=`uname -a |awk '{print $2}'`
ticket=`ssh $sms_name icinga2 pki ticket --cn $nodename`
icinga2 node setup --ticket ${ticket} --cn $nodename --endpoint ${sms_name} --zone $nodename --parent_zone master --parent_host ${sms_name} --trustedcert /var/lib/icinga2/certs/trusted-parent.crt --accept-commands --accept-config --disable-confd
modprobe ipmi_devintf
systemctl start icinga2
systemctl enable icinga2
Note: The hostname used in this script must be consistent with the hostname defined in lico_env.local.
For the GPU image, also add the following script: /var/lib/confluent/public/os/rocky-8.6-diskless-slurm-gpu/scripts/onboot.d/gpu-drivers.sh
systemctl stop slurmd
share_installer_dir="/install/installer"
$share_installer_dir/NVIDIA-Linux-x86_64-520.61.07-custom.run -s
mkdir -p /var/run/nvidia-persistenced
systemctl daemon-reload
systemctl enable nvidia-persistenced --now
systemctl enable nvidia-modprobe-loader.service --now
systemctl restart slurmd
Deploy the image to the compute nodes:
nodedeploy compute -n rocky-8.6-diskless-slurm
For GPU nodes:
nodedeploy gpu -n rocky-8.6-diskless-slurm-gpu
Monitor the nodes on the XCC until the deployment is finished.
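Because console.method=ipmi was set earlier, you can also follow a node's text console from the management node while it deploys (c1 is an example node name):
nodeconsole c1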
To update an existing image, unpack it into a new scratch directory:
imgutil unpack rocky-8.6-diskless-slurm /tmp/scratchdir-v2/
Enter the image to make modifications
imgutil exec /tmp/scratchdir-v2/
Pack the image (note: the new image cannot have the same name as an existing image):
imgutil pack /tmp/scratchdir-v2/ rocky-8.6-diskless-slurm-v2
Copy the profile.yaml and onboot scripts of the previous image as required
cd /var/lib/confluent/public/os
cp rocky-8.6-diskless-slurm/profile.yaml rocky-8.6-diskless-slurm-v2/profile.yaml
cp rocky-8.6-diskless-slurm/scripts/onboot.d/* rocky-8.6-diskless-slurm-v2/scripts/onboot.d/