Installing Operating System
Installing an OS on the Management Node
Note
If the operating system is already installed on all nodes in the cluster, you can skip this step.
Install an official version of CentOS 7.4 or SLES 12 SP3 on the management node. A minimal installation is sufficient.
Deploying the OS on Other Nodes in the Cluster
Configuring Environmental Variables
After logging in to the management node, run the commands below to create the file that holds the environment variables used throughout the installation process.
$ su root
$ cd ~
$ vi lico_env.local
Edit file lico_env.local
Note
Edit lico_env.local according to the comments below, then save it. Lines starting with # are comments and can be omitted from the final file. This guide assumes that all nodes share the same BMC user name and password. If they differ, change them in the step Set xCAT Node Information.

# Management node hostname
sms_name="head"
# Set the domain name
domain_name="hpc.com"
# Set the OpenLDAP domain name
lico_ldap_domain_name="dc=hpc,dc=com"
# The IP address of the management node in the cluster intranet
sms_ip="192.168.0.1"
# The network card name of the management node's IP address
sms_eth_internal="eth0"
# Subnet mask of the intranet in the cluster.
# If the OS is already installed on all nodes in the cluster,
# keep the existing configuration.
internal_netmask="255.255.0.0"
# The user name and password of BMC
bmc_username="<BMC_USERNAME>"
bmc_password="<BMC_PASSWORD>"
# xCAT OS image path
iso_path="/isos"
# Lenovo OpenHPC's local source directory
ohpc_repo_dir="/install/custom/ohpc"
# Local source directories for the OS and, on SLES, the SDK
os_repo_dir="/install/custom/server"
sdk_repo_dir="/install/custom/sdk"
# xCAT's local source directory
xcat_repo_dir="/install/custom/xcat"
# LiCO-DEP's local source directory
lico_dep_repo_dir="/install/custom/lico-dep"
# LiCO's local source directory
lico_repo_dir="/install/custom/lico"
# Total number of compute nodes
num_computes="2"
# Prefix of the compute node hostnames.
# If the OS is already installed on all nodes in the cluster, fill in the actual values.
compute_prefix="c"
# Compute node hostname list.
# If the OS is already installed on all nodes in the cluster, fill in the actual values.
c_name[0]=c1
c_name[1]=c2
# Compute node IP list.
# If the OS is already installed on all nodes in the cluster, fill in the actual values.
c_ip[0]=192.168.0.6
c_ip[1]=192.168.0.16
# MAC address of the network adapter that carries each compute node IP.
# If the OS is already installed on all nodes in the cluster, fill in the actual values.
c_mac[0]=fa:16:3e:73:ec:50
c_mac[1]=fa:16:3e:27:32:c6
# Compute node BMC address list.
c_bmc[0]=192.168.1.6
c_bmc[1]=192.168.1.16
# Total number of login nodes
num_logins="1"
# Login node hostname list.
# If the OS is already installed on all nodes in the cluster, fill in the actual values.
l_name[0]=l1
# Login node IP list.
# If the OS is already installed on all nodes in the cluster, fill in the actual values.
l_ip[0]=192.168.0.15
# MAC address of the network adapter that carries each login node IP.
# If the OS is already installed on all nodes in the cluster, fill in the actual values.
l_mac[0]=fa:16:3e:2c:7a:47
# Login node BMC address list.
l_bmc[0]=192.168.1.15
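On larger clusters, typing every c_name entry by hand is error-prone. The sketch below assumes hostnames are simply the prefix followed by a 1-based index, as in c1 and c2 above, and prints the matching c_name assignments so they can be pasted into lico_env.local. The function name gen_node_names is hypothetical and is not part of LiCO or xCAT.

```shell
# Print c_name[...] assignments for COUNT nodes named PREFIX1..PREFIXn.
# gen_node_names is a hypothetical helper, not part of LiCO or xCAT.
gen_node_names() {
    prefix=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Array indexes start at 0, hostnames at 1.
        printf 'c_name[%d]=%s%d\n' "$i" "$prefix" "$((i + 1))"
        i=$((i + 1))
    done
}
```

Running gen_node_names c 2 prints c_name[0]=c1 and c_name[1]=c2, one per line. If the cluster uses a different naming scheme, list the hostnames manually instead.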
Make the configuration file take effect
$ sudo chmod 600 lico_env.local
$ source lico_env.local
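Later steps expand these variables directly on the command line, so an empty value can silently produce a wrong path or address. As a minimal sketch, the hypothetical helper require_vars below (not part of LiCO or xCAT) reports any named variable that is unset or empty after the file has been loaded.

```shell
# Report each named variable that is unset or empty.
# require_vars is a hypothetical helper; it returns non-zero if anything is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable called $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, require_vars sms_name sms_ip iso_path os_repo_dir xcat_repo_dir num_computes num_logins returns success only when all of those variables are set in the current shell.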
Note
After the cluster environment is set up, configure a public network IP address on the login node so that you can log in to the LiCO web portal from an external network.
Get the Local Repository
Download the official image
For CentOS:
$ sudo mkdir -p ${iso_path}
# Verify the ISO file: compare the output of the command below with the
# checksum published at http://centos.unixheads.org/7/isos/x86_64/sha256sum.txt
# and make sure the two match.
$ sudo sha256sum ${iso_path}/CentOS-7-x86_64-Everything-1708.iso
# Mount the image
$ sudo mkdir -p ${os_repo_dir}
$ sudo mount -o loop ${iso_path}/CentOS-7-x86_64-Everything-1708.iso ${os_repo_dir}
# Configure the local repository
$ sudo cat << eof > ${iso_path}/EL7-OS.repo
[EL7-OS]
name=el7-centos
enabled=1
gpgcheck=0
type=rpm-md
baseurl=file://${os_repo_dir}
eof
$ sudo cp -a ${iso_path}/EL7-OS.repo /etc/yum.repos.d/
For SLES:
$ sudo mkdir -p ${iso_path}
# Verify the ISO files: compare the output of the commands below with the
# checksums published with the download and make sure they match.
$ sudo md5sum ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso
$ sudo md5sum ${iso_path}/SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso
# Mount the images
$ sudo mkdir -p ${os_repo_dir}
$ sudo mkdir -p ${sdk_repo_dir}
$ sudo mount -o loop ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso ${os_repo_dir}
$ sudo mount -o loop ${iso_path}/SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso ${sdk_repo_dir}
# Configure the local repositories
$ sudo cat << eof > ${iso_path}/SLES12-SP3-12.3.repo
[SLES12-SP3-12.3-SERVER]
name=sle12-server
enabled=1
autorefresh=0
gpgcheck=0
baseurl=file://${os_repo_dir}
[SLES12-SP3-12.3-SDK]
name=sle12-sdk
enabled=1
autorefresh=0
gpgcheck=0
baseurl=file://${sdk_repo_dir}
eof
$ sudo zypper ar ${iso_path}/SLES12-SP3-12.3.repo
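Comparing long checksums by eye is easy to get wrong. The following sketch wraps the comparison in a hypothetical helper, verify_sha256, which is not part of any of the tools used here. It assumes you have copied the expected digest from the published checksum list. For the SLES images, which are checked with md5sum above, the same pattern applies with md5sum in place of sha256sum.

```shell
# Compare a file's SHA-256 digest with an expected value.
# verify_sha256 is a hypothetical helper; it returns 0 only on an exact match.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

A call such as verify_sha256 ${iso_path}/CentOS-7-x86_64-Everything-1708.iso <EXPECTED_SHA256> can then gate the mount step, where <EXPECTED_SHA256> is a placeholder for the published digest.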
Installing Lenovo xCAT
Download xCAT
Configuration and Installation
Upload the package to the management node, and then run the commands below to install xCAT.
For CentOS:
# Create xCAT's local source
$ sudo yum -y install bzip2
$ sudo mkdir -p $xcat_repo_dir
$ sudo tar -xvf xcat-2.13.8.lenovo3_confluent-1.8.2_lenovo_confluent-0.8.1-el7.tar.bz2 -C $xcat_repo_dir
$ cd $xcat_repo_dir/lenovo-hpc-el7
$ sudo ./mklocalrepo.sh
$ cd ~
# Install xCAT
$ sudo yum -y install xCAT
$ sudo systemctl start xcatd
$ source /etc/profile.d/xcat.sh
For SLES:
# Create xCAT's local source
$ sudo mkdir -p $xcat_repo_dir
$ sudo tar -xvf xcat-2.13.8.lenovo3_confluent-1.8.2_lenovo_confluent-0.8.1-sles12.tar.bz2 -C $xcat_repo_dir
$ cd $xcat_repo_dir/lenovo-hpc-sles12
$ sudo ./mklocalrepo.sh
$ cd ~
# Install xCAT
$ sudo zypper install xCAT
$ sudo systemctl start xcatd
$ source /etc/profile.d/xcat.sh
Prepare OS Images for Other Nodes
Note
If the operating system is already installed on all nodes in the cluster, you can skip this step.
Import image
For CentOS:
$ sudo copycds ${iso_path}/CentOS-7-x86_64-Everything-1708.iso
For SLES:
$ sudo copycds ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso
Verify that the image is available
For CentOS:
$ sudo lsdef -t osimage
...
centos7.4-x86_64-install-compute (osimage)
centos7.4-x86_64-netboot-compute (osimage)
centos7.4-x86_64-statelite-compute (osimage)
...
For SLES:
$ sudo lsdef -t osimage
...
sles12.3-x86_64-install-compute (osimage)
sles12.3-x86_64-install-service (osimage)
sles12.3-x86_64-netboot-compute (osimage)
sles12.3-x86_64-statelite-compute (osimage)
...
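When this check is scripted, it helps to test for one specific image name instead of reading the whole list. The sketch below defines a hypothetical helper, has_osimage (not an xCAT command), that reads lsdef -t osimage output on standard input. It assumes each line has the form shown above: the image name followed by (osimage).

```shell
# Succeed if the osimage named $1 appears in `lsdef -t osimage` output on stdin.
# has_osimage is a hypothetical helper, not an xCAT command.
has_osimage() {
    # Exact match on the first field avoids treating dots in the name as wildcards.
    awk -v name="$1" '$1 == name && $2 == "(osimage)" { found = 1 } END { exit !found }'
}
```

For example, sudo lsdef -t osimage | has_osimage centos7.4-x86_64-install-compute succeeds only when that install image has been imported.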
Set image parameter
Note
The Nouveau module is an accelerated open source driver for NVIDIA cards. According to the official NVIDIA installation guide, this module should be disabled before installing the CUDA driver.
For CentOS:
$ chdef -t osimage centos7.4-x86_64-install-compute addkcmdline="rdblacklist=nouveau nouveau.modeset=0 R::modprobe.blacklist=nouveau"
For SLES:
$ chdef -t osimage sles12.3-x86_64-install-compute addkcmdline="rdblacklist=nouveau nouveau.modeset=0 R::modprobe.blacklist=nouveau"
Set xCAT Node Information
for ((i=0; i<$num_computes; i++)); do
    sudo mkdef -t node ${c_name[$i]} groups=compute,all arch=x86_64 netboot=xnba mgt=ipmi \
        bmcusername=${bmc_username} bmcpassword=${bmc_password} ip=${c_ip[$i]} \
        mac=${c_mac[$i]} bmc=${c_bmc[$i]} serialport=0 serialspeed=115200
done
for ((i=0; i<$num_logins; i++)); do
    sudo mkdef -t node ${l_name[$i]} groups=login,all arch=x86_64 netboot=xnba mgt=ipmi \
        bmcusername=${bmc_username} bmcpassword=${bmc_password} ip=${l_ip[$i]} \
        mac=${l_mac[$i]} bmc=${l_bmc[$i]} serialport=0 serialspeed=115200
done
Note
If the BMC user name and password differ between nodes, run the following commands to modify them.
$ sudo tabedit ipmi
$ sudo chtab key=system passwd.username=root passwd.password=<ROOT_PASSWORD>
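A mistyped MAC address in lico_env.local only shows up later, when a node fails to boot from the network. As a hedged sketch, the hypothetical helper is_valid_mac below (not an xCAT command) checks that a value consists of six colon-separated hexadecimal pairs, which is the format used by the c_mac and l_mac entries in this guide.

```shell
# Succeed if $1 looks like a MAC address: six hex pairs separated by colons.
# is_valid_mac is a hypothetical helper, not an xCAT command.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}
```

For example, is_valid_mac "${c_mac[0]}" || echo "bad MAC for ${c_name[0]}" flags a malformed entry. The check covers the format only; it cannot tell whether the address belongs to the intended node.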
Add hosts Resolution
Note
If the operating system is already installed on the cluster nodes and their hostnames already resolve to IP addresses, skip this step.
$ sudo chdef -t site domain=${domain_name}
$ sudo chdef -t site master=${sms_ip}
$ sudo chdef -t site nameservers=${sms_ip}
$ sudo sed -i "/^\s*${sms_ip}\s*.*$/d" /etc/hosts
$ sudo sed -i "/\s*${sms_name}\s*/d" /etc/hosts
$ sudo echo "${sms_ip} ${sms_name} ${sms_name}.${domain_name} " >> /etc/hosts
$ sudo makehosts
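The sed and echo commands above edit /etc/hosts in place. To try the same idea on a copy first, the sketch below defines a hypothetical helper, set_hosts_entry (not part of xCAT), that takes the file as a parameter. It is a simplified variant: it removes only lines whose first field is the given IP address, then appends one fresh entry, so running it twice leaves a single line for that address.

```shell
# Replace any existing line for IP in HOSTS_FILE with a single fresh entry.
# set_hosts_entry is a hypothetical helper, not part of xCAT.
set_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    domain=$4
    tmp=$(mktemp)
    # Keep every line whose first field is not the given IP.
    awk -v ip="$ip" '$1 != ip' "$hosts_file" > "$tmp"
    printf '%s %s %s.%s\n' "$ip" "$name" "$name" "$domain" >> "$tmp"
    # Write back through the original file to preserve its ownership and mode.
    cat "$tmp" > "$hosts_file"
    rm -f "$tmp"
}
```

For example, cp /etc/hosts /tmp/hosts.test followed by set_hosts_entry /tmp/hosts.test ${sms_ip} ${sms_name} ${domain_name} lets you inspect the result before touching the real file.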
Configure DHCP and DNS Services
$ sudo makenetworks
$ sudo makedhcp -n
$ sudo makedns -n
Installing Operating System Through the Network
Note
If the operating system is already installed on all nodes in the cluster, you can skip this step.
For CentOS:
$ sudo nodeset all osimage=centos7.4-x86_64-install-compute
$ sudo rsetboot all net -u
$ sudo rpower all reset
For SLES:
$ sudo nodeset all osimage=sles12.3-x86_64-install-compute
$ sudo rsetboot all net -u
$ sudo rpower all reset
Note
The OS installation takes several minutes to finish. Use the command below to check the progress.
$ sudo nodestat all
Checkpoint A
$ sudo psh all uptime
...
c1: 05:03am up 0:02, 0 users, load average: 0.20, 0.13, 0.05
c2: 05:03am up 0:02, 0 users, load average: 0.20, 0.14, 0.06
l1: 05:03am up 0:02, 0 users, load average: 0.17, 0.13, 0.05
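With more than a handful of nodes, a missing line in this output is easy to overlook. The sketch below defines a hypothetical helper, missing_nodes (not an xCAT command), that reads the psh output on standard input and prints every expected node that did not answer. It assumes each reply line starts with the node name followed by a colon, as in the output above.

```shell
# Print each expected node that has no line in `psh all uptime` output (stdin).
# missing_nodes is a hypothetical helper, not an xCAT command.
missing_nodes() {
    output=$(cat)
    for node in "$@"; do
        # A node counts as present if some line begins with "<node>:".
        printf '%s\n' "$output" | grep -q "^$node:" || echo "$node"
    done
}
```

For example, sudo psh all uptime | missing_nodes c1 c2 l1 prints nothing when all three nodes responded, and prints the name of any node that is still unreachable.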