Installing Operating System

Installing an OS on the Management Node

Note

If the operating system is already installed on all nodes in the cluster, you can skip this step.

Install an official release of el7 or sle12 on the management node. A minimal installation is sufficient.

Deploying the OS on Other Nodes in the Cluster

Configuring Environment Variables

  • After logging in to the management node, run the commands below to create the file that holds the environment variables used throughout the installation:

    $ su root
    $ cd ~
    $ vi lico_env.local
    
  • Edit the file lico_env.local

    Note

    Edit lico_env.local according to the comments below, then save it. (Lines starting with # are comments and can be omitted from the final file.) This guide assumes that all nodes share the same BMC user name and password. If they differ, adjust them during the step Set xcat Node Information.

    # Management node hostname
    sms_name="head"
    # Set the domain name
    domain_name="hpc.com"
    # Set the OpenLDAP domain name
    lico_ldap_domain_name="dc=hpc,dc=com"
    # The IP address of the management node in the cluster intranet
    sms_ip="192.168.0.1"
    # The network card name of the management node's IP address
    sms_eth_internal="eth0"
    # Subnet mask of the intranet in the cluster.
    # If the OS has been installed on all the nodes in the cluster,
    # the default configuration is maintained.
    internal_netmask="255.255.0.0"
    
    # The user name and password of BMC
    bmc_username="<BMC_USERNAME>"
    bmc_password="<BMC_PASSWORD>"
    
    # xCAT OS Image Path
    iso_path="/isos"
    # Lenovo OpenHPC's local source directory
    ohpc_repo_dir="/install/custom/ohpc"
    
    # OS and SDK local source directories
    os_repo_dir="/install/custom/server"
    sdk_repo_dir="/install/custom/sdk"
    
    # xCAT's local source directory
    xcat_repo_dir="/install/custom/xcat"
    
    # LiCO-DEP's local source directory
    lico_dep_repo_dir="/install/custom/lico-dep"
    
    # LiCO's local source directory
    lico_repo_dir="/install/custom/lico"
    
    # Total number of compute nodes
    num_computes="2"
    # Hostname prefix of the compute nodes.
    # If the OS is already installed on all nodes in the cluster, fill in the actual value.
    compute_prefix="c"
    # Compute node hostname list.
    # If the OS is already installed on all nodes in the cluster, fill in the actual values.
    c_name[0]=c1
    c_name[1]=c2
    # Compute node IP list.
    # If the OS is already installed on all nodes in the cluster, fill in the actual values.
    c_ip[0]=192.168.0.6
    c_ip[1]=192.168.0.16
    # MAC address of the network adapter that carries each compute node IP.
    # If the OS is already installed on all nodes in the cluster, fill in the actual values.
    c_mac[0]=fa:16:3e:73:ec:50
    c_mac[1]=fa:16:3e:27:32:c6
    # Compute node BMC address list.
    c_bmc[0]=192.168.1.6
    c_bmc[1]=192.168.1.16
    
    # Total number of login nodes
    num_logins="1"
    # Login node hostname list.
    # If the OS is already installed on all nodes in the cluster, fill in the actual values.
    l_name[0]=l1
    # Login node IP list.
    # If the OS is already installed on all nodes in the cluster, fill in the actual values.
    l_ip[0]=192.168.0.15
    # MAC address of the network adapter that carries each login node IP.
    # If the OS is already installed on all nodes in the cluster, fill in the actual values.
    l_mac[0]=fa:16:3e:2c:7a:47
    # Login node BMC address list.
    l_bmc[0]=192.168.1.15
    
  • Make the configuration file take effect

    $ chmod 600 lico_env.local
    $ source lico_env.local
    

    Note

    After the cluster environment is set up, configure a public network IP address on the login node so that the LiCO web portal can be reached from the external network.
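
The number of entries in each list in lico_env.local has to agree with num_computes and num_logins, because later steps loop over the lists using those counts. The following is a minimal sketch of a check that could be run before sourcing the file. The helper names get_value, count_entries, and check_env are hypothetical (they are not part of LiCO or xCAT), and the script assumes only the file layout shown above.

```shell
#!/bin/sh
# Sketch only: get_value, count_entries and check_env are hypothetical helpers.

# Print the value of a scalar setting such as num_computes="2".
get_value() {
    # $1 = file, $2 = variable name; the last assignment in the file wins
    grep "^[[:space:]]*$2=" "$1" | tail -n 1 | cut -d= -f2 | tr -d '"'
}

# Count list entries such as c_name[0]=c1 for one list name.
count_entries() {
    # $1 = file, $2 = list name
    grep -c "^[[:space:]]*$2\[[0-9][0-9]*\]=" "$1"
}

# Compare every list against the counter that announces its length.
check_env() {
    # $1 = file
    status=0
    for pair in "num_computes:c_name c_ip c_mac c_bmc" \
                "num_logins:l_name l_ip l_mac l_bmc"; do
        counter=${pair%%:*}
        expected=$(get_value "$1" "$counter")
        for list in ${pair#*:}; do
            found=$(count_entries "$1" "$list")
            if [ "$found" != "$expected" ]; then
                echo "MISMATCH: $list has $found entries, $counter=$expected"
                status=1
            fi
        done
    done
    return $status
}
```

Calling check_env with the path to lico_env.local returns 0 when all counts agree and prints one MISMATCH line per inconsistent list otherwise.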

Get the Local Repository

  • Download the official image

  • el7

    Download CentOS-7-x86_64-Everything-1708.iso from the official website, copy it to the directory ${iso_path}, and run the commands below:
    $ sudo mkdir -p ${iso_path}
    # Compute the checksum of the ISO file and compare it with the value published at
    # http://centos.unixheads.org/7/isos/x86_64/sha256sum.txt
    # The two values must match.
    $ sudo sha256sum ${iso_path}/CentOS-7-x86_64-Everything-1708.iso
    
    # mount image
    $ sudo mkdir -p ${os_repo_dir}
    $ sudo mount -o loop ${iso_path}/CentOS-7-x86_64-Everything-1708.iso  ${os_repo_dir}
    
    # configure the local repository
    $ sudo tee ${iso_path}/EL7-OS.repo > /dev/null << eof
    [EL7-OS]
    name=el7-centos
    enabled=1
    gpgcheck=0
    type=rpm-md
    baseurl=file://${os_repo_dir}
    eof
    
    $ sudo cp -a ${iso_path}/EL7-OS.repo /etc/yum.repos.d/
    
  • sle12

    Download SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso and SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso from the official website, copy them to the directory ${iso_path}, and run the commands below:
    $ sudo mkdir -p ${iso_path}
    # Compute the checksums of the ISO files and compare them with the values
    # published on the download page. The values must match.
    $ sudo md5sum ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso
    $ sudo md5sum ${iso_path}/SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso
    
    # mount image
    $ sudo mkdir -p ${os_repo_dir}
    $ sudo mkdir -p ${sdk_repo_dir}
    $ sudo mount -o loop ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso  ${os_repo_dir}
    $ sudo mount -o loop ${iso_path}/SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso  ${sdk_repo_dir}
    
    # configure the local repository
    $ sudo tee ${iso_path}/SLES12-SP3-12.3.repo > /dev/null << eof
    [SLES12-SP3-12.3-SERVER]
    name=sle12-server
    enabled=1
    autorefresh=0
    gpgcheck=0
    baseurl=file://${os_repo_dir}

    [SLES12-SP3-12.3-SDK]
    name=sle12-sdk
    enabled=1
    autorefresh=0
    gpgcheck=0
    baseurl=file://${sdk_repo_dir}
    eof
    
    $ sudo zypper ar ${iso_path}/SLES12-SP3-12.3.repo
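
Comparing long checksums by eye is error-prone, so the comparison in the steps above can be scripted. The following is a minimal sketch: verify_iso is a hypothetical helper, and the expected digest has to be copied from the published checksum list. It uses sha256sum as in the el7 steps. For the sle12 images, which are checked with md5sum above, the same pattern should apply with md5sum substituted.

```shell
#!/bin/sh
# Sketch only: verify_iso is a hypothetical helper, not part of any distribution.

verify_iso() {
    # $1 = path to the ISO file, $2 = expected SHA-256 digest (hex)
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)" >&2
        return 1
    fi
}
```

A non-zero return value signals that the file should be downloaded again before it is mounted.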
    

Installing Lenovo xcat

Prepare OS Images for Other Nodes

Note

If the operating system is already installed on all nodes in the cluster, you can skip this step.

  • Import image

    # el7
    $ sudo copycds ${iso_path}/CentOS-7-x86_64-Everything-1708.iso

    # sle12
    $ sudo copycds ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso
    
  • Verify that the image is available

    # el7
    $ sudo lsdef -t osimage
    ...
    centos7.4-x86_64-install-compute (osimage)
    centos7.4-x86_64-netboot-compute (osimage)
    centos7.4-x86_64-statelite-compute (osimage)
    ...

    # sle12
    $ sudo lsdef -t osimage
    ...
    sles12.3-x86_64-install-compute  (osimage)
    sles12.3-x86_64-install-service  (osimage)
    sles12.3-x86_64-netboot-compute  (osimage)
    sles12.3-x86_64-statelite-compute  (osimage)
    ...
    
  • Set image parameters

    Note

    The nouveau module is an open source accelerated driver for NVIDIA cards. According to the official NVIDIA installation guide, this module should be disabled before the CUDA driver is installed.

    # el7
    $ sudo chdef -t osimage centos7.4-x86_64-install-compute addkcmdline="rdblacklist=nouveau nouveau.modeset=0 R::modprobe.blacklist=nouveau"

    # sle12
    $ sudo chdef -t osimage sles12.3-x86_64-install-compute addkcmdline="rdblacklist=nouveau nouveau.modeset=0 R::modprobe.blacklist=nouveau"
    

Set xcat Node Information

Import compute node information:
  for ((i=0; i<$num_computes; i++)); do
    sudo mkdef -t node "${c_name[$i]}" groups=compute,all arch=x86_64 netboot=xnba mgt=ipmi \
      bmcusername="${bmc_username}" bmcpassword="${bmc_password}" \
      ip="${c_ip[$i]}" mac="${c_mac[$i]}" bmc="${c_bmc[$i]}" \
      serialport=0 serialspeed=115200
  done
Import login node information:
  for ((i=0; i<$num_logins; i++)); do
    sudo mkdef -t node "${l_name[$i]}" groups=login,all arch=x86_64 netboot=xnba mgt=ipmi \
      bmcusername="${bmc_username}" bmcpassword="${bmc_password}" \
      ip="${l_ip[$i]}" mac="${l_mac[$i]}" bmc="${l_bmc[$i]}" \
      serialport=0 serialspeed=115200
  done
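
A mistyped IP or MAC address in lico_env.local only shows up later, when a node fails to boot from the network. As a hedged sketch, the values can be checked for well-formedness before the node definitions are imported. The helpers is_mac and is_ipv4 are hypothetical and check only the syntax of a value, not whether the address is actually in use.

```shell
#!/bin/sh
# Sketch only: is_mac and is_ipv4 are hypothetical helpers.

# Succeeds for six colon-separated hexadecimal byte pairs.
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Succeeds for four dot-separated decimal fields, each in the range 0-255.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four fields
    IFS=$old_ifs
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
}
```

Inside the loops above, each entry of the c_ip, c_mac, c_bmc, l_ip, l_mac, and l_bmc lists could be passed through these checks, skipping the mkdef call for any node whose values fail.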

Note

If the BMC user name and password differ between nodes, run the following command to edit them per node:

$ sudo tabedit ipmi

Run the command below to set the root account password for the nodes (replace <ROOT_PASSWORD> with the password you want to set):
  $ sudo chtab key=system passwd.username=root passwd.password=<ROOT_PASSWORD>

Add hosts Resolution

Note

If the operating system is already installed on the cluster and hostnames already resolve to IP addresses, skip this step.

$ sudo chdef -t site domain=${domain_name}
$ sudo chdef -t site master=${sms_ip}
$ sudo chdef -t site nameservers=${sms_ip}
$ sudo sed -i "/^\s*${sms_ip}\s*.*$/d" /etc/hosts
$ sudo sed -i "/\s*${sms_name}\s*/d" /etc/hosts
$ echo "${sms_ip}    ${sms_name}   ${sms_name}.${domain_name}" | sudo tee -a /etc/hosts > /dev/null
$ sudo makehosts
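
The sed and echo commands above remove any stale entry for the management node and then append a fresh one, so the step can be repeated without producing duplicate lines. The following sketch shows the same remove-then-append idea as a function that works on an arbitrary file, which makes it possible to try it on a copy of /etc/hosts first. The helper set_host_entry is hypothetical. Unlike the sed patterns above, it matches the host name only as a whole field, which is a deliberate deviation so that names merely containing the host name are left alone.

```shell
#!/bin/sh
# Sketch only: set_host_entry is a hypothetical helper.

set_host_entry() {
    # $1 = hosts file, $2 = IP address, $3 = host name, $4 = domain name
    file=$1 ip=$2 name=$3 domain=$4
    tmp=$(mktemp) || return 1
    # Drop lines that start with the IP or list the host name as a whole field.
    awk -v ip="$ip" -v name="$name" '
        $1 == ip { next }
        { for (i = 2; i <= NF; i++) if ($i == name) next }
        { print }
    ' "$file" > "$tmp"
    # Append the fresh entry in the same layout as the echo command above.
    printf '%s    %s   %s.%s\n' "$ip" "$name" "$name" "$domain" >> "$tmp"
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Run against a scratch copy, for example with the values of sms_ip, sms_name, and domain_name, the function leaves exactly one line for the management node no matter how often it is called.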

Configuring DHCP and DNS Services

Run the following commands:
$ sudo makenetworks
$ sudo makedhcp -n
$ sudo makedns -n

Installing Operating System Through the Network

Note

If the operating system is already installed on all nodes in the cluster, you can skip this step.

  # el7
  $ sudo nodeset all osimage=centos7.4-x86_64-install-compute
  $ sudo rsetboot all net -u
  $ sudo rpower all reset

  # sle12
  $ sudo nodeset all osimage=sles12.3-x86_64-install-compute
  $ sudo rsetboot all net -u
  $ sudo rpower all reset

Note

The OS installation takes several minutes. Use the command below to check the progress:

$ sudo nodestat all
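
Instead of rerunning the status command by hand, the check can be repeated in a loop until every node reports the wanted status. The following is a minimal sketch under stated assumptions: wait_for_nodes and the STATUS_CMD variable are hypothetical, the status command is assumed to print one "name: status" line per node, and the status string that signals a finished installation (passed as the first argument) has to be taken from what nodestat actually reports on your cluster.

```shell
#!/bin/sh
# Sketch only: wait_for_nodes and STATUS_CMD are hypothetical.
# Assumption: the status command prints lines of the form "name: status".

wait_for_nodes() {
    # $1 = status wanted on every node, $2 = maximum attempts, $3 = seconds between attempts
    want=$1 max_attempts=$2 interval=$3
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        output=$(${STATUS_CMD:-sudo nodestat all}) || output=""
        if [ -n "$output" ]; then
            # Count the nodes whose reported status differs from the wanted one.
            pending=$(printf '%s\n' "$output" |
                awk -F': *' -v want="$want" '$2 != want { n++ } END { print n + 0 }')
            if [ "$pending" -eq 0 ]; then
                return 0
            fi
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}
```

The function returns 0 as soon as all nodes match and 1 if the attempts run out, so it can gate the checkpoint that follows. Empty output from the status command is treated as "not ready" rather than as success.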

Checkpoint A

$ sudo psh all uptime
...
c1: 05:03am up 0:02, 0 users, load average: 0.20, 0.13, 0.05
c2: 05:03am up 0:02, 0 users, load average: 0.20, 0.14, 0.06
l1: 05:03am up 0:02, 0 users, load average: 0.17, 0.13, 0.05