Prerequisites
Before installation, refer to the LiCO best recipe to make sure the cluster hardware uses the correct drivers and settings. You can get the best recipe document from the link below:
Before installation, refer to the OSes part of the LeSI 18A_SI best recipe to install the OS security patches. You can get the best recipe document from the link below:
You can set up a CentOS/RedHat or SLES-12-SP3-Server/SLES-12-SP3-SDK base repository (online or local) on the management node.
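As an illustration of the local option on CentOS/RedHat, the following sketch writes a yum repository definition that points at an OS installation image already mounted on the management node. The function name write_local_repo, the repository ID local-base, the file name local-base.repo, and the mount point /mnt/os-iso are assumptions made for this example, not values taken from this Guide. SLES nodes would use zypper repositories instead.

```shell
# Hypothetical helper: write a yum repository definition for a local base repository.
# Usage: write_local_repo REPO_DIR MOUNT_POINT
#   REPO_DIR    directory that holds .repo files (normally /etc/yum.repos.d)
#   MOUNT_POINT directory where the OS installation image is mounted (assumed)
write_local_repo() {
    repo_dir=$1
    mount_point=$2
    mkdir -p "$repo_dir"
    # gpgcheck=0 keeps the sketch short; enable signature checking if your
    # site policy requires it.
    cat > "$repo_dir/local-base.repo" <<EOF
[local-base]
name=Local base repository (illustrative)
baseurl=file://$mount_point
enabled=1
gpgcheck=0
EOF
}
```

On the management node this might be called as `write_local_repo /etc/yum.repos.d /mnt/os-iso`, followed by `yum clean all` and `yum makecache` to refresh the package metadata.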
Unless otherwise stated in this Guide, all commands are run on the management node.
If the firewall must be enabled, refer to Cluster Service Summary to modify the firewall rules.
The user is responsible for regularly patching and updating the components and the OS to prevent security vulnerabilities. For how to update OS packages, refer to How to update OS packages.
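A minimal sketch of what a routine update might look like, assuming the stock package managers of the supported distributions (yum on CentOS/RedHat, zypper on SLES). The helper update_command_for is hypothetical and only prints the command so that it can be reviewed before it is run; the procedure in How to update OS packages takes precedence.

```shell
# Hypothetical helper: print the package update command for a distribution ID,
# as found in the ID field of /etc/os-release.
update_command_for() {
    case "$1" in
        centos|rhel) echo "yum -y update" ;;
        sles)        echo "zypper --non-interactive update" ;;
        *)           echo "unsupported distribution: $1" >&2
                     return 1 ;;
    esac
}

# Example: read the ID of the current node and show the matching command.
# . /etc/os-release && update_command_for "$ID"
```

Printing the command instead of running it leaves room to schedule the update in a maintenance window, since updating the kernel or drivers may require a reboot.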
This document describes a typical cluster, which contains management, login, and compute nodes, as shown in the diagram below. LiCO also supports clusters that contain only management and compute nodes. For this kind of cluster, all the LiCO modules that would be installed on the login node must be installed on the management node instead.
- Management node
The management node is the core of the HPC/AI cluster, undertaking primary functions such as cluster management, monitoring, scheduling, strategy management, and user and account management.
- Compute node
As the name implies, the compute node completes computing tasks.
- Login node
The login node connects the cluster to the external network or to another cluster. Users must log in through the login node to upload application data, develop and compile programs, and submit scheduled tasks.
- Parallel File System
The parallel file system provides shared storage. A high-speed network is typically used to connect the cluster nodes to the parallel file system. This Guide shows only an example of installing an NFS shared file system; setting up a parallel file system is out of scope.
- Nodes BMC interface
The BMC interface is used to access the node’s BMC system.
- Nodes eth interface
The Ethernet interface is used to manage the nodes in the cluster; it can also be used to transfer computing data.
- High-speed network interface
The high-speed network is optional. It is typically used to support the parallel file system and can also be used to transfer computing data.
Instructions
Replace the <*_USERNAME> and <*_PASSWORD> placeholders in this document with your actual username and password.
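If commands from this Guide are collected into a script, a quick check such as the following sketch can catch placeholders that were left unreplaced before the script is run. The helper name has_unfilled_placeholders is hypothetical, and the pattern assumes that placeholders are written in upper case and end in _USERNAME or _PASSWORD.

```shell
# Hypothetical helper: succeed (exit status 0) if FILE still contains a
# placeholder such as <..._USERNAME> or <..._PASSWORD>, and print the
# matching lines with their line numbers.
has_unfilled_placeholders() {
    grep -n -E '<[A-Z0-9_]+_(USERNAME|PASSWORD)>' "$1"
}
```

For example, `has_unfilled_placeholders my_commands.sh && echo "replace the placeholders first"`, where my_commands.sh is an illustrative file name.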
Deploying Cluster Environment
If the cluster environment already exists, you may skip this chapter. (Check List of Components to confirm that the software is already installed and passes Checkpoint A and Checkpoint B.)
- Installing Operating System
- Installing Infrastructure Software
- List of Infrastructure Software
- Set the Local Repository for Management Node
- Configuring the Local Repository for Compute and Login Nodes
- Configuring LiCO Dependencies Repository
- Installing Slurm
- Configuring NFS
- Configuring NTP
- Installing CUDA and cuDNN
- Installing Slurm
- Installing Ganglia
- Installing MPI
- Installing Singularity
- Checkpoint B
- Installing Other Components
Installing LiCO
This chapter describes how the LiCO services are distributed across the cluster and how to install them.
Note
If you want to install LiCO quickly, refer to How To Install LiCO Quickly.