Note:
On Red Hat 8.6 or Rocky 8.6 Linux, update the kernel to version 4.18.0-372.26.1 or later. Otherwise, some Intel oneAPI toolkits will fail to install on the server.
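The kernel requirement above can be checked before running any installer. The following is a minimal sketch, assuming GNU sort (for the -V version sort) and sed are available. The function name kernel_at_least is hypothetical and is not part of any Intel or LiCO tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
kernel_at_least() {
    # Keep only the leading numeric part, e.g. 4.18.0-372.26.1 from
    # 4.18.0-372.26.1.el8_6.x86_64.
    current=$(printf '%s\n' "$1" | sed -E 's/^([0-9]+(\.[0-9]+)*-[0-9]+(\.[0-9]+)*).*/\1/')
    required=$2
    # With version sort, the required version sorts first (or ties)
    # only when the current version is not older than it.
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

if kernel_at_least "$(uname -r)" "4.18.0-372.26.1"; then
    echo "Kernel $(uname -r) meets the minimum version."
else
    echo "Kernel $(uname -r) is older than 4.18.0-372.26.1; update it first."
fi
```

The comparison only makes sense on the Red Hat 8.6 and Rocky 8.6 systems named in the note; on other distributions the result carries no meaning.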
Step 1: Download the installation file.
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18970/l_BaseKit_p_2022.3.1.17310_offline.sh
Step 2: Run the installer with sudo sh:
sudo sh ./l_BaseKit_p_2022.3.1.17310_offline.sh
Step 3: Follow the instructions in the installer to finish the Intel oneAPI Base Toolkit installation.
Note:
You can follow the instructions in https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit.html to install Intel oneAPI Base Toolkit on the management node according to your own needs.
Step 1: Download the installation file.
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18975/l_HPCKit_p_2022.3.1.16997_offline.sh
Step 2: Run the installer with sudo sh:
sudo sh ./l_HPCKit_p_2022.3.1.16997_offline.sh
Step 3: Follow the instructions in the installer to finish the Intel oneAPI HPC Toolkit installation.
Note:
You can follow the instructions in https://software.intel.com/content/www/us/en/develop/tools/oneapi/hpc-toolkit.html to install Intel oneAPI HPC Toolkit on the management node according to your own needs.
Step 1: Download the installation file.
wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18979/l_AIKit_p_2022.3.1.20890_offline.sh
Step 2: Run the installer with sudo sh:
sudo sh ./l_AIKit_p_2022.3.1.20890_offline.sh
Step 3: Follow the instructions in the installer to finish the Intel oneAPI AI Analytics Toolkit installation.
Note:
Before installing the AI Analytics Toolkit, make sure the Intel oneAPI Base Toolkit is installed. You can follow the instructions in https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html to install Intel oneAPI AI Analytics Toolkit on the management node according to your own needs.
Run the following commands to initialize modulefiles for LiCO on the management node:
# Where Intel oneAPI is installed; /opt/intel/oneapi is the default path
ONEAPI_PATH="/opt/intel/oneapi"
source $ONEAPI_PATH/setvars.sh
bash $ONEAPI_PATH/modulefiles-setup.sh
Create a shared directory based on the parent directory of the oneAPI installation path (taking /opt/intel as an example):
Step 1: Share /opt/intel from the management node for Intel oneAPI:
echo "/opt/intel *(rw,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
Step 2: Configure the shared directory:
# IP address of the management node in the compute intranet
MANAGER_NODE_IP="192.168.0.1"
nodeshell all "echo '${MANAGER_NODE_IP}:/opt/intel /opt/intel nfs nfsvers=4.0,nodev,noatime 0 0' >> /etc/fstab"
Step 3: Mount the shared directory:
nodeshell all mkdir -p /opt/intel
nodeshell all mount /opt/intel
Step 4: Change the write permission for socwatch:
cd /opt/intel/oneapi/vtune/latest/socwatch
chmod 777 x64
For Red Hat, Rocky, CentOS Linux:
Add the LiCO module path on the login and compute nodes:
# Where Intel oneAPI is installed; /opt/intel/oneapi is the default path
ONEAPI_PATH="/opt/intel/oneapi"
nodeshell all "sed -i s#/opt/ohpc/pub/modulefiles#/opt/ohpc/pub/modulefiles:$ONEAPI_PATH/modulefiles#g /etc/profile.d/lmod.sh"
nodeshell all "sed -i s#/opt/ohpc/pub/modulefiles#/opt/ohpc/pub/modulefiles:$ONEAPI_PATH/modulefiles#g /etc/profile.d/lmod.csh"
nodeshell all "source /etc/profile.d/lmod.sh"
Run the following commands on the management node:
# Where Intel oneAPI is installed; /opt/intel/oneapi is the default path
ONEAPI_PATH="/opt/intel/oneapi"
sed -i s#/opt/ohpc/pub/modulefiles#/opt/ohpc/pub/modulefiles:$ONEAPI_PATH/modulefiles#g /etc/profile.d/lmod.sh
sed -i s#/opt/ohpc/pub/modulefiles#/opt/ohpc/pub/modulefiles:$ONEAPI_PATH/modulefiles#g /etc/profile.d/lmod.csh
source /etc/profile.d/lmod.sh
sed -i s#/opt/ohpc/pub/modulefiles#/opt/ohpc/pub/modulefiles:$ONEAPI_PATH/modulefiles#g /etc/lico/lico.ini.d/template.ini
lico lmod_sync
For Ubuntu Linux:
Add the LiCO module path on the login and compute nodes:
ONEAPI_PATH="/opt/intel/oneapi"
nodeshell all "echo '$ONEAPI_PATH/modulefiles' >> /etc/lmod/modulespath"
nodeshell all "source /etc/profile.d/lmod.sh"
Sync the modules to the LiCO database on the management node:
ONEAPI_PATH="/opt/intel/oneapi"
echo "$ONEAPI_PATH/modulefiles" >> /etc/lmod/modulespath
source /etc/profile.d/lmod.sh
sed -i s#/opt/ohpc/pub/modulefiles#/opt/ohpc/pub/modulefiles:$ONEAPI_PATH/modulefiles#g /etc/lico/lico.ini.d/template.ini
lico lmod_sync
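The sed commands above append the oneAPI modulefiles directory every time they run, so repeating them would add the path twice. The following is a minimal sketch of a guard, assuming GNU sed and grep. The function name add_module_path is hypothetical, and the example works on a temporary file rather than on the real /etc/profile.d/lmod.sh.

```shell
#!/bin/sh
# Hypothetical helper: append $3 after each occurrence of $2 in file $1,
# unless the combined path is already present.
add_module_path() {
    file=$1; base=$2; extra=$3
    if grep -qF "$base:$extra" "$file"; then
        return 0    # already configured, nothing to do
    fi
    sed -i "s#$base#$base:$extra#g" "$file"
}

# Demonstration on a scratch file, not on the real configuration file.
demo=$(mktemp)
echo 'export MODULEPATH=/opt/ohpc/pub/modulefiles' > "$demo"
add_module_path "$demo" /opt/ohpc/pub/modulefiles /opt/intel/oneapi/modulefiles
add_module_path "$demo" /opt/ohpc/pub/modulefiles /opt/intel/oneapi/modulefiles
cat "$demo"
```

Even though the function is called twice, the scratch file ends up with the oneAPI path appended only once.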
vi /etc/lico/lico.ini.d/oneapi.ini
INTEL_MODULE_PATH = "<oneAPI install dir>"
ENABLE = true
lico init
# Add two kernel parameters to /etc/sysctl.conf on the compute nodes
nodeshell compute "echo 'kernel.kptr_restrict=0' >> /etc/sysctl.conf"
nodeshell compute "echo 'kernel.perf_event_paranoid=0' >> /etc/sysctl.conf"
nodeshell compute sysctl -p /etc/sysctl.conf
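Because the commands above append with >>, running them again leaves duplicate entries in /etc/sysctl.conf. The following is a minimal sketch of a helper that replaces an existing entry or appends a new one, assuming GNU sed and grep. The function name set_sysctl is hypothetical, and the example edits a temporary file only.

```shell
#!/bin/sh
# Hypothetical helper: set "key=value" in file $1, replacing an existing
# entry for the key instead of appending a duplicate line.
set_sysctl() {
    file=$1; key=$2; value=$3
    # Note: dots in the key act as regex wildcards here, which is
    # acceptable for a sketch but not strictly exact.
    if grep -qE "^[[:space:]]*$key[[:space:]]*=" "$file"; then
        sed -i -E "s|^[[:space:]]*$key[[:space:]]*=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstration on a scratch file, not on the real /etc/sysctl.conf.
conf=$(mktemp)
echo 'kernel.kptr_restrict = 1' > "$conf"
set_sysctl "$conf" kernel.kptr_restrict 0
set_sysctl "$conf" kernel.perf_event_paranoid 0
set_sysctl "$conf" kernel.perf_event_paranoid 0
cat "$conf"
```

The scratch file ends with exactly one line per parameter, each set to 0. How such a helper would be distributed to the compute nodes is left open here.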
vim /opt/intel/oneapi/vtune/latest/backend/config.yml
# Change the type from passphrase to reverse-proxy
#type:passphrase
type: reverse-proxy
header: Authorization
vim /opt/intel/oneapi/vtune/latest/backend/server.js
# Navigate to the function sendIndexHtml
# Replace urls.public().href with config.urlPathPrefix in the sendIndexHtml(res) function as follows
function sendIndexHtml(res) {
  const indexHtmlPath = path.join(__dirname, '../frontend/index.html');
  if (config.urlPathPrefix || config.baseUrl) {
    fs.readFile(indexHtmlPath, (err, data) => {
      if (err) {
        res.status(500);
        res.end();
      }
      const content = data.toString();
      res.send(content.replace('<base href="/">', `<base href="${config.urlPathPrefix}">`));
    });
  } else {
    res.sendFile(indexHtmlPath);
  }
}
# Where Intel oneAPI is installed; /opt/intel/oneapi is the default path
ONEAPI_PATH="/opt/intel/oneapi"
nodeshell compute "cd ${ONEAPI_PATH}/vtune/latest/sepdk/src && sudo ./rmmod-sep"
nodeshell -c 1 compute "cd ${ONEAPI_PATH}/vtune/latest/sepdk/src && sudo ./build-driver -ni"
# The following errors may be reported:
# c3: ERROR: kernel source directory "/usr/src/linux-4.18.0-305.3.1.el8.x86_64" either does not exist or not a valid kernel source directory.
# c3:
# c3: Please use the following command to install kernel header on CentOS:
# c3: yum install kernel-devel-4.18.0-305.3.1.el8.x86_64
#
# If an error occurs, install the corresponding package on the corresponding node as prompted:
# e.g.: nodeshell c3 "sudo yum install -y kernel-devel-4.18.0-305.3.1.el8.x86_64"
nodeshell compute "cd ${ONEAPI_PATH}/vtune/latest/sepdk/src && sudo ./boot-script -i -g vtune -p 666"
nodeshell compute "sed -i 's#^After.*#& network.target\nRequiresMountsFor=${ONEAPI_PATH}#g' /usr/lib/systemd/system/sep5.service"
nodeshell compute systemctl start sep5.service
nodeshell compute systemctl daemon-reload
nodeshell compute systemctl enable sep5.service
For Red Hat, CentOS, Rocky Linux:
nodeshell compute "dnf install -y dstat"
For Ubuntu Linux:
nodeshell compute "apt-get install -y dstat pcp"
Run the following commands on the management node:
For Red Hat, CentOS, Rocky Linux:
dnf install sqlite
For Ubuntu Linux:
apt-get install sqlite
# Add two kernel parameters to /etc/sysctl.conf on the compute nodes to set the core file path
nodeshell compute "echo 'kernel.core_pattern = ./core-%e-%p-%s-%h-%t' >> /etc/sysctl.conf"
nodeshell compute "echo 'kernel.core_uses_pid = 0' >> /etc/sysctl.conf"
nodeshell compute sysctl -p /etc/sysctl.conf
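The core_pattern value above uses template specifiers that the kernel expands when it writes a core file: %e is the executable name, %p the process ID, %s the signal number, %h the host name, and %t the dump time in seconds since the Epoch. The sketch below only imitates that expansion to show what the resulting file name looks like. The function name and the sample values are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration: imitate how the kernel expands a core_pattern.
# Arguments: pattern, executable name, PID, signal number, host name, time.
expand_core_pattern() {
    printf '%s\n' "$1" | sed \
        -e "s/%e/$2/g" \
        -e "s/%p/$3/g" \
        -e "s/%s/$4/g" \
        -e "s/%h/$5/g" \
        -e "s/%t/$6/g"
}

# Sample values are made up for the example.
expand_core_pattern './core-%e-%p-%s-%h-%t' myapp 1234 11 c1 1700000000
```

With these sample values the function prints ./core-myapp-1234-11-c1-1700000000, which is a file in the working directory of the crashed process because the pattern starts with ./.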
For Ubuntu Linux:
To allow GDB to attach to a running process, also run the following commands:
# Change the value of kernel.yama.ptrace_scope for GDB
nodeshell compute "sed -i 's#kernel.yama.ptrace_scope = 1#kernel.yama.ptrace_scope = 0#g' /etc/sysctl.d/10-ptrace.conf"
nodeshell compute sysctl -p /etc/sysctl.d/10-ptrace.conf
Run the following commands on the management node to check if the installation is successful:
# The output may be /opt/intel/oneapi/intelpython/latest/bin/mpirun
which mpirun
# The output may be /opt/intel/oneapi/intelpython/latest/bin/mpitune
which mpitune
# The output may be /opt/intel/oneapi/mpi/2021.1.1/bin/mpiicc
which mpiicc
Run the following commands on the management node to check whether the module is successfully configured:
# The output contains /opt/intel/oneapi/modulefiles information
module avail
Run the following commands on the management node to check whether Intel Python is installed.
[root@head ~]# source /opt/intel/oneapi/setvars.sh
[root@head ~]# conda env list
# conda environments:
#
base              *  /opt/intel/oneapi/intelpython/latest
2022.1.0             /opt/intel/oneapi/intelpython/latest/envs/2022.1.0
modin                /opt/intel/oneapi/intelpython/latest/envs/modin
modin-0.13.3         /opt/intel/oneapi/intelpython/latest/envs/modin-0.13.3
pytorch              /opt/intel/oneapi/intelpython/latest/envs/pytorch
pytorch-1.10.0       /opt/intel/oneapi/intelpython/latest/envs/pytorch-1.10.0
tensorflow           /opt/intel/oneapi/intelpython/latest/envs/tensorflow
tensorflow-2.8.0     /opt/intel/oneapi/intelpython/latest/envs/tensorflow-2.8.0
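The which checks for mpirun, mpitune, and mpiicc earlier in this verification can also be scripted. The following is a minimal sketch, assuming the default /opt/intel/oneapi install path and that setvars.sh has already been sourced in the current shell. The function name resolves_under is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if command $2 resolves to a path below $1.
resolves_under() {
    prefix=$1; cmd=$2
    resolved=$(command -v "$cmd") || return 1
    case "$resolved" in
        "$prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}

ONEAPI_PATH="/opt/intel/oneapi"
for cmd in mpirun mpitune mpiicc; do
    if resolves_under "$ONEAPI_PATH" "$cmd"; then
        echo "$cmd: found under $ONEAPI_PATH"
    else
        echo "$cmd: not found under $ONEAPI_PATH"
    fi
done
```

A command reported as not found under the install path may still exist elsewhere on the system, for example from another MPI installation that comes first in PATH.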
Run the following commands on the management node to check whether the Intel driver is installed.
# The output might be the following:
# c1: sep5 2793472 0
# c1: socperf3 602112 1 sep5
# c2: sep5 2793472 0
# c2: socperf3 602112 1 sep5
nodeshell compute "lsmod |grep sep"
1. For the Slurm scheduler, to allow non-root users to log in to compute nodes, configure the user whitelist first.
If the message "Cannot get the driver.Please check it" is displayed when you run Platform Analysis on the administrator page, run the following command to view the background logs.
tail -f /var/log/lico/lico-core-django.log
If the following error occurs in the log, add the current user to the Slurm whitelist.
Access denied: user <user> has no active jobs on this node.
Access denied by pam_slurm_adopt: you have no active jobs on this node
Step 1: Run the following commands on the management node, and share the updated slurm.conf with the compute nodes.
# Edit the slurm.conf file
vi /etc/slurm/slurm.conf
# Add the following configuration item
PrologFlags=contain
Step 2: Run the following commands on the compute nodes:
# Edit the sshd PAM file
vi /etc/pam.d/sshd
# Add the following configuration items; their order must not be changed
account sufficient pam_listfile.so item=user sense=allow onerr=fail file=/etc/ssh/allowed_users
account required pam_slurm_adopt.so
# Create or modify the /etc/ssh/allowed_users file
vi /etc/ssh/allowed_users
# Add users according to the following example format
myuser1
myuser2
Step 3: Restart the following services:
# On the management node
systemctl restart slurmctld
# On the compute nodes
systemctl restart slurmd
systemctl restart sshd
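The whitelist file from Step 2 holds one user name per line. The following is a minimal sketch of adding users without creating duplicate entries, as an alternative to editing the file by hand. The function name allow_user is hypothetical, and the example writes to a temporary file rather than to the real /etc/ssh/allowed_users.

```shell
#!/bin/sh
# Hypothetical helper: add user $2 to whitelist file $1, one name per line,
# skipping names that are already listed.
allow_user() {
    file=$1; user=$2
    touch "$file"
    if ! grep -qxF "$user" "$file"; then
        printf '%s\n' "$user" >> "$file"
    fi
}

# Demonstration on a scratch file, not on the real /etc/ssh/allowed_users.
list=$(mktemp)
allow_user "$list" myuser1
allow_user "$list" myuser2
allow_user "$list" myuser1
cat "$list"
```

The scratch file lists myuser1 and myuser2 once each, even though myuser1 was added twice. The whole-line match (grep -x) keeps a name such as myuser1 from being confused with myuser10.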
2. Ftrace issue
The Linux Ftrace subsystem, located in the debugfs partition at /sys/kernel/debug/tracing, may be accessible to the root user only. In this case, VTune Profiler reports an error message: Ftrace collection is not possible due to a lack of credentials. Root privileges are required.
vtune: Error: Unable to analyze interrupts. Ftrace is not available. For more information, see the Linux* and Android* Kernel Analysis User Guide.
vtune: Error: notErrorOrWarning
vtune: Error: Ftrace collection is not possible due to a lack of credentials. Make sure you have read/write access to debugFS. You may either run the analysis with root privileges (recommended) or follow the configuration instructions provided in the Linux and Android Kernel Analysis help topic.
To enable Ftrace events collection on such a system, you may change permissions manually by using the chown command under the root account, for example:
nodeshell compute chown -R <user>:vtune /sys/kernel/debug/tracing
Alternatively, you can change the permissions automatically by using the VTune script:
nodeshell compute /opt/intel/oneapi/vtune/latest/bin64/prepare-debugfs.sh --user <user>
Note: The permissions of /sys/kernel/debug/tracing must be changed on every compute node.
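Whether the permission change took effect can be checked from the profiling user's account on a compute node. The following is a rough sketch that only tests read and write access to the directory itself, which may not cover every file VTune needs. The function name can_use_tracing is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 exists and the current user
# can both read and write it.
can_use_tracing() {
    [ -d "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

TRACING_DIR="/sys/kernel/debug/tracing"
if can_use_tracing "$TRACING_DIR"; then
    echo "$TRACING_DIR is accessible to the current user."
else
    echo "$TRACING_DIR is not accessible to the current user; adjust its permissions first."
fi
```

Run as root, the check always succeeds when the directory exists, so it is only meaningful when run as the non-root user who will start the analysis.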