Install cluster base software

Base software list

The Install nodes column in the table below uses the following abbreviations:

  • M - management node

  • L - login node

  • C - compute node

Software    | Component          | Version | Service         | Install nodes | Remarks
------------+--------------------+---------+-----------------+---------------+---------------------------------------------
nfs         | nfs-utils          | 1.3.0   | nfs-server      | M             | el7
nfs         | nfs-kernel-server  | 1.3.0   | nfs-server      | M             | sle12
nfs         | nfs-client         | 1.3.0   | nfs             | C,L           | sle12
ntp         | ntp                | 4.2.6   | ntpd            | M             |
slurm       | ohpc-slurm-server  | 1.3.3   | munge,slurmctld | M             |
slurm       | ohpc-slurm-client  | 1.3.3   | munge,slurmd    | C,L           |
ganglia     | ganglia-gmond-ohpc | 3.7.2   | gmond           | M,C,L         |
singularity | singularity-ohpc   | 2.4     |                 | M             |
cuda        | cudnn              | 7       |                 | C             | required only on GPU nodes
cuda        | cuda               | 9.1     |                 | C             | required only on GPU nodes
mpi         | openmpi3-gnu7-ohpc | 3.0.0   |                 | M             | install at least one of the three MPI types
mpi         | mpich-gnu7-ohpc    | 3.2     |                 | M             |
mpi         | mvapich2-gnu7-ohpc | 2.2     |                 | M             |

Set up a local repository for the management node

  • Download the local repository package

  • Configure the local repository

    Upload the package to the management node and run the following commands to configure the Lenovo OpenHPC local repository (el7 and sle12 respectively):

    el7
    $ sudo mkdir -p $ohpc_repo_dir
    $ sudo tar xvf Lenovo-OpenHPC-1.3.3.CentOS_7.x86_64.tar -C $ohpc_repo_dir
    $ sudo $ohpc_repo_dir/make_repo.sh
    
    sle12
    $ sudo mkdir -p $ohpc_repo_dir
    $ sudo tar xvf Lenovo-OpenHPC-1.3.3.SLES.x86_64.tar -C $ohpc_repo_dir
    $ sudo $ohpc_repo_dir/make_repo.sh
    $ sudo rpm --import $ohpc_repo_dir/SLE_12/repodata/repomd.xml.key
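    
    To confirm the repository is visible on the management node, a quick check might be as follows (the exact repository name depends on what make_repo.sh created; el7 and sle12 respectively):
    $ sudo yum repolist | grep -i openhpc
    $ sudo zypper repos | grep -i openhpc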
    

Configure the local repository for compute and login nodes

  • el7

    Install yum-utils
    $ sudo psh all yum --setopt=\*.skip_if_unavailable=1 -y install yum-utils
    
    Add the local repository
    $ sudo cp /etc/yum.repos.d/Lenovo.OpenHPC.local.repo /var/tmp
    $ sudo sed -i '/^baseurl=/d' /var/tmp/Lenovo.OpenHPC.local.repo
    $ sudo sed -i '/^gpgkey=/d' /var/tmp/Lenovo.OpenHPC.local.repo
    
    $ sudo echo "baseurl=http://${sms_name}/${ohpc_repo_dir}/CentOS_7" >> /var/tmp/Lenovo.OpenHPC.local.repo
    $ sudo echo "gpgkey=http://${sms_name}/${ohpc_repo_dir}/CentOS_7/repodata/repomd.xml.key" >> /var/tmp/Lenovo.OpenHPC.local.repo
    
    # Distribute repo files
    $ sudo xdcp all /var/tmp/Lenovo.OpenHPC.local.repo /etc/yum.repos.d/
    $ sudo psh all echo -e %_excludedocs 1 \>\> ~/.rpmmacros
    

    Disable yum repositories that point to external networks

    Note

    This step is optional. If the operating system itself does not have enough packages installed, disabling the external repositories may cause subsequent installation steps to fail.

    $ sudo psh all yum-config-manager --disable CentOS\*
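    
    To confirm that every node now resolves packages from the local repository, a quick check might be:
    $ sudo psh all "yum repolist enabled" | xcoll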
    
  • sle12

    $ sudo cp /etc/zypp/repos.d/Lenovo.OpenHPC.local.repo /var/tmp
    $ sudo sed -i '/^baseurl=/d' /var/tmp/Lenovo.OpenHPC.local.repo
    $ sudo sed -i '/^gpgkey=/d' /var/tmp/Lenovo.OpenHPC.local.repo
    
    $ sudo echo "baseurl=http://${sms_name}/${ohpc_repo_dir}/SLE_12" >> /var/tmp/Lenovo.OpenHPC.local.repo
    $ sudo echo "gpgkey=http://${sms_name}/${ohpc_repo_dir}/SLE_12/repodata/repomd.xml.key" >> /var/tmp/Lenovo.OpenHPC.local.repo
    
    # Distribute repo files
    $ sudo xdcp all /var/tmp/Lenovo.OpenHPC.local.repo /etc/zypp/repos.d/
    $ sudo psh all rpm --import http://${sms_name}/${ohpc_repo_dir}/SLE_12/repodata/repomd.xml.key
    $ sudo psh all echo -e %_excludedocs 1 \>\> ~/.rpmmacros
    

Configure the LiCO dependency repository

  • el7

    Download the package: https://hpc.lenovo.com/lico/downloads/5.1/lico-dep-5.1.el7.x86_64.tgz

    Upload the package to the management node and run the following commands to configure the yum repository on it. Note that the management node must already have a local yum repository for the operating system configured, so that the subsequent operations can succeed:
    $ sudo mkdir -p $lico_dep_repo_dir
    $ sudo tar xvf lico-dep-5.1.el7.x86_64.tgz -C $lico_dep_repo_dir
    $ sudo $lico_dep_repo_dir/mklocalrepo.sh
    
    Run the following commands to configure the yum repository for the other nodes:
    $ sudo cp /etc/yum.repos.d/lico-dep.repo /var/tmp
    
    $ sudo sed -i '/^baseurl=/d' /var/tmp/lico-dep.repo
    $ sudo sed -i '/^gpgkey=/d' /var/tmp/lico-dep.repo
    
    $ sudo echo "baseurl=http://${sms_name}/${lico_dep_repo_dir}" >> /var/tmp/lico-dep.repo
    $ sudo echo "gpgkey=http://${sms_name}/${lico_dep_repo_dir}/RPM-GPG-KEY-LICO-DEP-EL7" >> /var/tmp/lico-dep.repo
    
    # Distribution configuration
    $ sudo xdcp all /var/tmp/lico-dep.repo /etc/yum.repos.d
    
  • sle12

    Download the package: https://hpc.lenovo.com/lico/downloads/5.1/lico-dep-5.1.sle12.x86_64.tgz

    Upload the package to the management node and run the following commands to configure the zypper repository on it. Note that the management node must already have a local zypper repository for the operating system configured, so that the subsequent operations can succeed:
    $ sudo mkdir -p $lico_dep_repo_dir
    $ sudo tar xvf lico-dep-5.1.sle12.x86_64.tgz -C $lico_dep_repo_dir
    $ sudo $lico_dep_repo_dir/mklocalrepo.sh
    $ sudo rpm --import $lico_dep_repo_dir/RPM-GPG-KEY-LICO-DEP-SLE12
    
    Run the following commands to configure the zypper repository for the other nodes:
    $ sudo cp /etc/zypp/repos.d/lico-dep.repo /var/tmp
    
    $ sudo sed -i '/^baseurl=/d' /var/tmp/lico-dep.repo
    $ sudo sed -i '/^gpgkey=/d' /var/tmp/lico-dep.repo
    
    $ sudo echo "baseurl=http://${sms_name}/${lico_dep_repo_dir}" >> /var/tmp/lico-dep.repo
    $ sudo echo "gpgkey=http://${sms_name}/${lico_dep_repo_dir}/RPM-GPG-KEY-LICO-DEP-SLE12" >> /var/tmp/lico-dep.repo
    
    # Distribution configuration
    $ sudo xdcp all /var/tmp/lico-dep.repo /etc/zypp/repos.d
    $ sudo psh all rpm --import http://${sms_name}/${lico_dep_repo_dir}/RPM-GPG-KEY-LICO-DEP-SLE12
    

Install slurm

  • el7

    Install ohpc-base
    $ sudo yum -y install lenovo-ohpc-base
    
    Install the slurm server components
    $ sudo yum -y install ohpc-slurm-server
    
    Install the slurm client components
    $ sudo psh all yum -y install ohpc-base-compute ohpc-slurm-client lmod-ohpc
    

    Configure pam_slurm

    Note

    This module restricts SSH logins to compute nodes so that only users with a job running on the node can log in directly, preventing users from bypassing the scheduler. This step is optional and can be skipped.

    $ sudo psh all echo "\""account required pam_slurm.so"\"" \>\> /etc/pam.d/sshd
    
  • sle12

    Install ohpc-base
    $ sudo zypper install lenovo-ohpc-base
    
    Install the slurm server components
    $ sudo zypper install ohpc-slurm-server
    
    Install the slurm client components
    $ sudo psh all zypper install -y --force-resolution ohpc-base-compute ohpc-slurm-client lmod-ohpc
    

    Configure pam_slurm

    Note

    This module restricts SSH logins to compute nodes so that only users with a job running on the node can log in directly, preventing users from bypassing the scheduler. This step is optional and can be skipped.

    $ sudo psh all echo "\""account required pam_slurm.so"\"" \>\> /etc/pam.d/sshd
    

Configure nfs

  • el7

    Note

    Run the following commands to configure the cluster's shared directories. Sharing /opt/ohpc/pub is mandatory; if /opt/ohpc/pub is already configured as a shared directory in the cluster, skip this step.

    # Management node share the Lenovo OpenHPC directory
    $ sudo yum -y install nfs-utils
    $ sudo echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports
    $ sudo exportfs -a
    
    # Installing NFS for Cluster Nodes
    $ sudo psh all yum -y install nfs-utils
    
    # Configure shared directory for cluster nodes
    $ sudo psh all mkdir -p /opt/ohpc/pub
    $ sudo psh all echo "\""${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs nfsvers=3,nodev,noatime 0 0"\"" \>\> /etc/fstab
    
    # Mount shared directory
    $ sudo psh all mount /opt/ohpc/pub
    

    Note

    The following steps create the shared user directory. This guide uses /home as an example; you can choose a different directory.

    # Management node shares /home and Lenovo OpenHPC package directory
    $ sudo echo "/home *(rw,no_subtree_check,fsid=10,no_root_squash)" >> /etc/exports
    $ sudo exportfs -a
    
    # if /home already mounted, unmount it first
    $ sudo psh all "sed -i '/ \/home /d' /etc/fstab"
    $ sudo psh all umount /home
    
    # Configure a shared directory for cluster nodes
    $ sudo psh all echo "\""${sms_ip}:/home /home nfs nfsvers=3,nodev,nosuid,noatime 0 0"\"" \>\> /etc/fstab
    
    # Mount a shared directory
    $ sudo psh all mount /home
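    
    To confirm that the shares are mounted on every node (the same check applies to the sle12 steps below), a quick look might be:
    $ sudo psh all "df -h /opt/ohpc/pub /home" | xcoll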
    
  • sle12

    Note

    Run the following commands to configure the cluster's shared directories. Sharing /opt/ohpc/pub is mandatory; if /opt/ohpc/pub is already configured as a shared directory in the cluster, skip this step.

    # Management node share the Lenovo OpenHPC directory
    $ sudo zypper install nfs-kernel-server
    $ sudo echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports
    $ sudo exportfs -a
    
    # Configure shared directory for cluster nodes
    $ sudo psh all zypper install -y --force-resolution nfs-client
    $ sudo psh all mkdir -p /opt/ohpc/pub
    $ sudo psh all echo "\""${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs nfsvers=3,nodev,noatime 0 0"\"" \>\> /etc/fstab
    
    # Mount shared directory
    $ sudo psh all mount /opt/ohpc/pub
    

    Note

    The following steps create the shared user directory. This guide uses /home as an example; you can choose a different directory.

    # Management node shares /home and Lenovo OpenHPC package directory
    $ sudo echo "/home *(rw,no_subtree_check,fsid=10,no_root_squash)" >> /etc/exports
    $ sudo exportfs -a
    
    # if /home already mounted, unmount it first
    $ sudo psh all "sed -i '/ \/home /d' /etc/fstab"
    $ sudo psh all umount /home
    
    # Configure a shared directory for cluster nodes
    $ sudo psh all echo "\""${sms_ip}:/home /home nfs nfsvers=3,nodev,nosuid,noatime 0 0"\"" \>\> /etc/fstab
    
    # Mount a shared directory
    $ sudo psh all mount /home
    

Configure ntp

Note

If the ntp service is already configured on all cluster nodes, skip this step.

    $ sudo echo "server 127.127.1.0" >> /etc/ntp.conf
    $ sudo echo "fudge  127.127.1.0 stratum 10" >> /etc/ntp.conf
    $ sudo systemctl enable ntpd
    $ sudo systemctl start ntpd
    $ sudo psh all yum -y install ntp
    $ sudo psh all echo "\""server ${sms_ip}"\"" \>\> /etc/ntp.conf
    # Startup
    $ sudo ppsh all systemctl enable ntpd
    $ sudo ppsh all systemctl start ntpd

    # check service
    psh all "ntpq -p | tail -n 1"
    $ sudo echo "server 127.127.1.0" >> /etc/ntp.conf
    $ sudo echo "fudge  127.127.1.0 stratum 10" >> /etc/ntp.conf
    $ sudo systemctl enable ntpd
    $ sudo systemctl start ntpd
    $ sudo psh all zypper install -y --force-resolution ntp
    $ sudo psh all echo "\""server ${sms_ip}"\"" \>\> /etc/ntp.conf
    # Startup
    $ sudo psh all systemctl enable ntpd
    $ sudo psh all systemctl start ntpd

    # check service
    psh all "ntpq -p | tail -n 1"

Install cuda and cudnn

Note

This is required only on compute nodes with GPUs. The following commands install CUDA and cuDNN on all GPU compute nodes (if only some of the nodes have GPUs, replace the "compute" argument in the psh commands with the node range that corresponds to the GPU nodes).

  • Download cuda

    Download cuda_9.1.85_387.26_linux.run to the shared directory (this guide uses /home as the shared directory).

    Download page: https://developer.nvidia.com/cuda-downloads

    If the operating system boots into a desktop environment, first run the following commands to switch to command-line (multi-user) boot and then reboot the system:
    $ sudo psh compute systemctl set-default multi-user.target
    $ sudo psh compute reboot
    
  • Install the nvidia driver

    Download URLs:

    Note

    We recommend installing the kernel patch packages to fix some security vulnerabilities. The patch packages to download can be found below:

    redhat

    centos

    suse

    Then make sure that the kernel-devel package matching the running kernel is installed. If it is already installed, the kernel-devel package can be omitted from the commands below; otherwise run the following commands as shown:

    el7
    $ sudo psh compute rpm -ivh /home/nvidia-diag-driver-local-repo-rhel7-390.46-1.0-1.x86_64.rpm
    $ sudo psh compute yum install -y cuda-drivers
    
    sle12
    $ sudo psh compute rpm -ivh /home/nvidia-diag-driver-local-repo-sles123-390.46-1.0-1.x86_64.rpm
    $ sudo psh compute zypper --gpg-auto-import-keys install -y --force-resolution cuda-drivers
    $ sudo psh compute perl -pi -e "s/NVreg_DeviceFileMode=0660/NVreg_DeviceFileMode=0666/" /etc/modprobe.d/50-nvidia-default.conf
    $ sudo psh compute reboot
    
  • Install cuda

    el7
    $ sudo psh compute yum install -y kernel-devel gcc gcc-c++
    $ sudo psh compute /home/cuda_9.1.85_387.26_linux.run --silent --toolkit --samples --no-opengl-libs --verbose --override
    
    sle12
    $ sudo psh compute zypper install -y --force-resolution kernel-devel gcc gcc-c++
    $ sudo psh compute /home/cuda_9.1.85_387.26_linux.run --silent --toolkit --samples --no-opengl-libs --verbose --override
    
  • Download cudnn

    Download cudnn-9.1-linux-x64-v7.1.tgz from the official website to the /root directory. Official website:

  • Install cudnn

    $ cd ~
    $ tar -xvf cudnn-9.1-linux-x64-v7.1.tgz
    $ sudo xdcp compute cuda/include/cudnn.h /usr/local/cuda/include
    $ sudo xdcp compute cuda/lib64/libcudnn_static.a /usr/local/cuda/lib64
    $ sudo xdcp compute cuda/lib64/libcudnn.so.7.0.5 /usr/local/cuda/lib64
    $ sudo psh compute "ln -s /usr/local/cuda/lib64/libcudnn.so.7.0.5 /usr/local/cuda/lib64/libcudnn.so.7"
    $ sudo psh compute "ln -s /usr/local/cuda/lib64/libcudnn.so.7 /usr/local/cuda/lib64/libcudnn.so"
    $ sudo psh compute chmod a+r /usr/local/cuda/include/cudnn.h
    $ sudo psh compute chmod a+r /usr/local/cuda/lib64/libcudnn*
    
  • Configure environment variables

    To make the CUDA installation work correctly, a few environment variables need to be configured; the following commands modify the relevant configuration files. To make it easy to distribute these files to the GPU compute nodes of the cluster, run the commands on the management node, even though CUDA is not installed on the management node:
    $ sudo echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf.d/cuda.conf
    $ sudo echo "export CUDA_HOME=/usr/local/cuda" >> /etc/profile.d/cuda.sh
    $ sudo echo "export PATH=/usr/local/cuda/bin:\$PATH" >> /etc/profile.d/cuda.sh
    
  • Distribute the configuration

    $ sudo xdcp compute /etc/ld.so.conf.d/cuda.conf /etc/ld.so.conf.d/cuda.conf
    $ sudo xdcp compute /etc/profile.d/cuda.sh /etc/profile.d/cuda.sh
    
  • Run the following commands to confirm that the GPUs are recognized:

    $ sudo psh compute ldconfig
    $ sudo psh compute nvidia-smi
    $ sudo psh compute "cd /root/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery; make; ./deviceQuery" | xcoll
    
  • Set CUDA to start automatically

    # configuration
    $ sudo psh compute sed -i '/Wants=syslog.target/a\Before=slurmd.service' /usr/lib/systemd/system/nvidia-persistenced.service
    
    $ sudo psh compute systemctl daemon-reload
    $ sudo psh compute systemctl enable nvidia-persistenced
    $ sudo psh compute systemctl start nvidia-persistenced
    
    # add configure file
    $ cat << eof > /usr/lib/systemd/system/nvidia-persistenced.service
    [Unit]
    Description=NVIDIA Persistence Daemon
    Before=slurmd.service
    Wants=syslog.target
    
    [Service]
    Type=forking
    ExecStart=/usr/bin/nvidia-persistenced --user root
    ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
    
    [Install]
    WantedBy=multi-user.target
    eof
    
    # Distribute configure file
    xdcp compute /usr/lib/systemd/system/nvidia-persistenced.service /usr/lib/systemd/system/nvidia-persistenced.service
    
    # restart service
    psh compute systemctl daemon-reload
    psh compute systemctl enable nvidia-persistenced
    psh compute systemctl start nvidia-persistenced
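    
    To confirm that the persistence daemon is now running on every GPU node, a quick check might be:
    $ sudo psh compute "systemctl is-active nvidia-persistenced" | xcoll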
    

Configure slurm

  • Configure slurm

    1. Download https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/slurm.conf to /etc/slurm/ on the management node, and modify it to match the actual cluster by referring to the appendix (see the sketch after these steps).

    2. Download https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/gres.conf to /etc/slurm/ on the management node, and modify it to match the actual cluster by referring to the appendix. This file is not needed if the nodes are not GPU nodes.
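
    A minimal sketch of the two downloads above (assuming curl is available on the management node; both files still have to be edited for the actual cluster afterwards):
    $ sudo curl -o /etc/slurm/slurm.conf https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/slurm.conf
    $ sudo curl -o /etc/slurm/gres.conf https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/gres.conf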

  • Distribute the configuration

    $ sudo xdcp all /etc/slurm/slurm.conf /etc/slurm/slurm.conf
    $ sudo xdcp all /etc/munge/munge.key /etc/munge/munge.key
    
  • Start the services

    # Startup Management Node
    $ sudo systemctl enable munge
    $ sudo systemctl enable slurmctld
    $ sudo systemctl restart munge
    $ sudo systemctl restart slurmctld
    
    # Startup Other Node
    $ sudo psh all systemctl enable munge
    $ sudo psh all systemctl restart munge
    $ sudo psh all systemctl enable slurmd
    $ sudo psh all systemctl restart slurmd
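    
    A quick sanity check before moving on (not part of the original steps): verify that munge and slurmd are active on every node.
    $ sudo psh all "systemctl is-active munge slurmd" | xcoll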
    

Note

If slurm does not run properly, refer to "How to resolve common slurm problems".

Install ganglia

  • Install gmond

    el7
    # Management node
    $ sudo yum -y install ganglia-gmond-ohpc
    
    # Other node
    $ sudo psh all yum install -y ganglia-gmond-ohpc
    
    sle12
    # Management node
    $ sudo zypper install ganglia-gmond-ohpc
    
    # Other node
    $ sudo psh all zypper install -y --force-resolution ganglia-gmond-ohpc
    
  • Configure gmond

    1. Download https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/ganglia/management/gmond.conf to /etc/ganglia/gmond.conf on the management node.

    2. Download https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/ganglia/gmond.conf to /var/tmp/gmond.conf.

    Note

    Change the host parameter in the udp_send_channel section to the hostname of the management node, according to your environment (see the sketch below).
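
    A minimal sketch of the download-and-edit steps above (assuming curl is available and that the only "host =" line in the template is the one inside udp_send_channel; the management node's /etc/ganglia/gmond.conf may need the same edit):
    $ sudo curl -o /etc/ganglia/gmond.conf https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/ganglia/management/gmond.conf
    $ sudo curl -o /var/tmp/gmond.conf https://hpc.lenovo.com/lico/downloads/5.1/examples/conf/ganglia/gmond.conf
    # Point udp_send_channel at the management node (${sms_name} is the management node hostname used elsewhere in this guide)
    $ sudo sed -i "s/^\( *host *= *\).*/\1${sms_name}/" /var/tmp/gmond.conf
    $ grep -A 3 udp_send_channel /var/tmp/gmond.conf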

  • Modify kernel parameters

    $ sudo echo net.core.rmem_max=10485760 > /usr/lib/sysctl.d/gmond.conf
    $ sudo /usr/lib/systemd/systemd-sysctl gmond.conf
    $ sudo sysctl -w net.core.rmem_max=10485760
    
  • Distribute the configuration

    $ sudo xdcp all /var/tmp/gmond.conf /etc/ganglia/gmond.conf
    
  • Start the services

    # Management node
    $ sudo systemctl enable gmond
    $ sudo systemctl start gmond
    
    # Other node
    $ sudo psh all systemctl enable gmond
    $ sudo psh all systemctl start gmond
    
    # Make sure all nodes are listed
    $ sudo gstat -a
    

Install mpi

  • Install the mpi modules

    el7
    $ sudo yum -y install openmpi3-gnu7-ohpc mpich-gnu7-ohpc mvapich2-gnu7-ohpc
    
    sle12
    $ sudo zypper install openmpi3-gnu7-ohpc mpich-gnu7-ohpc mvapich2-gnu7-ohpc
    

    Note

    The commands above install the openmpi, mpich, and mvapich modules. Users can select which mpi module to use with lmod (see the example below); openhpc also provides packages that set the default module.
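
    A quick illustration of switching mpi modules with lmod in the current shell (assuming the default gnu7/openmpi3 environment is loaded):
    $ module list
    $ module swap openmpi3 mpich
    $ module list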

  • Set the default mpi module

    el7

    Run the following command to make the openmpi module the default:
    $ sudo yum -y install lmod-defaults-gnu7-openmpi3-ohpc
    
    Run the following command to make the mpich module the default:
    $ sudo yum -y install lmod-defaults-gnu7-mpich-ohpc
    
    Run the following command to make the mvapich module the default:
    $ sudo yum -y install lmod-defaults-gnu7-mvapich2-ohpc
    

    sle12

    Run the following command to make the openmpi module the default:
    $ sudo zypper install lmod-defaults-gnu7-openmpi3-ohpc
    
    Run the following command to make the mpich module the default:
    $ sudo zypper install lmod-defaults-gnu7-mpich-ohpc
    
    Run the following command to make the mvapich module the default:
    $ sudo zypper install lmod-defaults-gnu7-mvapich2-ohpc
    
  • Interconnect support for each MPI type in openhpc

                   | Ethernet(TCP) | InfiniBand | Omni-Path
    MPICH          | X             |            |
    MVAPICH2       |               | X          |
    MVAPICH2(psm2) |               |            | X
    OpenMPI        | X             | X          | X
    OpenMPI(PMIx)  | X             | X          | X

Note

If you want to use MVAPICH2(psm2), install mvapich2-psm2-gnu7-ohpc; if you want to use OpenMPI(PMIx), install openmpi3-pmix-slurm-gnu7-ohpc. However, openmpi3-gnu7-ohpc and openmpi3-pmix-slurm-gnu7-ohpc are mutually incompatible, and mvapich2-psm2-gnu7-ohpc and mvapich2-gnu7-ohpc are mutually incompatible.

Install singularity

Singularity is a lightweight container framework for the HPC domain.

  • Install singularity

    el7
    $ sudo yum -y install singularity-ohpc
    
    sle12
    $ sudo zypper install singularity-ohpc
    
  • Configure the openhpc default environment

    Edit the /opt/ohpc/pub/modulefiles/ohpc file and add the following lines to the corresponding blocks:
    # Add in module try-add
    module try-add singularity
    
    # Add in module del
    module del singularity
    
  • Apply the configuration

    Run the following command (source is a shell builtin, so run it in your current shell rather than through sudo):
    $ source /etc/profile.d/lmod.sh
    

Note

When you install one of the lmod-defaults* packages, the default configuration may overwrite the changes made to the /opt/ohpc/pub/modulefiles/ohpc file. In that case, redo the changes to /opt/ohpc/pub/modulefiles/ohpc, or add module try-add singularity at the bottom of /etc/profile.d/lmod.sh (see the sketch below).
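
A minimal sketch of the /etc/profile.d/lmod.sh fallback mentioned above (tee -a is used so the append runs with root privileges):

    $ echo "module try-add singularity" | sudo tee -a /etc/profile.d/lmod.sh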

Checkpoint B

  • Check slurm

    $ sudo sinfo
    ...
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    normal*      up 1-00:00:00      2   idle c[1-2]
    ...
    

    Attention

    The node state should be idle; idle* is an abnormal state.

  • Add a test account

    $ sudo useradd -m test
    $ sudo echo "MERGE:" > syncusers
    $ sudo echo "/etc/passwd -> /etc/passwd" >> syncusers
    $ sudo echo "/etc/group -> /etc/group" >> syncusers
    $ sudo echo "/etc/shadow -> /etc/shadow" >> syncusers
    $ sudo xdcp all -F syncusers
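    
    To confirm that the account was synchronized to every node, a quick check might be:
    $ sudo psh all "id test" | xcoll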
    
  • Run a test mpi program

    $ su - test
    $ mpicc -O3 /opt/ohpc/pub/examples/mpi/hello.c
    $ srun -n 8 -N 1 -w compute --pty /bin/bash
    $ prun ./a.out
    ...
    Master compute host = c1
    Resource manager = slurm
    Launch cmd = mpiexec.hydra -bootstrap slurm ./a.out
    Hello, world (8 procs total)
    --> Process # 0 of 8 is alive. -> c1
    --> Process # 4 of 8 is alive. -> c2
    --> Process # 1 of 8 is alive. -> c1
    --> Process # 5 of 8 is alive. -> c2
    --> Process # 2 of 8 is alive. -> c1
    --> Process # 6 of 8 is alive. -> c2
    --> Process # 3 of 8 is alive. -> c1
    --> Process # 7 of 8 is alive. -> c2
    

Note

After the test is complete, remember to exit back to the root user on the management node.