Initialize Hybrid HPC

Attention: This function is supported only on LiCO clusters whose operating system is Red Hat 9.4.

  1. Install Hybrid HPC–Azure. Do one of the following:

    # config EPEL repo
    dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
    # install
    dnf install -y lico-core-cloudscheduling-azure
    # config EPEL repo
    dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
    # install dependency packages
    dnf install -y openvpn easy-rsa sshpass
  2. Modify the configuration file /etc/lico/lico.ini.d/cloudscheduling.ini

  3. The LiCO management node shares /opt/lico/cloud:

    echo "/opt/lico/cloud *(ro,sync,no_subtree_check,no_root_squash)" >> /etc/exports 
    exportfs -a
  4. On the LiCO management node, edit the Slurm configuration file /etc/slurm/slurm.conf and add the following line at the end of the file:

    include /opt/lico/cloud/azure/slurm.conf
  5. Configure the autoscaling function of Hybrid HPC

  6. Modify the script and directory permissions:

    chown -R slurm:slurm /opt/lico/pub/slurm/
    chmod 755 /opt/lico/pub/slurm/*.sh

  7. Run the following command to restart the slurmctld service on the LiCO management node:

    systemctl restart slurmctld
  8. Create an Azure Authenticator:

  9. Run the following command to import the Azure authentication information into LiCO:

    # Import the application (client) ID, directory (tenant) ID, and client password obtained in step 8.
    # Follow the prompts and enter them in sequence.
    lico azure_secret import
  10. Create a public IP address
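
Steps 3 and 4 append a line to /etc/exports and /etc/slurm/slurm.conf. Running those commands a second time (for example, when repeating the procedure) would add the same line again. The following is a minimal sketch of a guard against that, not part of the LiCO tooling: `append_once` is a hypothetical helper name, and it assumes the target file, if it exists, ends with a newline.

```shell
#!/usr/bin/env bash

# append_once FILE LINE
# Append LINE to FILE only if FILE does not already contain an identical line.
# Hypothetical helper for illustration; it is not shipped with LiCO.
append_once() {
    local file="$1" line="$2"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Intended use on the management node, with the paths from steps 3 and 4:
# append_once /etc/exports "/opt/lico/cloud *(ro,sync,no_subtree_check,no_root_squash)"
# append_once /etc/slurm/slurm.conf "include /opt/lico/cloud/azure/slurm.conf"
```

After using the helper for /etc/exports, `exportfs -a` still has to be run, and slurmctld still has to be restarted as described in step 7.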

Troubleshooting: Page Issues After Cloud Node Deployment

If the page displays errors after deploying the cloud nodes, follow these steps to troubleshoot and resolve the issue.

Check Cloud Node Status

In the LiCO Administrator page, click Monitor > List View to check whether the cloud node monitoring information is correct.

Synchronize Node Information

If the cloud node monitoring information on the List View page is incorrect, run the following command to synchronize the cloud node information:

    lico sync_node
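
When the synchronization is run from a script, it can help to check the command's result before telling the operator to reload the page. The sketch below assumes that `lico sync_node` returns a non-zero exit status on failure, which this guide does not state. The function name `sync_cloud_nodes` and the `LICO_CMD` variable are hypothetical and exist only so the wrapper can be dry-run without a LiCO installation.

```shell
#!/usr/bin/env bash

# sync_cloud_nodes
# Run "lico sync_node" and report whether it succeeded.
# LICO_CMD is a hypothetical override for dry runs; it defaults to "lico".
sync_cloud_nodes() {
    local cmd="${LICO_CMD:-lico}"
    if "$cmd" sync_node; then
        echo "sync_node succeeded; reload the List View page"
    else
        local rc=$?   # exit status of the failed command
        echo "sync_node failed (exit status $rc)" >&2
        return "$rc"
    fi
}

# Intended use on the management node:
# sync_cloud_nodes
```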