Configuring Service Account

On the management node, run the tool lico-passwd-tool. Follow the prompts to enter the PostgreSQL, InfluxDB, and Confluent usernames and passwords to complete the configuration.
  $ sudo lico-passwd-tool
  Please enter the postgres username:
  Please enter the postgres password:
  Please confirm the postgres password:

  Please enter the influxdb username:
  Please enter the influxdb password:
  Please confirm the influxdb password:

  Please enter the confluent username:
  Please enter the confluent password:
  Please confirm the confluent password:
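
If you want to confirm the database credentials before entering them, a quick check such as the following can help (a minimal sketch, assuming PostgreSQL runs locally on its default port; psql is the standard PostgreSQL client, not part of LiCO, and <POSTGRES_USER> is a placeholder for the username you plan to enter):

  $ psql -h 127.0.0.1 -p 5432 -U <POSTGRES_USER> -d postgres -c 'SELECT 1;'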

Configuring Cluster Nodes

Before using LiCO, follow these steps to import cluster information to the system.
  $ sudo cp /etc/lico/nodes.csv.example /etc/lico/nodes.csv

We recommend downloading this file to a local computer and editing it with Excel or other spreadsheet software. When you are finished, upload it to the management node and overwrite the original file. The cluster information file consists of the following six parts.
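
For example, the file can be transferred with scp (a sketch; <MANAGEMENT_NODE_IP> is a placeholder for your management node's address):

  # Download nodes.csv to the local computer
  $ scp root@<MANAGEMENT_NODE_IP>:/etc/lico/nodes.csv .
  # After editing, upload it back and overwrite the original file
  $ scp nodes.csv root@<MANAGEMENT_NODE_IP>:/etc/lico/nodes.csv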

Room Information

  room | name                   | location_description
       | ShangHai Solution Room | Shanghai Zhangjiang

Enter exactly one server room using the fields below:

  1. name : Room Name

  2. location_description : Room Description

Logic Group Information

Managers can use logic groups to divide the nodes in the cluster into groups. Logic groups do not affect the use of computing resources or permission configurations.

Logic Group Information Table:

  group | name
        | login

Enter at least one logic group in the fields below:

  1. name : Logic Group Name

Room Row Information

A room row is the order of racks in the room. Enter information for each rack row in which cluster nodes are located.

Row Information Table:

  row | name | index | belonging_room
      | row1 | 1     | ShangHai Solution Room

Enter at least one piece of row information in the fields below:

  1. name : Row Name (Cannot be repeated in the same room)

  2. index : Row Order (Must be a positive integer and cannot be repeated in the same room)

  3. belonging_room : Room Location (Use the name configured in the Room Information table)

Rack Information

Enter rack information for the cluster node locations. The rack information table is below:

  rack | name  | column | belonging_row
       | rack1 | 1      | row1

Enter at least one rack in the fields below:

  1. name : Rack Name (Cannot be repeated in the same room)

  2. column : Rack Location Column (Must be a positive integer and cannot be repeated in the same row)

  3. belonging_row : Rack Location Row Name (Use the name configured in the Row Information table)

Chassis Information

If there are chassis in the cluster, enter the chassis information. The chassis information table is below:

  chassis | name     | belonging_rack | location_u_in_rack | machine_type
          | chassis1 | rack1          | 7                  | d2

The fields are described below:

  1. name : Chassis Name (Cannot be repeated in the same room)

  2. belonging_rack : Rack Location Name (Use the name configured in the Rack Information table.)

  3. location_u_in_rack : The location of the chassis base in the rack (Unit: u). In a standard cabinet, the value should be between 1 and 42.

  4. machine_type : Chassis Type (Use the model name; see the Chassis Model List in the appendix.)

Node Information

Enter information for all nodes in the cluster into the node information table. Because this table is wide, each record of type node is shown below one column per line, with example values:

  name                    head
  nodetype                head
  immip                   10.240.212.13
  hostip                  127.0.0.1
  machine_type            sr650
  ipmi_user               USERID
  ipmi_pwd                <PASSWORD>
  belonging_service_node
  belonging_rack          rack1
  belonging_chassis
  location_u              2
  width                   1
  height                  1
  groups                  head

The fields are described below:

  1. name : Node hostname (the hostname does not need to include a domain name).

  2. nodetype : Node type. Choose one of: head (management node), login (login node), or compute (compute node).

  3. immip : IP address of the node’s BMC system.

  4. hostip : IP address of the node on the host network.

  5. machine_type : Machine model of the node. (For available machine models, see the Product List in the appendix.)

  6. ipmi_user : XCC (BMC) Account for the Node

  7. ipmi_pwd : XCC (BMC) Password for the Node

  8. belonging_service_node : Large clusters require setting up a service node to which the node belongs. If there is no service node, leave the field blank.

  9. belonging_rack : Node Location Rack Name (Use the name configured in the Rack Information table)

  10. belonging_chassis : Node Location Chassis Name (Use the name configured in the Chassis Information table; leave blank if the node is not located in a chassis.)

  11. location_u : Node location. If the node is located in a chassis, enter the slot in the chassis in which the node is located; if the node is located in a rack, enter the location of the node base in the rack (Unit: u).

  12. width : Node Width (Full: 1, Half: 0.5)

  13. height : Node Height (Unit: u)

  14. groups : Node Logic Group Names (A node can belong to multiple logic groups; separate group names with ";".) Configure the logic group names in the Logic Group Information table.
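
Putting the six parts together, a complete nodes.csv might look like the sketch below. This is a hypothetical example that assumes each part starts with a header line whose first field is the record type and that data rows leave that field empty; consult /etc/lico/nodes.csv.example for the authoritative layout:

  room,name,location_description
  ,ShangHai Solution Room,Shanghai Zhangjiang

  group,name
  ,login

  row,name,index,belonging_room
  ,row1,1,ShangHai Solution Room

  rack,name,column,belonging_row
  ,rack1,1,row1

  chassis,name,belonging_rack,location_u_in_rack,machine_type
  ,chassis1,rack1,7,d2

  node,name,nodetype,immip,hostip,machine_type,ipmi_user,ipmi_pwd,belonging_service_node,belonging_rack,belonging_chassis,location_u,width,height,groups
  ,head,head,10.240.212.13,127.0.0.1,sr650,USERID,<PASSWORD>,,rack1,,2,1,1,head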

Configuring LiCO Services

The LiCO service configuration file is located at /etc/lico/lico.ini. This configuration file controls the operating parameters of the various LiCO background service components. Modify it based on your needs, with reference to the instructions below. If you change the configuration while LiCO is running, restart LiCO for the changes to take effect:

$ sudo systemctl restart lico
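
To confirm the service came back up after the restart, its status can be checked with standard systemd tooling:

$ sudo systemctl status lico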

Attention

Settings not covered in the configuration instructions below should be modified only after consulting service staff. Modifications made without such a consultation could prevent the system from running normally.

Infrastructure Configuration

# Cluster domain settings
domain = hpc.com

Database Configuration

# PostgreSQL address
db_host = 127.0.0.1
# PostgreSQL port
db_port = 5432
# PostgreSQL database name
db_name = lico

# InfluxDB address
influx_host = 127.0.0.1
# InfluxDB port
influx_port = 8086
# InfluxDB database name
influx_database = lico

Login Configuration

# Maximum number of login password error attempts
login_fail_max_chance = 3

Storage Configuration

# Shared storage directory
# If strictly adhering to the shared directory configurations in this document, change
# to: share_dir = /home
share_dir = /home

Scheduler Configuration

# The scheduler configuration currently supports Slurm, LSF, and Torque. Slurm is the default.
scheduler_software = slurm

Alert Configuration

# WeChat proxy server address
wechat_agent_url = http://127.0.0.1:18090

# WeChat notification template ID
wechat_template_id = <WECHAT_TEMPLATE_ID>

# SMS proxy server address
sms_agent_url = http://127.0.0.1:18092

# Email proxy server address
mail_agent_url = http://127.0.0.1:18091

Note

The above only needs to be configured if the WeChat, SMS, and email proxy modules are installed for the cluster. Obtain the <WECHAT_TEMPLATE_ID> from the following website: https://mp.weixin.qq.com/wiki?t=resource/res_main&id=mp1445241432

Cluster Configuration

# Confluent port
confluent_port = 4005

Functional Configuration

[app:django]
# For the functional module used, modify based on the actual module purchased.
# If only using the HPC module, change to: use = hpc
# If only using the AI module, change to: use = ai
# After changing the configuration, you must run lico init to refresh the data tables.

use = hpc+ai
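
For example, after editing the use value, refresh the data tables as described in the comment above, then restart LiCO:

$ sudo lico init
$ sudo systemctl restart lico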

Configuring LiCO Components

lico-vnc-mond

Create the file /var/tmp/vnc-mond.ini and add:

[vnc]
# Modify the IP to the management node address
url=http://127.0.0.1:18083/session
timeout=30

Attention

Change 127.0.0.1 to the actual management node IP address.

Distribute the configuration:
$ sudo xdcp compute /var/tmp/vnc-mond.ini /etc/lico/vnc-mond.ini
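
To spot-check the distribution, the same parallel shell used elsewhere in this guide can read the file back from the compute nodes:

$ sudo psh compute 'cat /etc/lico/vnc-mond.ini'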

lico-env

This module mainly provides the following functions:

  1. After a user is frozen, the user can no longer use ssh to log in to the machine.

  2. After a user is frozen, the su command can no longer be used to switch to that user.

To configure ssh, run the following commands:
$ sudo psh compute 'echo "auth     required  pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/sshd'
$ sudo psh compute 'echo "account  required  pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/sshd'
To configure su, run the following commands:
$ sudo psh compute 'echo "auth     required  pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/su'
$ sudo psh compute 'echo "account  required  pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/su'
Distribute the configuration:
$ sudo xdcp all /etc/pam.d/sshd /etc/pam.d/sshd
$ sudo xdcp all /etc/pam.d/su /etc/pam.d/su
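
To verify that the PAM entries were appended and distributed, a quick check such as the following can be used (tail is a standard utility; adjust the node range to match your cluster):

$ sudo psh all 'tail -n 2 /etc/pam.d/sshd /etc/pam.d/su'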

lico-portal

Modify the files below on nodes that have the lico-portal module installed and need to provide external web services.

  • Edit file /etc/nginx/nginx.conf and change the port to 8080:

    listen       8080 default_server;
    listen       [::]:8080 default_server;
    
    server {
      listen       8080;
      server_name  localhost;
      ...
    }
    
  • In addition, if you need to change the default HTTPS port 443 to another port, modify it in the file /etc/nginx/conf.d/https.conf:

      listen          <port> ssl http2;
    

Note

Make sure the port is not used by another application and is not blocked by the firewall.

  • Modify file /etc/nginx/conf.d/sites-available/antilles.conf

    set $lico_host 127.0.0.1;
    

Attention

Change 127.0.0.1 to the IP address of the management node (not its domain name) according to your actual environment.

  • Edit file /etc/lico/portal.conf

    Editing this file lets you add custom shortcut links. For the configuration format, refer to the file /etc/lico/portal.conf.example.

  • If you need to hide server version information:

    Edit the file /etc/nginx/nginx.conf and add server_tokens off; in the http block:
    http {
          ...
          sendfile         on;
          server_tokens    off;
          ...
    }
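
After editing the nginx files above, the configuration can be validated and applied with standard nginx/systemd commands:

$ sudo nginx -t
$ sudo systemctl restart nginx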
    

lico-ganglia-mond

Edit the file /etc/lico/ganglia-mond.conf:
influxdb {
    cfg_db_host 127.0.0.1
    cfg_db_port 5432
    cfg_db_name lico
    host 127.0.0.1
    port 8086
    database lico
    timeout 10
}

Attention

  1. Modify cfg_db_host 127.0.0.1 and cfg_db_port 5432 to match the actual PostgreSQL service.

  2. Modify host 127.0.0.1 and port 8086 to match the actual InfluxDB service.
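
Both endpoints can be probed quickly before restarting the service (pg_isready ships with PostgreSQL; /ping is InfluxDB's health-check endpoint):

$ pg_isready -h 127.0.0.1 -p 5432
$ curl -i http://127.0.0.1:8086/ping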

lico-confluent-proxy

Edit the file /etc/lico/confluent-proxy.ini:

[DEFAULT]
# database
db_host = 127.0.0.1
db_port = 5432
db_name = lico

Note

Modify db_host = 127.0.0.1 and db_port = 5432 to match the actual PostgreSQL service.

If there are multiple Confluent instances in the cluster, you need to configure cluster mode by editing the file /etc/lico/confluent-proxy.ini as follows:

  [app:main]
  use = cluster-confluent-proxy
If you need to change information about the Confluent user, refer to Installing confluent to create or change the user information, then update it by following the steps in Configuring Service Account.

lico-confluent-mond

Edit the file /etc/lico/confluent-mond.ini:
[database]
db_host = 127.0.0.1
db_port = 5432
db_name = lico

[influxdb]
host = 127.0.0.1
port = 8086
database = lico
timeout = 10

Attention

Change db_host = 127.0.0.1 and db_port = 5432 to match the actual PostgreSQL service.
Change host = 127.0.0.1 and port = 8086 to match the actual InfluxDB service.
If you followed this document, both services are installed on the management node with their default ports.

lico-wechat-agent

Edit the file /etc/lico/wechat-agent.ini:
appid = <APPID>
secret = <SECRET>

Note

For instructions on obtaining the appid and secret, refer to: WeChat-public-platform.
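
Once appid and secret are filled in, they can be sanity-checked by requesting an access token from the WeChat API (assumes outbound HTTPS access from the node; substitute your real values):

$ curl "https://api.weixin.qq.com/cgi-bin/token?grant_type=client_credential&appid=<APPID>&secret=<SECRET>"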