Configuring Service Account
$ sudo lico-passwd-tool
Please enter the postgres username:
Please enter the postgres password:
Please confirm the postgres password:
Please enter the influxdb username:
Please enter the influxdb password:
Please confirm the influxdb password:
Please enter the confluent username:
Please enter the confluent password:
Please confirm the confluent password:
Configuring Cluster Nodes
cp /etc/lico/nodes.csv.example /etc/lico/nodes.csv
We recommend downloading this file to the local computer and editing it with Excel or other spreadsheet software. After you finish, upload it to the management node and overwrite the original file. The cluster information file consists of the following six parts.
Room Information
| room | name | location_description |
|---|---|---|
| | ShangHai Solution Room | Shanghai Zhangjiang |
Enter only one piece of server room information in the fields below:
name : Room Name
location_description : Room Description
Logic Group Information
Managers can use logic groups to divide the nodes in the cluster into groups. The logic groups do not impact the use of computer resources or permissions configurations.
Logic Group Information Table:
| group | name |
|---|---|
| | login |
Enter at least one logic group in the fields below:
name : Logic Group Name
Room Row Information
A room row is a row of racks in the room. Enter information for each rack row in which a cluster node is located.
Row Information Table:
| row | name | index | belonging_room |
|---|---|---|---|
| | row1 | 1 | ShangHai Solution Room |
Enter at least one piece of row information in the fields below:
name : Row Name (Cannot be repeated in the same room)
index : Row Order (Must be a positive integer and cannot be repeated in the same room)
belonging_room : Room Name (Use the name configured in the room information table)
Rack Information
Enter rack information for the cluster node locations. The rack information table is below:

| rack | name | index | belonging_row |
|---|---|---|---|
| | rack1 | 1 | row1 |
Enter information for at least one rack in the fields below:
name : Rack Name (Cannot be repeated in the same room)
index : Rack Location Column (Must be a positive integer and cannot be repeated in the same row)
belonging_row : Row Name of the Rack Location (Use the name configured in the row information table)
Chassis Information
If there is a chassis in the cluster, enter the chassis information. The chassis information table is below:

| chassis | name | belonging_rack | location_u_in_rack | machine_type |
|---|---|---|---|---|
| | chassis1 | rack1 | 7 | d2 |
The fields are described below:
name : Chassis Name (Cannot be repeated in the same room)
belonging_rack : Rack Name of the Chassis Location (Use the name configured in the rack information table)
location_u_in_rack : Location of the chassis base in the rack (unit: U). In a standard cabinet, the value should be between 1 and 42.
machine_type : Chassis Type (Use the model number; see the appendix Chassis Model List).
Node Information
Enter information for all nodes in the cluster into the node information table. The node information table can be found below:
| node | name | nodetype | immip | hostip | machine_type | ipmi_user | ipmi_pwd | belonging_service_node | belonging_rack | belonging_chassis | location_u | width | height | groups |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | head | head | 10.240.212.13 | 127.0.0.1 | sr650 | USERID | <PASSWORD> | | rack1 | | 2 | 1 | 1 | head |
The fields are described below:
name : Node hostname (without the domain name)
nodetype : Node type. Choose one of: 1) head : Management node 2) login : Login node 3) compute : Compute node
immip : IP address of the node's BMC system
hostip : IP address of the node on the host network
machine_type : Machine model of the node (For available machine models, see the appendix Product List)
ipmi_user : XCC (BMC) account for the node
ipmi_pwd : XCC (BMC) password for the node
belonging_service_node : Service node to which this node belongs; required for large clusters. If there is no service node, leave the field blank.
belonging_rack : Rack Name of the Node Location (Use the name configured in the rack information table)
belonging_chassis : Chassis Name of the Node Location (Leave blank if the node is not located in a chassis.) Configure the chassis name in the chassis information table.
location_u : Node location. If the node is located in a chassis, enter the slot in the chassis in which the node is located; if the node is located in a rack, enter the location of the node base in the rack (unit: U).
width : Node width (Full: 1, Half: 0.5)
height : Node height (unit: U)
groups : Logic Group Name(s) of the node (A node can belong to multiple logic groups; separate group names with ";".) Configure the logic group names in the logic group information table.
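Putting the six parts together, a minimal cluster information file could look like the hypothetical sketch below. The values mirror the example tables above (room, row1, rack1, chassis1, and the head node); a "head" logic group is added here so that the node's groups field resolves. Verify the exact layout against /etc/lico/nodes.csv.example before using anything like this on a real cluster.

```shell
# Hypothetical nodes.csv sketch combining all six parts; written to /tmp
# for illustration. The real file lives at /etc/lico/nodes.csv.
cat > /tmp/nodes.csv <<'EOF'
room,name,location_description
,ShangHai Solution Room,Shanghai Zhangjiang
group,name
,login
,head
row,name,index,belonging_room
,row1,1,ShangHai Solution Room
rack,name,index,belonging_row
,rack1,1,row1
chassis,name,belonging_rack,location_u_in_rack,machine_type
,chassis1,rack1,7,d2
node,name,nodetype,immip,hostip,machine_type,ipmi_user,ipmi_pwd,belonging_service_node,belonging_rack,belonging_chassis,location_u,width,height,groups
,head,head,10.240.212.13,127.0.0.1,sr650,USERID,<PASSWORD>,,rack1,,2,1,1,head
EOF
```

Note that belonging_service_node and belonging_chassis are left empty for the head node, matching the example table.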
Configuring LiCO Services
The LiCO service configuration file is located at /etc/lico/lico.ini. This configuration file controls the operating parameters for the various LiCO background service components. Modify it as needed, referring to the instructions below. If you change the configuration while LiCO is running, restart LiCO for the changes to take effect.
$ sudo systemctl restart lico
Attention
Any settings not covered in the instructions below should be modified only after consulting service staff. Modifications made without such a consultation could prevent the system from running normally.
Infrastructure Configuration
# Cluster domain settings
domain = hpc.com
Database Configuration
# PostgreSQL address
db_host = 127.0.0.1
# PostgreSQL port
db_port = 5432
# PostgreSQL database name
db_name = lico
# InfluxDB address
influx_host = 127.0.0.1
# InfluxDB port
influx_port = 8086
# InfluxDB database name
influx_database = lico
Login Configuration
# Maximum number of login password error attempts
login_fail_max_chance = 3
Storage Configuration
# Shared storage directory
# If strictly adhering to the shared directory configurations in this document, change
# to: share_dir = /home
share_dir = /home
Scheduler Configuration
# The scheduler configuration currently supports Slurm, LSF, and Torque. Slurm is the default.
scheduler_software = slurm
Alert Configuration
# WeChat proxy server address
wechat_agent_url = http://127.0.0.1:18090
# WeChat notification template ID
wechat_template_id = <WECHAT_TEMPLATE_ID>
# SMS proxy server address
sms_agent_url = http://127.0.0.1:18092
# Email proxy server address
mail_agent_url = http://127.0.0.1:18091
Note
The above only needs to be configured if the WeChat, SMS, and email proxy modules are installed for the cluster. Obtain the <WECHAT_TEMPLATE_ID> from the following website: https://mp.weixin.qq.com/wiki?t=resource/res_main&id=mp1445241432
Cluster Configuration
# Confluent port
confluent_port = 4005
Functional Configuration
[app:django]
# For the functional module used, modify based on the actual module purchased.
# If only using the HPC module, change to: use = hpc
# If only using the AI module, change to: use = ai
# After changing the configuration, you must run lico init to refresh the data tables.
use = hpc+ai
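For example, switching to the HPC-only module can be scripted as in the sketch below. It works on a throwaway copy so it is safe to run anywhere; on a real system you would edit /etc/lico/lico.ini itself and then run lico init.

```shell
# Illustration on a scratch copy; edit /etc/lico/lico.ini on a real system.
printf '[app:django]\nuse = hpc+ai\n' > /tmp/lico.ini
# Switch the functional module to HPC only.
sed -i 's/^use = .*/use = hpc/' /tmp/lico.ini
grep '^use' /tmp/lico.ini
# prints: use = hpc
```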
Configuring LiCO Components
lico-vnc-mond
[vnc]
# Modify IP to management node address
url=http://127.0.0.1:18083/session
timeout=30
Attention
Change 127.0.0.1 to the actual management node’s IP.
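That edit can be sketched as below, using a scratch copy and 10.240.212.13 (the management node IP from the node table above) as a placeholder; adjust the path and IP for your environment before distributing the file with xdcp.

```shell
# Scratch copy for illustration; on a real system edit the staged
# vnc-mond.ini before distributing it to the compute nodes.
printf '[vnc]\nurl=http://127.0.0.1:18083/session\ntimeout=30\n' > /tmp/vnc-mond.ini
# Replace the loopback address with the management node's IP (placeholder).
sed -i 's|http://127\.0\.0\.1:|http://10.240.212.13:|' /tmp/vnc-mond.ini
grep '^url' /tmp/vnc-mond.ini
# prints: url=http://10.240.212.13:18083/session
```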
$ sudo xdcp compute /var/tmp/vnc-mond.ini /etc/lico/vnc-mond.ini
lico-env
This module mainly implements the following functions:
After a user is frozen, the user cannot log in to the machine using ssh.
After a user is frozen, the su command cannot be used to switch to that user.
$ sudo psh compute 'echo "auth required pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/sshd'
$ sudo psh compute 'echo "account required pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/sshd'
$ sudo psh compute 'echo "auth required pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/su'
$ sudo psh compute 'echo "account required pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /etc/pam.d/su'
$ sudo xdcp all /etc/pam.d/sshd /etc/pam.d/sshd
$ sudo xdcp all /etc/pam.d/su /etc/pam.d/su
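Each of the four psh commands above appends one line, so each PAM file ends up with one auth and one account entry. The self-contained sketch below reproduces the result for sshd in a scratch file, with head as a placeholder for the management node hostname (${sms_name}).

```shell
sms_name=head      # placeholder for the management node hostname
: > /tmp/pam_sshd  # scratch file standing in for /etc/pam.d/sshd
echo "auth required pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /tmp/pam_sshd
echo "account required pam_python.so pam_lico.py --url=http://${sms_name}:18080 --timeout=40 --ignore_conn_error" >> /tmp/pam_sshd
# Each PAM file should contain exactly two pam_lico.py lines.
grep -c pam_lico.py /tmp/pam_sshd
# prints: 2
```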
lico-portal
Modify the files below on nodes that have the lico-portal module installed and need to provide external web services.
Edit the file
/etc/nginx/nginx.conf
and change the listen port to 8080:

server {
    listen 8080 default_server;
    listen [::]:8080 default_server;
    server_name localhost;
    ......
}
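The port change can be scripted with sed, as in the sketch below on a scratch copy; after editing the real /etc/nginx/nginx.conf, validate and reload nginx.

```shell
# Scratch copy standing in for /etc/nginx/nginx.conf.
printf 'server {\n    listen 80 default_server;\n    listen [::]:80 default_server;\n}\n' > /tmp/nginx.conf
# Change the default HTTP listen port from 80 to 8080.
sed -i 's/listen 80 /listen 8080 /; s/listen \[::\]:80 /listen [::]:8080 /' /tmp/nginx.conf
grep listen /tmp/nginx.conf
# After editing the real file: sudo nginx -t && sudo systemctl reload nginx
```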
In addition, if you need to change the default HTTPS port 443 to another port, modify it in the file
/etc/nginx/conf.d/https.conf:
listen <port> ssl http2;
Note
Make sure the port is not used by another application and is not blocked by the firewall.
Modify file
/etc/nginx/conf.d/sites-available/antilles.conf
set $lico_host 127.0.0.1;
Attention
Change 127.0.0.1 to the IP address of the management node (not the domain name), according to your actual environment.
Edit file
/etc/lico/portal.conf
Editing this file lets you add custom shortcut links; for the configuration format, refer to the file:
/etc/lico/portal.conf.example
If you need to hide server version information:

http {
    ......
    sendfile on;
    server_tokens off;
    ......
}
lico-ganglia-mond
influxdb {
cfg_db_host 127.0.0.1
cfg_db_port 5432
cfg_db_name lico
host 127.0.0.1
port 8086
database lico
timeout 10
}
Attention
Modify cfg_db_host 127.0.0.1 and cfg_db_port 5432 to the actual PostgreSQL service. Modify host 127.0.0.1 and port 8086 to the actual InfluxDB service.
lico-confluent-proxy
Edit file /etc/lico/confluent-mond.ini
[DEFAULT]
# database
db_host = 127.0.0.1
db_port = 5432
db_name = lico
Note
Modify db_host = 127.0.0.1 and db_port = 5432 to the actual PostgreSQL service.
[app:main]
use = cluster-confluent-proxy
If you need to change information about the confluent user, refer to Installing confluent to create or change the user information, then update it following the steps in Configuring Service Account.
lico-confluent-mond
[database]
db_host = 127.0.0.1
db_port = 5432
db_name = lico
[influxdb]
host = 127.0.0.1
port = 8086
database = lico
timeout = 10
Attention
Modify db_host = 127.0.0.1 and db_port = 5432 to the actual PostgreSQL service. Modify host = 127.0.0.1 and port = 8086 to the actual InfluxDB service.
lico-wechat-agent
appid = <APPID>
secret = <SECRET>
Note
To obtain the appid and secret, refer to: WeChat-public-platform