An HPC cluster is a multi-user operating environment: under normal circumstances, all users log in and request resources through the login node.
However, since the login node is not under the control of the workload manager, some user behaviors can consume large amounts of CPU, bandwidth, or memory and impede other users' use of the HPC cluster. Examples include long-running multi-core compilation and running MPI software on the login node.
In this article, we introduce some features of the Linux operating system that help HPC cluster managers limit user resources on the login node.
Assume that we have a 20-core node and 20 users (a total of 2000% CPU resources), and we want to limit each user's CPU usage to at most 100%.
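The arithmetic behind this target can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes the node's CPU time is split evenly between a fixed number of users, using the same convention as CPUQuota, where 100% is the time of one core.

```python
def total_capacity_percent(cores: int) -> int:
    """Total CPU time of a node in the CPUQuota convention (100% = one core)."""
    return cores * 100


def per_user_quota(cores: int, users: int) -> int:
    """Hypothetical helper: split the node's CPU time evenly between users.

    Rounds down so that the sum of all quotas never exceeds the node's capacity.
    """
    if cores <= 0 or users <= 0:
        raise ValueError("cores and users must be positive")
    return total_capacity_percent(cores) // users


if __name__ == "__main__":
    print(total_capacity_percent(20))  # 2000
    print(per_user_quota(20, 20))      # 100
```

Rounding down keeps the sum of all quotas within the node's capacity. Because CPUQuota is only a cap and idle users leave their share unused, an administrator might instead choose to overcommit, for example by giving each user more than the even share.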
In /usr/lib/systemd/system/user-.slice.d/, add the config file 11-limit-cpu.conf:
[Slice]
CPUQuota=100%
Reload the systemd configuration:
systemctl daemon-reload
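If the drop-in has to be deployed on several login nodes, the two steps above can be scripted. The following Python sketch is a hypothetical helper, not a systemd tool: it writes a [Slice] drop-in with the given settings and, only when asked, runs systemctl daemon-reload. Writing under /usr/lib/systemd/system and reloading systemd both require root.

```python
import subprocess
from pathlib import Path
from typing import Dict

# Directory used in this article for the defaults that apply to every user slice.
DEFAULT_DIR = Path("/usr/lib/systemd/system/user-.slice.d")


def write_slice_dropin(directory: Path, filename: str, settings: Dict[str, str],
                       reload: bool = False) -> Path:
    """Write a [Slice] drop-in file (hypothetical helper) and return its path."""
    # systemd only merges drop-in files whose names end in ".conf".
    if not filename.endswith(".conf"):
        raise ValueError("drop-in file names must end in .conf")
    directory.mkdir(parents=True, exist_ok=True)
    lines = ["[Slice]"] + [f"{key}={value}" for key, value in settings.items()]
    path = directory / filename
    path.write_text("\n".join(lines) + "\n")
    if reload:
        # Equivalent to running "systemctl daemon-reload" by hand.
        subprocess.run(["systemctl", "daemon-reload"], check=True)
    return path
```

Run as root, write_slice_dropin(DEFAULT_DIR, "11-limit-cpu.conf", {"CPUQuota": "100%"}, reload=True) would produce the same file as above and then reload systemd. Leaving reload off by default makes it possible to write several drop-ins before a single reload.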
Check that 11-limit-cpu.conf is indeed loaded:
systemctl status user-0.slice
user-0.slice - User Slice of UID 0
Loaded: loaded
Drop-In: /usr/lib/systemd/system/user-.slice.d
└─10-defaults.conf, 11-limit-cpu.conf
Active: active since Thu 2023-03-16 15:09:49 CST; 49min ago
Tasks: 17 (limit: 23634)
Memory: 156.7M
CGroup: /user.slice/user-0.slice
├─session-1.scope
│ ├─1310 sshd: root [priv]
│ ├─1580 sshd: root@pts/0,pts/1,pts/5,pts/4
│ ├─1581 -bash
│ ├─2149 -bash
│ ├─2458 ssh hpcadmin@localhost
│ ├─2877 -bash
│ ├─3012 man systemd.resource-control
│ ├─3024 less
│ ├─3175 -bash
│ ├─3304 su - hpcadmin
│ ├─3305 -bash
│ ├─3466 man systemd.resource-control
│ ├─3478 less
│ ├─3723 systemctl status user-0.slice
│ └─3724 less
└─user@0.service
└─init.scope
├─1564 /usr/lib/systemd/systemd --user
└─1570 (sd-pam)
We can see that 11-limit-cpu.conf is indeed loaded.
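Besides systemctl status, the limit that was actually applied can be read back from the kernel. The sketch below assumes the cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup, where the slice shown in the CGroup: line above appears as a directory whose cpu.max file holds the quota and the period in microseconds. On cgroup v1 systems the equivalent values live in cpu.cfs_quota_us and cpu.cfs_period_us instead. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumes cgroup v2 mounted at /sys/fs/cgroup (adjust for other setups).
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_cpu_max(text: str) -> Optional[float]:
    """Convert the contents of a cgroup v2 cpu.max file to a CPUQuota-style percentage.

    cpu.max holds "<quota> <period>" in microseconds, or "max <period>"
    when no limit is set, which is returned as None.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period) * 100


def slice_cpu_quota(uid: int) -> Optional[float]:
    """Read the effective CPU quota of user-<uid>.slice (hypothetical helper)."""
    path = CGROUP_ROOT / "user.slice" / f"user-{uid}.slice" / "cpu.max"
    return parse_cpu_max(path.read_text())
```

With 11-limit-cpu.conf in effect and the default 100 ms period, cpu.max for user-0.slice would be expected to read "100000 100000", so slice_cpu_quota(0) should return 100.0.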
Assume that hpcadmin has UID 1000, and we want to limit hpcadmin's CPU resources to 400%.
In /etc/systemd/system/user-1000.slice.d/, add the config file limit-cpu.conf:
[Slice]
CPUQuota=400%
Reload the systemd configuration:
systemctl daemon-reload
Check that limit-cpu.conf is indeed loaded:
systemctl status user-1000.slice
user-1000.slice - User Slice of UID 1000
Loaded: loaded
Drop-In: /usr/lib/systemd/system/user-.slice.d
└─10-defaults.conf, 11-limit-cpu.conf
/etc/systemd/system/user-1000.slice.d
└─limit-cpu.conf
Active: active since Thu 2023-03-16 16:32:11 CST; 5min ago
Tasks: 5 (limit: 23634)
Memory: 16.6M
CGroup: /user.slice/user-1000.slice
├─session-3.scope
│ ├─2085 sshd: hpcadmin [priv]
│ ├─2105 sshd: hpcadmin@pts/2
│ └─2106 -bash
└─user@1000.service
└─init.scope
├─2095 /usr/lib/systemd/systemd --user
└─2097 (sd-pam)
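The status output above lists two drop-ins that both set CPUQuota: 11-limit-cpu.conf (100%) from user-.slice.d and limit-cpu.conf (400%) from user-1000.slice.d. The systemd.unit man page states that drop-ins with different names are applied in lexicographic order of their file names, regardless of which directory they reside in, so the later assignment wins. The following is a simplified Python model of that rule, not systemd's own code; it ignores same-named files masking each other and leaves out 10-defaults.conf, whose contents are not shown here.

```python
from pathlib import PurePosixPath
from typing import Dict

Settings = Dict[str, str]


def effective_settings(dropins: Dict[str, Settings]) -> Settings:
    """Simplified model of how systemd merges the drop-ins of one unit.

    Files are applied in lexicographic order of their file name (not their
    full path), and a later assignment overrides an earlier one.
    """
    merged: Settings = {}
    for path in sorted(dropins, key=lambda p: PurePosixPath(p).name):
        merged.update(dropins[path])
    return merged


# The two CPU drop-ins reported by "systemctl status user-1000.slice" above.
USER_1000 = {
    "/usr/lib/systemd/system/user-.slice.d/11-limit-cpu.conf": {"CPUQuota": "100%"},
    "/etc/systemd/system/user-1000.slice.d/limit-cpu.conf": {"CPUQuota": "400%"},
}

if __name__ == "__main__":
    print(effective_settings(USER_1000))  # {'CPUQuota': '400%'}
```

Under this model, limit-cpu.conf sorts after 11-limit-cpu.conf because the letter "l" sorts after the digit "1", so hpcadmin gets 400%. A per-user file named, say, 00-limit-cpu.conf would sort first and be overridden by the 100% default.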
For more information, see the man page:
man systemd.resource-control
CPUQuota limits the CPU time available to the processes in the slice, expressed as a percentage of the time of one CPU; values above 100% allow CPU time on more than one CPU. This setting does not limit the number of cores the user's processes may run on, only the total CPU time they receive.
MemoryHigh sets the throttling limit on the memory usage of the processes. If usage goes above this limit, the kernel heavily slows the processes down and aggressively reclaims their memory. The value accepts the suffixes K, M, G, and T (base 1024).
MemoryMax sets the absolute upper limit on the memory usage of the processes. If usage cannot be kept below this limit, the kernel's out-of-memory (OOM) killer is invoked and terminates processes in the slice. It is recommended to set MemoryHigh in combination with MemoryMax. The value accepts the suffixes K, M, G, and T (base 1024).
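To make these units concrete, the sketch below converts a CPUQuota percentage into the CPU time allowed per scheduling period and parses the K/M/G/T memory suffixes. It assumes systemd's default CPU quota period of 100 ms (CPUQuotaPeriodSec=); the helper names and the 8G/10G values are hypothetical, and percentages, fractional sizes, and "infinity", which MemoryHigh and MemoryMax also accept, are not handled.

```python
from typing import Dict

DEFAULT_PERIOD_US = 100_000  # assumed default CPUQuotaPeriodSec= of 100 ms

# systemd parses the K, M, G and T suffixes with base 1024.
SUFFIXES: Dict[str, int] = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def cpu_quota_us(percent: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """CPU time in microseconds the slice may use per period (400% -> 400000)."""
    return round(percent / 100 * period_us)


def parse_size(value: str) -> int:
    """Convert integer sizes such as '8G' or '512M' to bytes; a bare number is bytes."""
    value = value.strip()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


def memory_dropin(high: str, max_: str) -> str:
    """Render a hypothetical [Slice] drop-in with MemoryHigh set below MemoryMax."""
    if parse_size(high) >= parse_size(max_):
        raise ValueError("MemoryHigh should be lower than MemoryMax")
    return f"[Slice]\nMemoryHigh={high}\nMemoryMax={max_}\n"


if __name__ == "__main__":
    print(cpu_quota_us(400))  # 400000
    print(memory_dropin("8G", "10G"))
```

For example, CPUQuota=400% gives a slice 400 ms of CPU time every 100 ms. That budget could be spent as four fully busy cores or as eight cores each busy half of the time, which is why CPUQuota limits time rather than cores. The MemoryHigh check is a sanity check added for illustration, so that throttling starts before the OOM killer is invoked.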