An HPC cluster is a multi-user operating environment, which means that, under normal circumstances, all users can allocate resources via the login node.
However, because the login node is not under the control of the workload manager, some undesirable behaviors can consume a large amount of CPU, bandwidth, or memory and impede other users' use of the HPC cluster. Examples include long-running multi-core compilation and running MPI software on the login node.
In this article, we introduce some features of the Linux operating system that help HPC cluster managers limit user resources on the login node.
Assume that we have a 20-core node and 20 users (a total of 2000% CPU resources), and we want to limit each user's CPU usage to no more than 100%.
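The arithmetic behind this example can be sketched in a few lines of shell. CPU time is counted as a percentage of a single core, so a 20-core node offers 2000% in total. The function names below are illustrative only and are not part of any existing tool.

```shell
#!/bin/sh
# Sketch of the CPU budget arithmetic; function names are hypothetical.

# Total CPU capacity of a node, in percent of a single core.
total_cpu_percent() {
    cores=$1
    echo $(( cores * 100 ))
}

# Even per-user share of that capacity (integer division).
fair_share_percent() {
    cores=$1
    users=$2
    echo $(( cores * 100 / users ))
}

total_cpu_percent 20       # prints 2000
fair_share_percent 20 20   # prints 100
```

On a live node, the core count could be taken from nproc instead of being hard-coded.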
In /usr/lib/systemd/system/user-.slice.d/, add the config file 11-limit-cpu.conf with the following content:
    [Slice]
    CPUQuota=100%
Reload the systemd configuration:
    systemctl daemon-reload
Check that 11-limit-cpu.conf is indeed loaded:
    systemctl status user-0.slice
    user-0.slice - User Slice of UID 0
       Loaded: loaded
      Drop-In: /usr/lib/systemd/system/user-.slice.d
               └─10-defaults.conf, 11-limit-cpu.conf
       Active: active since Thu 2023-03-16 15:09:49 CST; 49min ago
        Tasks: 17 (limit: 23634)
       Memory: 156.7M
       CGroup: /user.slice/user-0.slice
               ├─session-1.scope
               │ ├─1310 sshd: root [priv]
               │ ├─1580 sshd: root@pts/0,pts/1,pts/5,pts/4
               │ ├─1581 -bash
               │ ├─2149 -bash
               │ ├─2458 ssh hpcadmin@localhost
               │ ├─2877 -bash
               │ ├─3012 man systemd.resource-control
               │ ├─3024 less
               │ ├─3175 -bash
               │ ├─3304 su - hpcadmin
               │ ├─3305 -bash
               │ ├─3466 man systemd.resource-control
               │ ├─3478 less
               │ ├─3723 systemctl status user-0.slice
               │ └─3724 less
               └─user@0.service
                 └─init.scope
                   ├─1564 /usr/lib/systemd/systemd --user
                   └─1570 (sd-pam)
We can see that 11-limit-cpu.conf is indeed loaded.
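The steps above can be scripted. The following is a minimal sketch rather than a definitive deployment script: the helper name write_cpu_dropin is hypothetical, and it writes into a scratch directory created with mktemp so that it can be tried without root privileges. On a real login node the target would be the user-.slice.d directory named above, followed by a daemon reload.

```shell
#!/bin/sh
# Hypothetical helper: write a CPUQuota drop-in into a given directory.
write_cpu_dropin() {
    dropin_dir=$1   # e.g. /usr/lib/systemd/system/user-.slice.d
    quota=$2        # e.g. 100%
    mkdir -p "$dropin_dir" || return 1
    printf '[Slice]\nCPUQuota=%s\n' "$quota" > "$dropin_dir/11-limit-cpu.conf"
}

# Dry run against a scratch directory instead of the live systemd tree.
scratch=$(mktemp -d)
write_cpu_dropin "$scratch/user-.slice.d" '100%'
cat "$scratch/user-.slice.d/11-limit-cpu.conf"
```

Keeping the target directory as a parameter makes it possible to inspect the generated file before anything on the system is changed.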
Assume that hpcadmin has UID 1000, and we want to limit hpcadmin's CPU resources to 400%.
In /etc/systemd/system/user-1000.slice.d/, add the config file limit-cpu.conf with the following content:
    [Slice]
    CPUQuota=400%
Reload the systemd configuration:
    systemctl daemon-reload
Check that limit-cpu.conf is indeed loaded:
    systemctl status user-1000.slice
    user-1000.slice - User Slice of UID 1000
       Loaded: loaded
      Drop-In: /usr/lib/systemd/system/user-.slice.d
               └─10-defaults.conf, 11-limit-cpu.conf
               /etc/systemd/system/user-1000.slice.d
               └─limit-cpu.conf
       Active: active since Thu 2023-03-16 16:32:11 CST; 5min ago
        Tasks: 5 (limit: 23634)
       Memory: 16.6M
       CGroup: /user.slice/user-1000.slice
               ├─session-3.scope
               │ ├─2085 sshd: hpcadmin [priv]
               │ ├─2105 sshd: hpcadmin@pts/2
               │ └─2106 -bash
               └─user@1000.service
                 └─init.scope
                   ├─2095 /usr/lib/systemd/systemd --user
                   └─2097 (sd-pam)
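Per-user limits depend on the numeric UID, which can be looked up with id -u. The sketch below builds the drop-in directory path for a given account. The function name is hypothetical, and the default base directory /etc/systemd/system is an assumption taken from the Drop-In lines in the status output above.

```shell
#!/bin/sh
# Hypothetical helper: print the per-user slice drop-in directory.
user_slice_dropin_dir() {
    user=$1
    base=${2:-/etc/systemd/system}     # assumed default, can be overridden
    uid=$(id -u "$user") || return 1   # fails if the account does not exist
    echo "$base/user-$uid.slice.d"
}

user_slice_dropin_dir root   # prints /etc/systemd/system/user-0.slice.d
```

Resolving the UID at run time avoids hard-coding values such as 1000, which can differ between systems.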
For more information, see the manual page:
    man systemd.resource-control
CPUQuota limits the CPU time available to the processes in the slice, expressed as a percentage of a single CPU core. It does not limit the number of cores the user can run on, only the total CPU time. For example, CPUQuota=400% allows the equivalent of four fully busy cores, and that time may be spread across more than four cores.
MemoryHigh sets the high limit on the memory usage of the processes in the slice. Usage may go above this limit if that is unavoidable, but in that case the kernel heavily slows down the processes and aggressively reclaims their memory. The value accepts the suffixes K, M, G, and T (base 1024).
MemoryMax sets the absolute upper limit on the memory usage of the processes in the slice. If usage cannot be kept under this limit, the kernel's out-of-memory killer terminates processes in the slice. It is recommended to set MemoryHigh in combination with MemoryMax. The value accepts the suffixes K, M, G, and T (base 1024).
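As an illustration of combining these options, the sketch below prints a [Slice] section that sets all three. The values (400%, 8G, 10G) are arbitrary examples rather than recommendations, and the helper names are hypothetical. The to_bytes function shows how the K, M, G, and T suffixes map to bytes, assuming the base of 1024 that systemd documents for memory sizes. The check that MemoryHigh does not exceed MemoryMax is a design choice of this sketch, not something systemd enforces.

```shell
#!/bin/sh
# Convert a size such as 8G to bytes (suffixes are powers of 1024).
to_bytes() {
    num=${1%[KMGT]}
    case $1 in
        *K) echo $(( num * 1024 )) ;;
        *M) echo $(( num * 1024 * 1024 )) ;;
        *G) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *T) echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
        *)  echo "$num" ;;
    esac
}

# Print a drop-in with a throttling limit (MemoryHigh) and a hard limit (MemoryMax).
print_slice_limits() {
    cpu=$1; high=$2; max=$3
    # With MemoryHigh above MemoryMax, throttling would never start before the hard limit.
    if [ "$(to_bytes "$high")" -gt "$(to_bytes "$max")" ]; then
        echo "MemoryHigh must not exceed MemoryMax" >&2
        return 1
    fi
    printf '[Slice]\nCPUQuota=%s\nMemoryHigh=%s\nMemoryMax=%s\n' "$cpu" "$high" "$max"
}

print_slice_limits '400%' 8G 10G
```

Setting MemoryHigh somewhat below MemoryMax gives processes a chance to be throttled before they are terminated.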