How to Solve Common Slurm Problems
- Use the Slurm command sinfo to check the node status:
  - If the node status is drain, use the following command to return the node to its normal state:
    $ sudo scontrol update NodeName=<hostname> State=RESUME
  - If the node status is down:
    - Use the following command to view detailed node information, and look for the reason in its output:
      $ sudo scontrol show nodes
    - Check whether all the nodes have the same slurm.conf file under /etc/slurm.
    - Check whether the slurmd and munge services are active on all the nodes, and whether the slurmctld service is active on the management node.
    - Check whether all the nodes have the same date and time, and whether the ntpd service is active on all the nodes.
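
The first checks above can be partly scripted. The following is a minimal sketch, not part of the original procedure: the function names list_problem_nodes and same_checksum are hypothetical helpers, and the usage comments assume passwordless SSH to compute nodes with the placeholder names c1 and c2.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of Slurm.

# Print nodes whose state starts with "drain" or "down".
# Input: lines of "<node> <state>", e.g. from: sinfo -h -N -o "%N %T"
# sort -u removes repeats, since sinfo -N lists a node once per partition.
list_problem_nodes() {
    awk '$2 ~ /^(drain|down)/ { print $1, $2 }' | sort -u
}

# Succeed only if every input line has the same first field.
# Input: md5sum output lines collected from each node.
same_checksum() {
    awk '!seen[$1]++ { n++ } END { exit !(n == 1) }'
}

# Example usage on a live cluster (assumed host names c1, c2):
#   sinfo -h -N -o "%N %T" | list_problem_nodes
#   for h in c1 c2; do ssh "$h" md5sum /etc/slurm/slurm.conf; done \
#       | same_checksum || echo "slurm.conf differs between nodes"
```

Both helpers read from standard input, so they can be tried on saved command output before being run against a live cluster.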
 
- If the following warning appears when you use srun/prun to run an MPI program:
    Failed to create a completion queue (CQ): ...... Error: Cannot allocate memory
  check whether the soft and hard memlock limits are set to unlimited in the file /etc/security/limits.conf on the management node and the compute nodes. If they are not, set them to unlimited as shown below and restart the nodes for the change to take effect:
    * soft memlock unlimited
    * hard memlock unlimited
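
The memlock check above can be sketched as a small shell function. This is an illustrative helper under stated assumptions, not an official tool: the name memlock_unlimited is hypothetical, it inspects only the single file passed to it (it does not consider files under /etc/security/limits.d), and it looks only at entries for the "*" domain.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative.
# Succeed only if the given limits file sets both the soft and the hard
# memlock limit to unlimited for the "*" domain. Comment lines are ignored.
# A type of "-" in limits.conf sets the soft and hard limits together.
memlock_unlimited() {
    awk '
        /^[[:space:]]*#/ { next }
        $1 == "*" && $3 == "memlock" && $4 == "unlimited" {
            if ($2 == "soft") soft = 1
            if ($2 == "hard") hard = 1
            if ($2 == "-") { soft = 1; hard = 1 }
        }
        END { exit !(soft && hard) }
    ' "$1"
}

# Example usage on a node:
#   memlock_unlimited /etc/security/limits.conf || echo "memlock is limited"
```

After the nodes have been restarted, running `ulimit -l` in a new bash login session on a node should report unlimited if the setting took effect.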