
Slurm

The Astro cluster is an HPC system shared by many users, so all computations must be run by submitting jobs. Slurm is used for job management on this cluster.

Brief of Slurm

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

Frequently used commands of Slurm

sacct     Displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database.
salloc    Obtain a Slurm job allocation (a set of nodes), execute a command, and then release the allocation when the command is finished.
sbatch    Submit a batch script to Slurm.
scancel   Used to signal jobs or job steps that are under the control of Slurm.
scontrol  View or modify Slurm configuration and state.
sinfo     View information about Slurm nodes and partitions.
squeue    View information about jobs located in the Slurm scheduling queue.
srun      Run parallel jobs.

Each command also accepts the --help option, which prints a brief summary of its options. Note that command options are case sensitive. For more information, please refer to the official Slurm documentation.
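For example, --help can be called on any of the commands above, and srun can launch a short parallel command directly. This is only a minimal sketch; the partition name below is an illustration and should be replaced with one available to your group.

### Print a brief summary of options for a command
sinfo --help

### Run a simple parallel command with 2 tasks in the "intellow" partition
srun --partition=intellow --ntasks=2 hostname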

sinfo: view partition and node information

root@mgmt:/home/testy# sinfo 

PARTITION   AVAIL  TIMELIMIT    NODES  STATE  NODELIST
intelhigh   up     5-00:00:00   10     idle   n[30-39]
intellow    up     3-00:00:00   10     idle   n[30-39]
fat         up     3-00:00:00   1      idle   gpu1
amdhigh     up     3-00:00:00   4      idle   n[48-51]
amdlow      up     2-00:00:00   4      idle   n[48-51]
serialhigh  up     20-00:00:0   1      idle   n52
seriallow   up     10-00:00:0   1      idle   n52
gpu2        up     10-00:00:0   1      idle   gpu2
gwhigh      up     20-00:00:0   2      idle   n[46-47]
gwlow       up     1-00:00:00   2      idle   n[46-47]
tianyu      up     30-00:00:00  6      idle   n[40-45]

Fields      Descriptions
PARTITION   Name of a partition. Note that the suffix "*" identifies the default partition.
AVAIL       Partition state. Can be either up, down, drain, or inact (for INACTIVE).
TIMELIMIT   Maximum time limit for any user job in days-hours:minutes:seconds. infinite is used to identify partitions without a job time limit.
NODES       Count of nodes with this particular configuration.
STATE       State of the nodes. Possible states include: allocated, completing, down, drained, draining, fail, failing, future, idle, maint, mixed, perfctrs, planned, power_down, power_up, reserved, and unknown. Their abbreviated forms are: alloc, comp, down, drain, drng, fail, failg, futr, idle, maint, mix, npc, plnd, pow_dn, pow_up, resv, and unk, respectively. NOTE: the suffix "*" identifies nodes that are presently not responding.
NODELIST    Names of nodes associated with this particular configuration.

Note that the available partitions depend on the user group. Some partitions are available only to specific groups.
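sinfo can also be restricted to a single partition or show node-oriented details. A minimal sketch (the partition name is only an illustration):

### Show only the "gwhigh" partition
sinfo -p gwhigh

### Show node-oriented information in long format
sinfo -N -l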

squeue: report the state of jobs or job steps

root@mgmt:/home/testy# squeue

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

   64    normal interact     root  R       0:02      1 n30

Fields            Descriptions
JOBID             A unique value for each element of job arrays and each component of heterogeneous jobs.
PARTITION         Partition of the job or job step.
NAME              The name of the job. The default is the name of the submitted script.
USER              The username that submitted the job.
ST                Job state. PD: Pending, R: Running, S: Suspended, CG: Completing.
TIME              The time that the job has been running.
NODES             The number of nodes occupied by the job.
NODELIST(REASON)  A list of nodes occupied by the job. If the job is pending, the reason for queuing is given instead.
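The output can also be filtered to your own jobs or to a single job. A minimal sketch (USERNAME and the job ID are placeholders):

### Show only the jobs of a given user
squeue -u USERNAME

### Show a specific job by its ID
squeue -j 64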

scancel: cancel a pending or running job or job step

Cancel the job by entering the scancel JOBID command.
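For example (the job ID and USERNAME are placeholders):

### Cancel the job with ID 64
scancel 64

### Cancel all jobs belonging to a given user (a normal user can only cancel their own jobs)
scancel -u USERNAME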

Submit jobs with Slurm

The command sbatch submits a batch script to Slurm. 

Sample script:

#!/bin/bash
### This is a bash script. Note that the #!/bin/bash line must be the first line of the script.

### The job will be submitted to the partition named "normal"
#SBATCH --partition=normal

### Sets the name of the job
#SBATCH --job-name=JOBNAME

### Specifies that the job requires two nodes
#SBATCH --nodes=2

### The number of processes running per node is 40
#SBATCH --ntasks-per-node=40

### The maximum running time of the job, after which its resources are reclaimed by Slurm
#SBATCH --time=2:00:00

### The execution command of the program
mpirun hostname

After a job is submitted through sbatch, the system returns the ID of the job. The squeue command reports the status of the job. When the job completes, its output is written to the file slurm-JOBID.out by default.
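A typical submit-and-check workflow might look like the following minimal sketch, assuming the sample script above is saved as job.sh; the job ID is illustrative.

sbatch job.sh
### Slurm replies with a line such as: Submitted batch job 64

### Check the status of the job
squeue -j 64

### After the job completes, inspect the output file
cat slurm-64.out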

If the job needs a GPU, the option --gres=gpu:<number of cards> needs to be added to the script. For example, --gres=gpu:1 means that the job needs one GPU card.

Sample script:

#!/bin/bash
### This is a bash script. Note that the #!/bin/bash line must be the first line of the script.

### Sets the name of the job
#SBATCH --job-name=gpu-example

### The job will be submitted to the partition named "gpu"
#SBATCH --partition=gpu

### Specifies that the job requires one node
#SBATCH --nodes=1

### 16 CPU cores are needed to run the job
#SBATCH --ntasks=16

### One GPU card is needed to run the job
#SBATCH --gres=gpu:1

### The execution command of the program
python test.py

Common options for sbatch

--partition=<name>
Request a specific partition for the resource allocation.
--job-name=<name>
Specify a name for the job allocation.
--nodes=<n>
Request that a minimum of <n> nodes be allocated to this job.
--ntasks-per-node=<ntasks>
Request that <ntasks> tasks be invoked on each node.
--ntasks=<n>
Set the maximum number of tasks.
--cpus-per-task=<ncpus>
Set the number of CPU cores required for each task. If not specified, each task is assigned one CPU core by default. It is generally needed for multi-threaded programs such as OpenMP codes, but not for ordinary MPI programs (see the sketch after this list).
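As an illustration of --cpus-per-task, the following is a minimal sketch of a multi-threaded (OpenMP) job script; the partition name and program name are placeholders and should be replaced with ones available to you.

#!/bin/bash
### Example partition; use one available to your group
#SBATCH --partition=intellow
#SBATCH --job-name=omp-example
#SBATCH --nodes=1
### One task (process), with 8 CPU cores reserved for its threads
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

### Let OpenMP use exactly the cores allocated by Slurm
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

### Placeholder for your multi-threaded executable
./my_openmp_program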