Slurm on a single node computer
Install
This instruction targets the installation of SLURM on a single node compute cluster running ubuntu 20.04.
32 Processors (threads)
sudo apt update -y
sudo apt install slurmd slurmctld -y
sudo chmod 777 /etc/slurm-llnl
# make sure to use the HOSTNAME
sudo cat << EOF > /etc/slurm-llnl/slurm.conf
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=localcluster
SlurmctldHost=$HOSTNAME
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#
# COMPUTE NODES # THis machine has 128GB main memory
NodeName=$HOSTNAME CPUs=32 RealMemory==128762 State=UNKNOWN
PartitionName=local Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
sudo chmod 755 /etc/slurm-llnl/
Start
sudo systemctl start slurmctld
sudo systemctl start slurmd
sudo scontrol update nodename=localhost state=idle
Stop
sudo systemctl stop slurmd
sudo systemctl stop slurmctld
Info
sinfo
Job
save into gregor.slurm
#!/bin/bash
#SBATCH --job-name=gregors_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=laszewski@gmail.com # Where to send mail
#SBATCH --ntasks=1 # Run on a single CPU
#### XBATCH --mem=1gb # Job memory request
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=sgregors_test_%j.log # Standard output and error log
pwd; hostname; date
echo "Gregors Test"
date
sleep(30)
date
Run with
sbatch gregor.slurm
watch -n 1 squeue
BUG
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 LocalQ gregors_ green PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
sbatch slurm management commands for localhost
start slurm deamons
cms ee slurm start
stop surm deamons
cms ee slurm stop
BUG:
srun gregor.slurm
srun: Required node not available (down, drained or reserved)
srun: job 7 queued and waiting for resources
sudo scontrol update nodename=localhost state=POWER_UP
Valid states are: NoResp DRAIN FAIL FUTURE RESUME POWER_DOWN POWER_UP UNDRAIN