
Remote Job Submission (SLURM / PBS)

When working on a cluster or HPC (High-Performance Computing) system, you usually can’t run simulations directly from the terminal like on your personal machine. Instead, you submit jobs to a scheduler — a system that manages who runs what, when, and for how long.

The two most common schedulers are SLURM and PBS. Think of them as “traffic controllers” for shared computing resources — they queue up everyone’s jobs, run them efficiently, and make sure nobody hogs the entire cluster.

Why Use a Job Scheduler?

On HPC clusters, hundreds of users may be running tasks at once. A scheduler ensures:

  • Everyone gets a fair share of CPUs and memory.
  • Jobs are executed only when the right resources are available.
  • Long or large jobs don’t crash the shared login node.

Instead of running directly, you just submit your job script, and the cluster handles the rest — even if you log out.

How It Works (Simple Idea)

You write a small job script — a plain text file that tells the scheduler what resources you need and what command to run.

Example for SLURM (run_job.sh):

bash
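#!/bin/bash
# Job name shown in the queue
#SBATCH --job-name=fenics_sim
# One node, 4 tasks (processes), 2-hour wall-time limit (HH:MM:SS)
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=02:00:00
# Memory per node
#SBATCH --mem=8G
# File that receives stdout and stderr
#SBATCH --output=output.log

# Module names are site-specific; check `module avail` on your cluster.
module load fenics
# This starts a single process. To use all 4 tasks with MPI, launch via srun or mpirun.
python3 main_elasticity.py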

Submit it using:

bash
sbatch run_job.sh

For PBS, it looks almost the same:

bash
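#!/bin/bash
# Job name shown in the queue
#PBS -N fenics_sim
# 1 node with 4 cores, 8 GB memory, 2-hour wall-time limit.
# This is Torque-style syntax; PBS Pro uses -l select=1:ncpus=4:mem=8gb instead.
#PBS -l nodes=1:ppn=4,mem=8gb,walltime=02:00:00
# Write output to output.log and merge stderr into stdout
#PBS -o output.log
#PBS -j oe

# PBS starts jobs in your home directory; move to the directory you submitted from.
cd "$PBS_O_WORKDIR"
module load fenics
python3 main_elasticity.py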

Submit it using:

bash
qsub run_job.sh

That’s it — the scheduler will queue it, run it when resources are free, and save your output to the log file.
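Because the two submit commands differ mainly in name, a small wrapper can pick whichever scheduler is installed. The following is only a minimal sketch: `detect_scheduler` and `submit_job` are hypothetical helper names (not part of SLURM or PBS), and the detection simply checks whether `sbatch` or `qsub` is on the `PATH`.

```bash
#!/bin/bash
# Hypothetical helpers; not part of SLURM or PBS.

# Print "slurm", "pbs", or "none" depending on which submit command is on PATH.
detect_scheduler() {
    if command -v sbatch >/dev/null 2>&1; then
        echo "slurm"
    elif command -v qsub >/dev/null 2>&1; then
        echo "pbs"
    else
        echo "none"
    fi
}

# Submit the given job script with whichever scheduler was detected.
submit_job() {
    local script="$1"
    case "$(detect_scheduler)" in
        slurm) sbatch "$script" ;;
        pbs)   qsub "$script" ;;
        *)     echo "No SLURM or PBS submit command found" >&2; return 1 ;;
    esac
}

# Usage (on a login node):
#   submit_job run_job.sh
```

The job script itself still needs the matching `#SBATCH` or `#PBS` header, since each scheduler ignores the other's directives.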

Monitoring Jobs

You can check your job’s status anytime:

For SLURM:

bash
squeue -u yourusername

For PBS:

bash
qstat -u yourusername

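Once it’s done, you’ll find the log in the directory you submitted from, and the simulation results there too as long as the job runs in that directory. SLURM starts the job in the submission directory by default; PBS starts it in your home directory unless the script first runs cd "$PBS_O_WORKDIR".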
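To avoid re-running `squeue` by hand, the status check can be wrapped in a polling loop. This is a sketch under stated assumptions: `parse_job_id` and `wait_for_slurm_job` are hypothetical names, the job ID is read from the `Submitted batch job <id>` line that `sbatch` prints, and the 30-second default interval is arbitrary. A job is treated as finished once `squeue` no longer lists it.

```bash
#!/bin/bash
# Hypothetical helpers for SLURM; names and polling interval are assumptions.

# Extract the numeric ID from sbatch's "Submitted batch job <id>" message.
parse_job_id() {
    echo "$1" | awk '{print $NF}'
}

# Block until squeue no longer lists the job (it finished or was cancelled).
# $1 = job ID, $2 = seconds between checks (default 30).
wait_for_slurm_job() {
    local job_id="$1" interval="${2:-30}"
    # -h suppresses the header line, so empty output means "not in the queue".
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage:
#   job_id=$(parse_job_id "$(sbatch run_job.sh)")
#   wait_for_slurm_job "$job_id" && echo "Job $job_id has left the queue"
```

Keep the interval generous, since frequent `squeue` calls add load to the scheduler. Note that this sketch also treats a failed `squeue` call as "job finished", so it is not a substitute for checking the log afterwards.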

Notes

  • Always submit jobs from the login node, not inside compute nodes — the scheduler will handle resource allocation.
  • Make sure your simulation code and required modules (like FEniCS or Python) are properly loaded via module load or your virtual environment.
  • If you need GPUs or more memory, request them explicitly using scheduler flags (e.g., #SBATCH --gres=gpu:1).
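  • You can view completed job details using sacct (SLURM) or qstat -x (PBS Pro; in Torque, -x instead switches qstat to XML output).
  • Submitted jobs keep running after you log out, so tmux is not needed to keep them alive. Ask the scheduler for email notifications (e.g., #SBATCH --mail-type=END,FAIL) so you don’t have to stay online and keep checking the queue.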
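The GPU request and email notifications mentioned in the notes above are set with extra directives in the job script header. A minimal SLURM sketch follows; the email address is a placeholder, and `gpu:1` assumes your cluster exposes GPUs under the generic resource name `gpu`. On PBS, the corresponding mail options are `#PBS -m abe` and `#PBS -M <address>`, while the GPU request syntax varies by site.

```bash
#!/bin/bash
#SBATCH --job-name=fenics_sim
# Request one GPU (assumes the site names the resource "gpu")
#SBATCH --gres=gpu:1
# Send mail when the job ends or fails; the address is a placeholder
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.org

# SLURM sets SLURM_JOB_ID inside a job; fall back to "local" when run by hand.
job_id="${SLURM_JOB_ID:-local}"
echo "Job ${job_id} started on $(uname -n)"
```

Printing the job ID and host name at the top of the log makes it easier to match a log file to the corresponding `sacct` record later.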

Summary

  • SLURM and PBS are schedulers that let you run jobs on powerful clusters safely and efficiently.
  • You write a small script describing your resources and commands, submit it, and the cluster handles the rest.
  • Perfect for long or resource-heavy simulations where you don’t want to sit and wait.

Once you are used to job submission, running large simulations becomes routine: write the script once, submit it, and let the cluster do the heavy lifting while you focus on the results.