Remote Job Submission (SLURM / PBS)

When working on a cluster or HPC (High-Performance Computing) system, you usually can’t run simulations directly from the terminal like on your personal machine. Instead, you submit jobs to a scheduler — a system that manages who runs what, when, and for how long.

The two most common schedulers are SLURM and PBS. Think of them as “traffic controllers” for shared computing resources — they queue up everyone’s jobs, run them efficiently, and make sure nobody hogs the entire cluster.

Why Use a Job Scheduler?

On HPC clusters, hundreds of users may be running tasks at once. A scheduler ensures:

Everyone gets a fair share of CPUs and memory.
Jobs are executed only when the right resources are available.
Long or large jobs don’t crash the shared login node.

Instead of running the simulation yourself, you submit a job script and the cluster handles the rest, even if you log out.

How It Works (Simple Idea)

You write a small job script — a plain text file that tells the scheduler what resources you need and what command to run.

Example for SLURM (run_job.sh):

bash

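#!/bin/bash
# Job name, node count, and number of tasks (processes)
#SBATCH --job-name=fenics_sim
#SBATCH --nodes=1
#SBATCH --ntasks=4
# Wall-time limit (hh:mm:ss) and memory per node
#SBATCH --time=02:00:00
#SBATCH --mem=8G
# File that receives stdout and stderr
#SBATCH --output=output.log

# Module names are site-specific; check `module avail` on your cluster
module load fenics
# This starts a single Python process; to use all 4 tasks, launch it
# through srun or mpirun if your FEniCS build supports MPI
python3 main_elasticity.py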

Submit it using:

bash

sbatch run_job.sh
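If later steps need the job's ID (for example, to check on it or cancel it), it can be captured at submission time. The sketch below assumes sbatch prints its usual confirmation line, "Submitted batch job <id>". The helper name parse_job_id is made up for illustration and is not part of SLURM.

```shell
# Extract the numeric job ID from sbatch's confirmation line,
# which normally reads "Submitted batch job <id>".
parse_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Only attempt a submission where sbatch is actually installed.
if command -v sbatch >/dev/null 2>&1; then
    job_id=$(sbatch run_job.sh | parse_job_id)
    echo "Queued as job ${job_id}"
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```

If you would rather skip the parsing step, sbatch --parsable prints only the job ID (followed by the cluster name on multi-cluster setups).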

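For PBS, the script looks almost the same. The resource line below uses the Torque-style syntax; PBS Professional expresses the same request as -l select=1:ncpus=4:mem=8gb.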

bash

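#!/bin/bash
# Job name
#PBS -N fenics_sim
# One node with 4 cores, 8 GB of memory, 2-hour wall-time limit
#PBS -l nodes=1:ppn=4,mem=8gb,walltime=02:00:00
# Write output to output.log and merge stderr into it
#PBS -o output.log
#PBS -j oe

# PBS starts jobs in your home directory; move to the submission directory
cd "$PBS_O_WORKDIR"

module load fenics
python3 main_elasticity.py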

Submit it using:

bash

qsub run_job.sh

That’s it — the scheduler will queue it, run it when resources are free, and save your output to the log file.
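Because the two headers request the same resources in different notation, the mapping can be written down once. The following is a minimal sketch, assuming the single-node layout and Torque-style PBS syntax used in the examples above; job_header is a hypothetical helper, not a command provided by either scheduler.

```shell
# Print a job-script header for the chosen scheduler.
# Usage: job_header <slurm|pbs> <job-name> <cores> <walltime> <memory-in-GB>
job_header() {
    scheduler=$1; name=$2; cores=$3; walltime=$4; mem_gb=$5
    case "$scheduler" in
        slurm)
            printf '%s\n' '#!/bin/bash' \
                "#SBATCH --job-name=${name}" \
                "#SBATCH --nodes=1" \
                "#SBATCH --ntasks=${cores}" \
                "#SBATCH --time=${walltime}" \
                "#SBATCH --mem=${mem_gb}G" \
                "#SBATCH --output=output.log"
            ;;
        pbs)
            printf '%s\n' '#!/bin/bash' \
                "#PBS -N ${name}" \
                "#PBS -l nodes=1:ppn=${cores},mem=${mem_gb}gb,walltime=${walltime}" \
                "#PBS -o output.log" \
                "#PBS -j oe"
            ;;
        *)
            echo "unknown scheduler: ${scheduler}" >&2
            return 1
            ;;
    esac
}

# Example: print a SLURM header with the values used in this section.
job_header slurm fenics_sim 4 02:00:00 8
```

Redirecting the output to a file and appending the module load and python3 lines gives a complete script for either scheduler.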

Monitoring Jobs

You can check your job’s status anytime:

For SLURM:

bash

squeue -u yourusername

For PBS:

bash

qstat -u yourusername
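The state column in squeue output uses short codes that are easy to misread at first. As a rough sketch, the snippet below asks squeue for just the job ID and compact state (the %i and %t format fields) and spells each state out. The function names describe_state and summarize_queue are invented for this example, and only a handful of common SLURM states are covered.

```shell
# Translate a SLURM compact state code (the %t field of squeue) into words.
describe_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        TO) echo "timed out" ;;
        CA) echo "cancelled" ;;
        *)  echo "other ($1)" ;;
    esac
}

# Read "<job-id> <state>" pairs, one per line, and print a readable summary.
summarize_queue() {
    while read -r job_id state; do
        echo "job ${job_id}: $(describe_state "$state")"
    done
}

# -h drops the header row; -o selects job ID and compact state only.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -h -o "%i %t" | summarize_queue
fi
```

Finished jobs drop out of squeue shortly after they end, so an empty result usually means nothing of yours is queued or running.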

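Once it's done, output.log appears in the directory you submitted from. Result files land in the job's working directory: under SLURM that is the submission directory by default, while under PBS it is your home directory unless the script changes into $PBS_O_WORKDIR.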

Notes

Submit jobs from the login node, not from inside a compute node; the scheduler handles resource allocation.
Load everything your simulation needs (such as FEniCS or Python) inside the job script, via module load or by activating your virtual environment, so the job does not depend on your interactive session.
If you need GPUs or more memory, request them explicitly using scheduler flags (e.g., #SBATCH --gres=gpu:1).
You can view completed job details using sacct (SLURM) or qstat -x (PBS Professional; on Torque, -x switches to XML output instead).
Turn on the scheduler's email notifications so you don't have to stay logged in while the job runs. tmux helps keep interactive sessions alive, but a submitted job continues without it.
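The notes on GPUs and notifications translate into a few extra header lines. The fragment below is a sketch only: the email address is a placeholder, GPU resource names differ between sites, and you would copy just the group that matches your scheduler into the header of run_job.sh.

```shell
# SLURM: extra lines for the #SBATCH header
# Email when the job ends or fails (address is a placeholder)
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.org
# Request one GPU on the node
#SBATCH --gres=gpu:1

# PBS: extra lines for the #PBS header
# Email on abort (a) and on end (e); address is a placeholder
#PBS -m ae
#PBS -M you@example.org
```

After a SLURM job finishes, sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS reports its final state, run time, and peak memory, which helps when sizing the next request.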

Summary

SLURM and PBS are schedulers that let you run jobs on powerful clusters safely and efficiently.
You write a small script describing your resources and commands, submit it, and the cluster handles the rest.
Perfect for long or resource-heavy simulations where you don’t want to sit and wait.

Once you get used to job submission, running big simulations becomes effortless — you write once, submit, and let the cluster do the heavy lifting while you focus on results.
