Remote Job Submission (SLURM / PBS)
When working on a cluster or HPC (High-Performance Computing) system, you usually can’t run simulations directly from the terminal like on your personal machine. Instead, you submit jobs to a scheduler — a system that manages who runs what, when, and for how long.
The two most common schedulers are SLURM and PBS. Think of them as “traffic controllers” for shared computing resources — they queue up everyone’s jobs, run them efficiently, and make sure nobody hogs the entire cluster.
Why Use a Job Scheduler?
On HPC clusters, hundreds of users may be running tasks at once. A scheduler ensures:
- Everyone gets a fair share of CPUs and memory.
- Jobs are executed only when the right resources are available.
- Long or heavy jobs run on compute nodes instead of overloading the shared login node.
Instead of running directly, you just submit your job script, and the cluster handles the rest — even if you log out.
How It Works (Simple Idea)
You write a small job script — a plain text file that tells the scheduler what resources you need and what command to run.
Example for SLURM (run_job.sh):
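```bash
#!/bin/bash
#SBATCH --job-name=fenics_sim
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=02:00:00
#SBATCH --mem=8G
#SBATCH --output=output.log

# Module names vary between clusters; check `module avail`.
module load fenics

# Plain python3 starts a single process. To use all 4 requested tasks
# with MPI, launch it as: srun python3 main_elasticity.py
python3 main_elasticity.py
```

Submit it using:

```bash
sbatch run_job.sh
```

For PBS, it looks almost the same: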
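```bash
#!/bin/bash
#PBS -N fenics_sim
# Torque-style resource list; on PBS Pro use
# -l select=1:ncpus=4:mem=8gb and -l walltime=02:00:00 instead.
#PBS -l nodes=1:ppn=4,mem=8gb,walltime=02:00:00
#PBS -o output.log
#PBS -j oe

# Unlike SLURM, PBS starts the job in your home directory,
# so change to the directory qsub was run from.
cd "$PBS_O_WORKDIR" || exit 1

module load fenics
python3 main_elasticity.py
```

Submit it using:

```bash
qsub run_job.sh
```

That's it: the scheduler will queue the job, run it when resources are free, and save its output to the log file.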
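One difference between the two schedulers is easy to miss: SLURM starts the job in the directory you ran `sbatch` from, while PBS starts it in your home directory. The sketch below shows a small helper that you could source at the top of either kind of script. It assumes only the standard `SLURM_SUBMIT_DIR` and `PBS_O_WORKDIR` variables that each scheduler sets, and the function name `job_workdir` is hypothetical:

```bash
# Hypothetical helper: print the directory the job was submitted from.
# SLURM sets SLURM_SUBMIT_DIR; PBS (Torque and PBS Pro) sets PBS_O_WORKDIR.
# Outside a scheduler (e.g. a quick local test), fall back to $PWD.
job_workdir() {
    if [ -n "${SLURM_SUBMIT_DIR:-}" ]; then
        printf '%s\n' "$SLURM_SUBMIT_DIR"
    elif [ -n "${PBS_O_WORKDIR:-}" ]; then
        printf '%s\n' "$PBS_O_WORKDIR"
    else
        printf '%s\n' "$PWD"
    fi
}

# Inside a job script, before running the simulation:
#   cd "$(job_workdir)" || exit 1
#   python3 main_elasticity.py
```

Because the helper falls back to the current directory, the same script also runs unchanged when you test it directly with `bash run_job.sh`.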
Monitoring Jobs
You can check your job’s status anytime:
For SLURM:
```bash
squeue -u "$USER"
```

For PBS:

```bash
qstat -u "$USER"
```

Once it's done, you'll find your simulation results and logs in the same directory where you submitted the script (for PBS, as long as the script changes to `$PBS_O_WORKDIR` first).
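Sometimes you need a script to wait until a job finishes, for example to run post-processing afterward. One way to do this is to capture the job ID at submission and then poll the queue. The sketch below does this for SLURM with a single (non-array) job. It assumes `sbatch --parsable`, which prints just the job ID (optionally followed by `;cluster`), and `squeue --format=%T`, which prints the job state. The function names are hypothetical, and the 60-second default interval is an arbitrary choice to avoid flooding the scheduler with queries:

```bash
# Hypothetical helpers (names are illustrative, not part of SLURM).

# Submit a job script and print only its numeric job ID.
# --parsable makes sbatch print "jobid" or "jobid;cluster".
slurm_submit() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    printf '%s\n' "${out%%;*}"   # drop an optional ";cluster" suffix
}

# Block until the job is no longer pending or running.
# %T prints the job state in long form (PENDING, RUNNING, ...); the output
# is empty once the job has been purged from the scheduler's memory.
slurm_wait() {
    local jobid=$1 interval=${2:-60} state
    while :; do
        state=$(squeue --noheader --jobs="$jobid" --format=%T 2>/dev/null)
        case "$state" in
            PENDING|CONFIGURING|RUNNING|COMPLETING|SUSPENDED)
                sleep "$interval" ;;
            *)
                break ;;   # COMPLETED, FAILED, TIMEOUT, ... or no longer listed
        esac
    done
}

# Usage:
#   jobid=$(slurm_submit run_job.sh)
#   slurm_wait "$jobid"
#   echo "Job $jobid has left the queue; check it with sacct -j $jobid"
```

Polling only tells you that the job has left the queue, not whether it succeeded. The final state still has to come from the accounting records.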
Notes
- Always submit jobs from the login node rather than from inside a compute node; the scheduler handles resource allocation.
- Make sure your simulation code and its dependencies (like FEniCS or Python) are available inside the job, either via `module load` or by activating your virtual environment in the script.
- If you need GPUs or more memory, request them explicitly with scheduler flags (e.g., `#SBATCH --gres=gpu:1`).
- You can view details of completed jobs with `sacct` (SLURM) or `qstat -x` (PBS Pro).
- Batch jobs keep running after you log out, so you don't need to stay online. Email notifications (e.g., `#SBATCH --mail-type=END` plus `--mail-user`, or `#PBS -m e` plus `-M`) tell you when a job finishes, and `tmux` is useful for keeping interactive sessions alive.
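With `sacct`, selecting a few columns makes a finished job's record much easier to read than the default view. The sketch below assumes SLURM accounting is enabled on your cluster, which not every site does. `JobID`, `JobName`, `State`, `ExitCode`, `Elapsed` and `MaxRSS` are standard `sacct --format` fields, and the wrapper name `job_report` is hypothetical:

```bash
# Hypothetical wrapper: summarize a finished SLURM job from the accounting
# database. State shows COMPLETED, FAILED, TIMEOUT, OUT_OF_MEMORY, etc.;
# Elapsed is wall-clock time; MaxRSS is the peak memory of each job step.
job_report() {
    local jobid=$1
    sacct --jobs="$jobid" \
          --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS
}

# Usage:
#   job_report 123456
```

Compare `MaxRSS` with your `--mem` request to get a quick sense of how much memory to ask for next time.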
Summary
- SLURM and PBS are schedulers that let you run jobs on powerful clusters safely and efficiently.
- You write a small script describing your resources and commands, submit it, and the cluster handles the rest.
- Perfect for long or resource-heavy simulations where you don’t want to sit and wait.
Once you get used to job submission, running big simulations becomes effortless — you write once, submit, and let the cluster do the heavy lifting while you focus on results.