
Job submission


Which scheduler is used?

SLURM Workload Manager: https://slurm.schedmd.com

Which commands can I use to submit my job?

BATCH

  • sbatch: submit a batch job to Slurm (default workq partition).

  • sarray: submit a batch job-array to Slurm.

INTERACTIVE

  • srun --pty bash : submit an interactive session with a compute node (default workq partition).

INTERACTIVE with X11 forwarding

The first time, create your SSH key pair and authorize it as below (on the genobioinfo server):

ssh-keygen
cat .ssh/id_rsa.pub >> .ssh/authorized_keys

  • srun --x11 --pty bash: submit an interactive session with X11 forwarding (default workq partition).

INTERACTIVE inside a batch job

  • srun --pty --jobid <jobid> bash: convenient for following a batch job (connects to the node where the batch job is running)

Basic parameters for srun command

The srun command can be tuned with the following basic options:

  • -J <jobname> -> change the job name
  • -p <partition> -> which partition (~ queue) to use
  • --time=HH:MM:SS -> maximum run time of the job
  • --pty bash -> submit an interactive session on a compute node (default workq partition)
  • --pty bash -l -> submit an interactive session with a login shell (loads .bash_profile) on a compute node (default workq partition).

More options are available by typing srun --help on the cluster.
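As an illustration, these options can be combined in a single command (the values here are placeholders to adapt to your needs):

```shell
# Hypothetical example: a 2-hour interactive session named "mysession"
# on the default workq partition
srun -J mysession -p workq --time=02:00:00 --pty bash
```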

IO redirection

Text and errors displayed by software (stdout and stderr) are redirected into a file named slurm-<jobid>.out. This filename can be changed with the following options:

  • -o <output_filename> or --output=<output_filename>: specify the stdout redirection. If option -e (--error) is not specified, both stdout and stderr are directed to the specified file.

  • -e <error_filename> or --error=<error_filename>: if specified, stderr is redirected to a different location than stdout.
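Slurm also accepts replacement patterns in these filenames, e.g. %j for the job ID and %x for the job name, which keeps the output files of different jobs separate:

```shell
#SBATCH -J test
#SBATCH -o %x-%j.out
#SBATCH -e %x-%j.err
```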

Default job resource

Without any parameter, on any partition, each job is limited to 1 CPU and 2 GB of RAM (cpus-per-task=1, mem=2G).

How can I submit a simple job on the cluster?

1 - First, write a script (e.g. myscript.sh) with the command lines as follows:

myscript.sh
#!/bin/bash

#SBATCH -J test
#SBATCH -o output.out
#SBATCH -e error.out
#SBATCH --time=01:00:00
#SBATCH --mem=8G
# Mail notifications (sent automatically to your LDAP account's email address)
#SBATCH --mail-type=BEGIN,END,FAIL

# Purge any previous modules
module purge

# Load the application
module load bioinfo/NCBI_Blast+/2.15.0

# The command lines I want to run on the cluster
blastp ...

2 - To submit the job, use the sbatch command as follows:

sbatch myscript.sh
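If you need the job ID afterwards (to monitor or chain jobs from a script), the --parsable option of sbatch prints only the ID; a small sketch:

```shell
# Submit the script and capture the job ID
jobid=$(sbatch --parsable myscript.sh)
echo "Submitted job $jobid"
```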

How to book more memory than default?

By default, the memory reservation is 2GB.

To change memory reservation, add this option to the submission command (sbatch, srun, sarray):

  • --mem=XG where you replace X with the desired amount of memory (in GB).

How can I book more than 1 cpu?

With default parameters, each job is limited to 1 cpu. To book more, use the following options:

  • -c <ncpus> or --cpus-per-task=<ncpus>: book n CPUs on the same node (up to 64)
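Note that reserving CPUs does not by itself make a program use them: the software must also be told how many threads to start. Slurm exports the reservation as $SLURM_CPUS_PER_TASK, which can be passed to the tool's thread option (sketched here with blastp's -num_threads; the file names are hypothetical):

```shell
#SBATCH --cpus-per-task=8

# Run blastp with as many threads as CPUs reserved
blastp -num_threads $SLURM_CPUS_PER_TASK -query my_seq.fasta -db nr -out result.out
```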

How can I change the time limit?

By default, the time limit is 2 hours.

You can increase or decrease it with the option --time=HH:MM:SS, or -t HH:MM:SS for short.

Which are the available queues/partitions?

Each job is submitted to a specific partition (the default one is workq). Partitions differ in priority, which depends on the maximum execution time allowed.

The partition is selected with the -p option.

Queue         Access             Priority  Default time  Max time      Max slots
workq         everyone           100       2 hours       4 days (96h)  4992
unlimitq      everyone           1         90 days       90 days       780
interq (OOD)  everyone           -         12h           12h           32
gpuq          on demand          -         4 days (96h)  4 days (96h)  64
wflowq        specific software  -         90 days       90 days       4992

How can I submit an array of jobs?

To submit an array of jobs, use the sarray command (same options as sbatch):

  1. Create a file with one command per line (prefixed by module load if needed)

    e.g. the file star_cmd.sh contains:

    star_cmd.sh
    module load bioinfo/STAR-2.6.0c; STAR -genomeDir referenceModel --readFilesIn ech1.R1.fastq ech1.R2.fastq ...
    module load bioinfo/STAR-2.6.0c; STAR -genomeDir referenceModel --readFilesIn ech2.R1.fastq ech2.R2.fastq ...
    module load bioinfo/STAR-2.6.0c; STAR -genomeDir referenceModel --readFilesIn ech3.R1.fastq ech3.R2.fastq ...
    
  2. Launch sarray with sbatch options:

sarray -J jobName -o %j.out -e %j.err -t 01:00:00 --mem=8G \
    --mail-type=BEGIN,END,FAIL star_cmd.sh

For a step-by-step tutorial on how to create the command file, please take a look at the FAQ Bioinfo tips.
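A command file such as star_cmd.sh above can also be generated with a small shell loop instead of being written by hand; a sketch, assuming paired files named ech*.R1.fastq / ech*.R2.fastq in the current directory:

```shell
# Write one STAR command line per sample pair into star_cmd.sh
for r1 in ech*.R1.fastq; do
    r2=${r1%.R1.fastq}.R2.fastq
    echo "module load bioinfo/STAR-2.6.0c; STAR -genomeDir referenceModel --readFilesIn $r1 $r2"
done > star_cmd.sh
```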

How can I submit an MPI job?

Here is an example of a batch script:

mpi_job.sh
#!/bin/bash
#SBATCH -J mpi_job
#SBATCH --nodes=2
#SBATCH --tasks-per-node=6
#SBATCH --time=00:10:00
cd $SLURM_SUBMIT_DIR
module purge
module load mpi/openmpi/4.1.4
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node ./hello_world

Run it with the usual sbatch mpi_job.sh command.
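The ./hello_world binary in the script is assumed to have been built beforehand; with the same module loaded, a (hypothetical) hello_world.c source would be compiled with the OpenMPI compiler wrapper:

```shell
module load mpi/openmpi/4.1.4
mpicc -o hello_world hello_world.c
```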

How can I monitor a running job?

To do so, you can use the squeue command, followed by some useful options:

# List only the specified user's jobs
squeue -u <username>

# Provide detailed information on the specified job
squeue -j <job-id>

Commands squeue --help and man squeue list more options.

You can also get more detail with scontrol command:

scontrol show job <job-id>

You can also access a graphical user interface that provides the same information, via the sview command.

How to use srun to check running jobs?

The srun command can be used to check in on a running job in the cluster.

  • srun --pty --jobid=<jobid> bash: starts a shell, where you can run any command, on the first allocated node in a specific job.

To check processor and memory usage quickly, you can run top directly:

srun --pty --jobid=<jobid> top -u <username>

How can I retrieve information on a finished job?

To do so, use the sacct command as follows:

sacct -j <job-id>
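The --format option of sacct selects which accounting columns to display; for instance, to review the run time, peak memory and final state of a finished job:

```shell
# Show selected accounting fields for one job
sacct -j <job-id> --format=JobID,JobName,Elapsed,MaxRSS,State
```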

Commands sacct --help and man sacct list more options.

How can I kill my job?

To do so, you can use the scancel command, followed by some useful options:

  • Kill the specified job:
scancel <job-id>
  • Kill all jobs launched by the specified user:
scancel -u <username>

How to set my CPU and memory pre-allocation?

On a test job in COMPLETED state, check the result of the seff command:

seff <job-id>

and adjust time with the -t option, memory with --mem, and CPU with --cpus-per-task.

How can I get my jobs processed faster on the cluster?

The smaller a job, the faster it is processed. Set your different pre-allocations (time, memory and cpu) as closely as possible to your needs.

As for the previous question: on a test job in COMPLETED state, check the result of the seff command (seff <job-id>) and adjust time with -t, memory with --mem, and CPU with --cpus-per-task.

How to use a GPU node?

  • Request access to the GPU node here.

  • Example in a sbatch script:

my-gpu-script.sh
#!/bin/bash
#SBATCH --partition=gpuq

# If you wish to reserve 1 board A100 with 80GB RAM
#SBATCH  --gres=gpu:nvidia_a100:1

# or, if you wish to reserve 1 board L40S with 48GB RAM
#SBATCH  --gres=gpu:nvidia_l40s:1 

# or, if you have no card preference
#SBATCH  --gres=gpu:1 

You continue to reserve CPU threads (-c, --cpus-per-task) and RAM (--mem) in the same way as before.

  • Interactive mode:
srun --partition=gpuq --pty bash
  • Limitation rules

  • No more than 32 CPUs can be reserved per job

  • No more than 3 GPUs can be reserved per user (2 A100 max, 3 L40S max). These limits may change depending on card usage. See the result of the sq_gpu command to find out the limits in real time.

You must be very careful to disconnect from an interactive GPU session (launched with srun) once you have finished using the GPU.

Keeping the connection alive leads to high usage of computing resources that are shared and limited.

  • To see gpuq partition usage:
sq_gpu
  • To monitor GPU usage, several commands are available on the GPU node:
nvidia-smi
gpustat
nvidia-htop.py
nvitop
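To actually reserve a GPU in interactive mode (rather than only landing on the gpuq partition), add a --gres option to the srun command shown above; the CPU and memory values here are illustrative and must stay within the limitation rules:

```shell
# One GPU of any type, 8 CPUs and 32G of RAM on the gpuq partition
srun --partition=gpuq --gres=gpu:1 -c 8 --mem=32G --pty bash
```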

How to use OOD (RStudio, Jupyter, Linux Desktop)?

See our tutorials