Snakemake resource profiles for the cluster

Snakemake ≥ 8.6

This document only covers how to run Snakemake ≥ 8.6 on the cluster using profiles. Setting up a profile for Snakemake ≤ 7 is different and is not covered here, and Snakemake ≥ 8, < 8.6 still needs investigation.

You can use the modules to enable Snakemake on the cluster, but most of the time you will use conda or a similar tool to target a specific version of Snakemake.

When running a workflow, Snakemake relies on an executor to run jobs on a cluster, in a cloud, or on your local computer. On the genotoul-bioinfo cluster, we use the slurm executor.

Slurm Executor

Install

Nothing to install if you use a snakemake≥8.6 module on the cluster: the plugin is already installed. Otherwise, you need snakemake-executor-plugin-slurm≥1.6.1.

# We search for snakemake modules on the cluster
$ module search snakemake
-------------------------- /tools/modulefiles --------------------------
bioinfo/Snakemake/5.25.0: loads the bioinfo/Snakemake/5.25.0 environment
bioinfo/Snakemake/6.5.1: loads the bioinfo/Snakemake/6.5.1 environment
bioinfo/Snakemake/7.20.0: loads the bioinfo/Snakemake/7.20.0 environment
bioinfo/Snakemake/7.32.4: loads the bioinfo/Snakemake/7.32.4 environment
bioinfo/Snakemake/8.3.1: loads the bioinfo/Snakemake/8.3.1 environment
bioinfo/Snakemake/8.20.3: loads the bioinfo/Snakemake/8.20.3 environment

# We load the latest snakemake≥8.6
$ module load bioinfo/Snakemake/8.20.3

# We check which executor is available.
# We need snakemake-executor-plugin-slurm≥1.6.1 (for efficiency report)
$ pip list | grep snakemake
snakemake                               8.20.3
snakemake-executor-plugin-slurm         2.6.1
snakemake-executor-plugin-slurm-jobstep 0.6.0
snakemake-interface-common              1.23.0
snakemake-interface-executor-plugins    9.4.0
snakemake-interface-report-plugins      1.1.0
snakemake-interface-storage-plugins     3.3.0
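If you install Snakemake yourself (for example with conda or pip) instead of loading the module, you may want to confirm that the plugin meets the 1.6.1 minimum. The sketch below is only an illustration: the helper name `version_ge` is ours, and it assumes GNU `sort` (for the `-V` option) and that `pip` points at the environment where Snakemake is installed.

```shell
#!/bin/bash
# Succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Read the installed plugin version from pip (empty if not installed).
installed="$(pip show snakemake-executor-plugin-slurm 2>/dev/null | awk '/^Version:/ {print $2}')"

if [ -z "$installed" ]; then
    echo "plugin not found; try: pip install 'snakemake-executor-plugin-slurm>=1.6.1'"
elif version_ge "$installed" 1.6.1; then
    echo "plugin $installed is recent enough"
else
    echo "plugin $installed is too old for the efficiency report"
fi
```

With the module shown above, the installed version is 2.6.1, so the check would pass.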

Configure

We propose a global profile named slurm to use snakemake with the genotoul-bioinfo cluster.

You must install it in the ~/.config/snakemake/ directory:

mkdir -p ~/.config/snakemake/
cd ~/.config/snakemake/
git clone https://forge.inrae.fr/genotoul-bioinfo/snakemake-profiles/slurm slurm

It sets the default resource allocation for running snakemake on the cluster and is rather conservative. There are many ways to set resources with snakemake. A good practice is to manage resource allocation in the Snakefile itself and/or by using a workflow profile.
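As an illustration of the workflow-profile approach, the following sketch writes a small profile into profiles/default, next to the Snakefile. The rule name `main` matches the test Snakefile used later in this document; the thread, memory and runtime values are placeholders chosen for the example, not recommendations. It uses `set-threads` and `set-resources`, which target a single rule, rather than `default-resources`.

```shell
#!/bin/bash
# Create a workflow profile in the workflow directory.
mkdir -p profiles/default

# Values below are illustrative assumptions; adjust them to your workflow.
cat > 'profiles/default/config.v8+.yaml' <<'EOF'
set-threads:
  main: 1
set-resources:
  main:
    mem_mb: 2000   # memory in MB
    runtime: 10    # wall time in minutes
EOF

echo "workflow profile written to profiles/default/config.v8+.yaml"
```

Snakemake should then pick up this workflow profile in addition to the global profile given with `--profile`.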

Alternatively, you can create your own global profile.

Run

First, we use the following Snakefile as a test:

Snakefile
rule main:
    output:
        txt = "stats.txt"
    shell: """
        echo "Slurm resource allocated" > '{output.txt}'
        hostname >> '{output.txt}'
        echo "SLURM_JOB_ID=${{SLURM_JOB_ID:-undefined}}" >> '{output.txt}'
        echo "SLURM_JOB_NAME=${{SLURM_JOB_NAME:-undefined}}" >> '{output.txt}'
        echo "SLURM_JOB_ACCOUNT=${{SLURM_JOB_ACCOUNT:-undefined}}" >> '{output.txt}'
        echo "SLURM_CPUS_PER_TASK=${{SLURM_CPUS_PER_TASK:-undefined}}" >> '{output.txt}'
        echo "SLURM_MEM_PER_NODE=${{SLURM_MEM_PER_NODE:-undefined}}" >> '{output.txt}'
        """

This workflow consists of a single rule that records which slurm resources were allocated for the computation.

Second, we create a script that runs snakemake with our profile. Please replace the string <user@domain.tld> with your email address in it.

snakemake.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --partition=unlimitq
#SBATCH --output=snakemake.out
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=<user@domain.tld>

# We load the snakemake module
module load bioinfo/Snakemake/8.20.3

# Display snakemake version
snakemake --version

# Run snakemake workflow
snakemake --profile slurm

# If the date is displayed, the job reached the end
echo "Job end: $(date '+%Y-%m-%d %R:%S.%N %Z')"

Finally, we submit this script with the sbatch command on the unlimitq partition. Please note that only the snakemake process itself will run on unlimitq. Jobs managed by snakemake will use the partitions configured in the profile files and in the snakemake rules.

# We run the job on unlimitq
sbatch --partition unlimitq snakemake.sh

Look at the file snakemake.out to follow the progress of the workflow.

When it has finished, check the result of the workflow in the file stats.txt. It displays the slurm resources allocated to the snakemake job.

Check the profile used

The file snakemake.out provides the global snakemake logs. If you get the following message near its beginning, then everything is fine:

Using profile slurm for setting default command line arguments.

Note

Some workflows provide a default workflow profile (in the profiles/default directory of the workflow). You will then see this message instead of the previous one.

Using profiles slurm and workflow specific profile profiles/default for setting default command line arguments.

In this case, keep in mind that the default-resources block in the ~/.config/snakemake/slurm/config.v8+.yaml file will be replaced by (not merged with) the one defined in profiles/default, which can trigger warnings or errors on the cluster.
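To check quickly which of the two messages a run produced, one option is to search the log for them. This is a minimal sketch: the function name `check_profile` is hypothetical, and it assumes the messages appear in the log exactly as quoted above.

```shell
#!/bin/bash
# Print which profile message appears in a Snakemake log (default: snakemake.out).
#   global+workflow : global profile plus a workflow specific profile
#   global          : global profile only
#   none            : no profile message found
check_profile() {
    local log="${1:-snakemake.out}"
    if grep -q "Using profiles .* and workflow specific profile" "$log" 2>/dev/null; then
        echo "global+workflow"
    elif grep -q "Using profile " "$log" 2>/dev/null; then
        echo "global"
    else
        echo "none"
    fi
}

check_profile snakemake.out
```

If the answer is `global+workflow`, the remark above about the default-resources block applies.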

Check the logs

With the slurm profile from Genotoul-bioinfo, logs are stored by default in the logs/slurm directory of the workflow directory.

  • Each subdirectory in logs/slurm is associated with a rule from the Snakemake workflow.
  • Job efficiency is stored in a file named like efficiency_report_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.csv where the xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx part is the slurm job name, which can be used with sacct (i.e. sacct --name=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
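The link between the report file name and the slurm job name can be sketched as follows. The helper name `report_job_name` is ours, and because the exact location of the report depends on the profile, the example simply searches below the current directory. It prints the sacct commands rather than running them.

```shell
#!/bin/bash
# Extract the slurm job name from an efficiency report file name,
# i.e. strip the directory, the "efficiency_report_" prefix and the ".csv" suffix.
report_job_name() {
    local base
    base="$(basename "$1" .csv)"
    echo "${base#efficiency_report_}"
}

# List the sacct command matching each report found below the current directory.
find . -name 'efficiency_report_*.csv' | while IFS= read -r report; do
    echo "sacct --name=$(report_job_name "$report")"
done
```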

Create your own global profile

You can create an alternative global profile by creating the file ~/.config/snakemake/<profile-name>/config.v8+.yaml.

For example, you can use our slurm profile as a model and remove the slurm-efficiency-report parts to make a profile compatible with snakemake-executor-plugin-slurm<1.6.1.
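One possible way to derive such a profile is sketched below. It rests on assumptions: the profile name `slurm-noreport` and the function name are made up for the example, and the filter assumes that every efficiency-report option sits on a single line containing the string slurm-efficiency-report. Review the resulting file by hand before relying on it.

```shell
#!/bin/bash
# Copy a profile config into a new profile directory, dropping every line
# that mentions slurm-efficiency-report.
# Assumption: each such option fits on one line of the YAML file.
strip_efficiency_report() {
    local src="$1" dst_dir="$2"
    mkdir -p "$dst_dir"
    grep -v 'slurm-efficiency-report' "$src" > "$dst_dir/config.v8+.yaml"
}

src="$HOME/.config/snakemake/slurm/config.v8+.yaml"
if [ -f "$src" ]; then
    # "slurm-noreport" is a hypothetical profile name.
    strip_efficiency_report "$src" "$HOME/.config/snakemake/slurm-noreport"
else
    echo "slurm profile not found at $src"
fi
```

The new profile would then be selected with `snakemake --profile slurm-noreport`.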