Snakemake resource profiles for the cluster¶

Snakemake ≥ 8.6

This document only presents how to run Snakemake ≥ 8.6 on the cluster using profiles. Setting up a profile for Snakemake ≤ 7 is different and is not covered here, while Snakemake ≥ 8, < 8.6 still needs some investigation.

You can use modules to enable snakemake on the cluster, but most of the time you will use conda or a similar tool to target a specific version of snakemake.

When running a workflow, snakemake relies on an executor to run jobs on a cluster, a cloud, or your local computer. On the genotoul-bioinfo cluster, we use the slurm executor.

Slurm Executor¶

Install¶

No installation is needed if you use a snakemake≥8.6 module on the cluster: the plugin is already installed. Otherwise, you need snakemake-executor-plugin-slurm≥1.6.1.

# We search for snakemake modules on the cluster
module search snakemake
-------------------------- /tools/modulefiles --------------------------
bioinfo/Snakemake/5.25.0: loads the bioinfo/Snakemake/5.25.0 environment
bioinfo/Snakemake/6.5.1: loads the bioinfo/Snakemake/6.5.1 environment
bioinfo/Snakemake/7.20.0: loads the bioinfo/Snakemake/7.20.0 environment
bioinfo/Snakemake/7.32.4: loads the bioinfo/Snakemake/7.32.4 environment
bioinfo/Snakemake/8.3.1: loads the bioinfo/Snakemake/8.3.1 environment
bioinfo/Snakemake/8.20.3: loads the bioinfo/Snakemake/8.20.3 environment
# We load the latest snakemake≥8.6
module load bioinfo/Snakemake/8.20.3
# We check which executor is available.
# We need snakemake-executor-plugin-slurm≥1.6.1 (for efficiency report)
pip list | grep snakemake
snakemake 8.20.3
snakemake-executor-plugin-slurm 2.6.1
snakemake-executor-plugin-slurm-jobstep 0.6.0
snakemake-interface-common 1.23.0
snakemake-interface-executor-plugins 9.4.0
snakemake-interface-report-plugins 1.1.0
snakemake-interface-storage-plugins 3.3.0
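If you install snakemake yourself (for example with conda), the version requirement above can be checked with a small shell helper. This is only a sketch: the function name version_ge is hypothetical, and it assumes GNU sort (for its -V version-sort option), which is usual on Linux clusters.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Hypothetical helper; relies on GNU sort -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible use (assumes the plugin was installed with pip):
#   installed=$(pip show snakemake-executor-plugin-slurm | awk '/^Version:/ {print $2}')
#   version_ge "$installed" 1.6.1 || echo "plugin too old for the efficiency report"
```

The comparison sorts the two versions and checks that the required one comes first, so 1.10.0 is correctly treated as newer than 1.6.1.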

Configure¶

We provide a global profile named slurm for using snakemake on the Genotoul-bioinfo cluster.

You must install it in the ~/.config/snakemake/ directory:

# Create the directory first if it does not exist yet
mkdir -p ~/.config/snakemake/
cd ~/.config/snakemake/
git clone https://forge.inrae.fr/genotoul-bioinfo/snakemake-profiles/slurm slurm

It sets default resource allocations for running snakemake on the cluster and is rather conservative. There are many ways to set resources with snakemake. A good practice is to manage resource allocation in the Snakefile itself and/or by using a workflow profile.
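As an illustration of the workflow-profile approach, the sketch below writes a profiles/default/config.v8+.yaml file that sets resources for one rule. The helper name write_workflow_profile, the rule name my_rule and all values are placeholders, not recommendations for this cluster. It uses only set-threads and set-resources and does not define a default-resources block.

```shell
# write_workflow_profile DIR: create DIR/profiles/default/config.v8+.yaml
# with per-rule settings. "my_rule" and the values are placeholders.
write_workflow_profile() {
    workflow_dir=${1:-.}
    mkdir -p "$workflow_dir/profiles/default"
    cat > "$workflow_dir/profiles/default/config.v8+.yaml" <<'EOF'
set-threads:
  my_rule: 4
set-resources:
  my_rule:
    mem_mb: 8000   # memory in MB
    runtime: 60    # walltime in minutes
EOF
}

# Usage, from the directory that contains the Snakefile:
#   write_workflow_profile .
```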

Alternatively, you can create your own global profile.

Run¶

First, we use the following Snakefile as a test:

Snakefile
rule main:
    output:
        txt = "stats.txt"
    shell: """
        echo "Slurm resource allocated" > '{output.txt}'
        hostname >> '{output.txt}'
        echo "SLURM_JOB_ID=${{SLURM_JOB_ID:-undefined}}" >> '{output.txt}'
        echo "SLURM_JOB_NAME=${{SLURM_JOB_NAME:-undefined}}" >> '{output.txt}'
        echo "SLURM_JOB_ACCOUNT=${{SLURM_JOB_ACCOUNT:-undefined}}" >> '{output.txt}'
        echo "SLURM_CPUS_PER_TASK=${{SLURM_CPUS_PER_TASK:-undefined}}" >> '{output.txt}'
        echo "SLURM_MEM_PER_NODE=${{SLURM_MEM_PER_NODE:-undefined}}" >> '{output.txt}'
        """

This workflow consists of a single rule that records which slurm resources were allocated for the computation.

Second, we create a script, in the same directory as the Snakefile, that will run snakemake with our profile. Please replace the string <user@domain.tld> in it with your email address.

snakemake.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --partition=unlimitq
#SBATCH --output=snakemake.out
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=<user@domain.tld>

# A preamble to keep track of when the job has run
echo "Job start: $(date '+%Y-%m-%d %R:%S.%N %Z')"

# We load snakemake module
module load bioinfo/Snakemake/8.20.3

# Display snakemake version
snakemake --version

# Run snakemake workflow
snakemake --profile slurm

# If the date is displayed, the job reached the end
echo "Job end: $(date '+%Y-%m-%d %R:%S.%N %Z')"

Finally, we submit this script with the sbatch command on the unlimitq partition. Please note that only snakemake itself will run on unlimitq. Jobs managed by snakemake will use the partitions configured in the profile files and snakemake rules.

# We run the job on unlimitq
sbatch --partition unlimitq snakemake.sh

Look at the file snakemake.out to follow the progress of the workflow.

When it has finished, check the result of the workflow in the file stats.txt. It displays the slurm resources allocated to the snakemake job.

Check the profile used¶

The file snakemake.out provides the global snakemake logs. If you see the following message near its beginning, then everything is fine:

Using profile slurm for setting default command line arguments.

Note

Some workflows provide a default workflow profile (in the profiles/default directory of the workflow). You will then see this message in place of the previous one:

Using profiles slurm and workflow specific profile profiles/default for setting default command line arguments.

In this case, keep in mind that the default-resources block in the ~/.config/snakemake/slurm/config.v8+.yaml file will be replaced by (not merged with) the one defined in profiles/default, which can trigger warnings or errors on the cluster.
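The check described in this section can be done with grep rather than by reading the whole log. The following is a minimal sketch: the function name show_profile_line is hypothetical, and the pattern assumes the message wording quoted above.

```shell
# show_profile_line [LOG]: print the line(s) where snakemake reports which
# profile(s) it uses. LOG defaults to snakemake.out.
# Returns non-zero when no such line is found.
show_profile_line() {
    log_file=${1:-snakemake.out}
    grep -E 'Using profiles? .* for setting default command line arguments' "$log_file"
}

# Usage, from the directory where the job was submitted:
#   show_profile_line || echo "no profile message found in snakemake.out"
```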

Check the logs¶

With the slurm profile from Genotoul-bioinfo, logs are stored by default in the logs/slurm directory of the workflow directory.

Each subdirectory in logs/slurm is associated with a rule from the snakemake workflow.
Job efficiency is stored in a file named like efficiency_report_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.csv, where the xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx part is the slurm job name, which can be used with sacct (e.g. sacct --name=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
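The job name can be extracted from the report file name instead of being copied by hand. This is a small sketch using only shell parameter expansion. The function name report_job_name is hypothetical, and the path to the report is left for you to supply.

```shell
# report_job_name FILE: print the slurm job name embedded in an
# efficiency report file name (efficiency_report_<job-name>.csv).
report_job_name() (
    name=$(basename "$1" .csv)
    printf '%s\n' "${name#efficiency_report_}"
)

# Usage on the cluster (replace the path with your own report file):
#   sacct --name="$(report_job_name path/to/efficiency_report_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.csv)"
```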

Create your own global profile¶

You can create an alternative global profile by creating the file ~/.config/snakemake/<profile-name>/config.v8+.yaml:

The name of the <profile-name> directory is the profile name to use with the --profile option.
The file must be edited according to the profile documentation and the slurm executor documentation.

For example, you can use our slurm profile as a model and remove the slurm-efficiency-report parts to make a profile compatible with snakemake-executor-plugin-slurm<1.6.1.
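One way to start from the slurm profile is to copy its configuration file into a new profile directory and edit the copy. The sketch below assumes the slurm profile is already installed in ~/.config/snakemake/slurm. The function name new_profile_from_slurm and its optional second argument are hypothetical.

```shell
# new_profile_from_slurm NAME [ROOT]: copy the slurm profile configuration
# into a new profile directory named NAME.
# ROOT defaults to ~/.config/snakemake.
new_profile_from_slurm() {
    profile_name=$1
    config_root=${2:-"$HOME/.config/snakemake"}
    mkdir -p "$config_root/$profile_name" &&
        cp "$config_root/slurm/config.v8+.yaml" "$config_root/$profile_name/"
}

# Usage:
#   new_profile_from_slurm <profile-name>
#   # edit ~/.config/snakemake/<profile-name>/config.v8+.yaml, then run:
#   snakemake --profile <profile-name>
```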