Snakemake resource profiles for the cluster¶
Snakemake ≥ 8.6
This document only presents how to run Snakemake ≥ 8.6 on the cluster using profiles. Setting up a profile for Snakemake ≤ 7 is different and is not covered here, and Snakemake ≥ 8 but < 8.6 still needs investigation.
You can use the modules to enable Snakemake on the cluster, but most of the time you will use conda or a similar tool to target a specific version of Snakemake.
When running a workflow, Snakemake relies on an executor to run jobs on a cluster, a cloud, or your local computer. On the genotoul-bioinfo cluster, we use the slurm executor.
Slurm Executor¶
Install¶
Nothing to install if you use a Snakemake ≥ 8.6 module on the cluster: the plugin is already installed. Otherwise, you need snakemake-executor-plugin-slurm ≥ 1.6.1.
# We search for snakemake module on the cluster
$ module search snakemake
-------------------------- /tools/modulefiles --------------------------
bioinfo/Snakemake/5.25.0: loads the bioinfo/Snakemake/5.25.0 environment
bioinfo/Snakemake/6.5.1: loads the bioinfo/Snakemake/6.5.1 environment
bioinfo/Snakemake/7.20.0: loads the bioinfo/Snakemake/7.20.0 environment
bioinfo/Snakemake/7.32.4: loads the bioinfo/Snakemake/7.32.4 environment
bioinfo/Snakemake/8.3.1: loads the bioinfo/Snakemake/8.3.1 environment
bioinfo/Snakemake/8.20.3: loads the bioinfo/Snakemake/8.20.3 environment
# We load the latest snakemake≥8.6 module
$ module load bioinfo/Snakemake/8.20.3
# We check which executor is available.
# We need snakemake-executor-plugin-slurm≥1.6.1 (for efficiency report)
$ pip list | grep snakemake
snakemake 8.20.3
snakemake-executor-plugin-slurm 2.6.1
snakemake-executor-plugin-slurm-jobstep 0.6.0
snakemake-interface-common 1.23.0
snakemake-interface-executor-plugins 9.4.0
snakemake-interface-report-plugins 1.1.0
snakemake-interface-storage-plugins 3.3.0
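Outside of a module (for example in a conda environment), the same version requirement can be checked in a script. The sketch below is only an illustration: the function names `version_ge` and `slurm_plugin_version` are hypothetical helpers, and the comparison assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the installed plugin version as reported by
# `pip list`; prints nothing when pip or the plugin is missing.
slurm_plugin_version() {
    pip list 2>/dev/null | awk '$1 == "snakemake-executor-plugin-slurm" {print $2}'
}

installed="$(slurm_plugin_version || true)"
if [ -z "$installed" ]; then
    echo "snakemake-executor-plugin-slurm not found in this environment"
elif version_ge "$installed" "1.6.1"; then
    echo "plugin $installed is recent enough for the efficiency report"
else
    echo "plugin $installed is older than 1.6.1"
fi
```

With the module shown above, the plugin version is 2.6.1, so the second branch would be taken.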
Configure¶
We provide a global profile named slurm for using Snakemake on the genotoul-bioinfo cluster.
You must install it in the ~/.config/snakemake/ directory:
# Create the directory first if it does not exist yet
mkdir -p ~/.config/snakemake/
cd ~/.config/snakemake/
git clone https://forge.inrae.fr/genotoul-bioinfo/snakemake-profiles/slurm slurm
It sets default resource allocations for running Snakemake on the cluster and is rather conservative. There are many ways to set resources with Snakemake. A good practice is to manage resource allocation in the Snakefile itself and/or with a workflow profile.
Alternatively, you can create your own global profile.
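For the workflow-profile route mentioned above, a minimal sketch could look like the following. It assumes it is run from the workflow directory; the rule name `my_rule` and every value are placeholders, and the file name simply mirrors the one used by the global profile.

```shell
#!/usr/bin/env bash
# Sketch: create a workflow profile next to the Snakefile.
# `my_rule` is a hypothetical rule name; adjust values to your workflow.
mkdir -p profiles/default

cat > "profiles/default/config.v8+.yaml" <<'EOF'
# Per-rule overrides only, so the global default-resources block is kept.
set-threads:
  my_rule: 4
set-resources:
  my_rule:
    mem_mb: 8000   # memory in MB
    runtime: 60    # walltime in minutes
EOF
```

Using `set-resources` and `set-threads` rather than `default-resources` is a deliberate choice in this sketch: it leaves the defaults of the global profile untouched and only adjusts the rules that need more.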
Run¶
First, we use the following Snakefile as a test:
Snakefile (12-line listing, not reproduced here)
This workflow consists of a single rule that reports which Slurm resources are used for the computation.
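A workflow of this kind can be sketched as below. This is a hypothetical stand-in, not the original listing: the rule name `stats` and the use of `scontrol show job` are assumptions, chosen only because they produce a stats.txt file describing the Slurm allocation.

```shell
#!/usr/bin/env bash
# Write a hypothetical one-rule test workflow to ./Snakefile.
cat > Snakefile <<'EOF'
# Single rule: record the Slurm allocation of the job that runs it.
# SLURM_JOB_ID is set by Slurm inside the job.
rule stats:
    output:
        "stats.txt"
    shell:
        "scontrol show job $SLURM_JOB_ID > {output}"
EOF
```

Because the rule is the first (and only) one in the file, Snakemake treats it as the default target.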
Second, we create a script that will run snakemake with our profile.
In it, replace the string <user@domain.tld> with your email address.
snakemake.sh (18-line listing, not reproduced here)
Finally, we submit this script with the sbatch command on the unlimitq partition.
Note that only the Snakemake process itself runs on unlimitq.
Jobs managed by Snakemake use the partitions configured in the profile files and in the Snakemake rules.
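The submission step can be sketched as follows. The script body is a hypothetical stand-in rather than the original snakemake.sh: the `#SBATCH` options, the module version (taken from the listing earlier on this page) and the `--jobs 10` value are all assumptions to adapt.

```shell
#!/usr/bin/env bash
# Write a hypothetical submission script to ./snakemake.sh.
cat > snakemake.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=snakemake
#SBATCH --output=snakemake.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<user@domain.tld>

# Load a Snakemake >= 8.6 module (or activate your own environment instead).
module load bioinfo/Snakemake/8.20.3

# --jobs caps how many jobs are queued at once; 10 is an arbitrary value,
# and the option can be dropped if the profile already sets it.
snakemake --profile slurm --jobs 10
EOF

# Submit on the unlimitq partition (to be run on the cluster):
# sbatch --partition=unlimitq snakemake.sh
```

Only this wrapper job lands on unlimitq; the jobs that Snakemake submits follow the profile and rule settings.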
Look into the file snakemake.out to follow the progress of the workflow.
When it has finished, check the result of the workflow in the file stats.txt. It displays the Slurm resources allocated to the Snakemake job.
Check the profile used¶
The file snakemake.out provides the global Snakemake logs. If you get the following message near its beginning, then everything is fine:
Using profile slurm for setting default command line arguments.
Note
Some workflows provide a default workflow profile (in the profiles/default directory of the workflow). You will then see this message instead of the previous one.
Using profiles slurm and workflow specific profile profiles/default for setting default command line arguments.
In this case, keep in mind that the default-resources block in the ~/.config/snakemake/slurm/config.v8+.yaml file will be replaced by (not merged with) the one defined in profiles/default, which can trigger warnings or errors on the cluster.
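Both variants of the message can be checked from the command line. The function name `profile_in_use` below is a hypothetical helper; the pattern only relies on the two messages quoted above.

```shell
#!/usr/bin/env bash
# Print the first "Using profile(s) slurm ..." line of a log file.
# Returns non-zero when no such line exists.
# Usage: profile_in_use LOGFILE
profile_in_use() {
    grep -m 1 -E '^Using profiles? slurm' "$1"
}

if [ -f snakemake.out ]; then
    profile_in_use snakemake.out || echo "slurm profile message not found"
fi
```

If the printed line mentions a workflow specific profile, the caveat about the default-resources block applies.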
Check the logs¶
With the slurm profile from Genotoul-bioinfo, logs are stored by default in the logs/slurm directory of the workflow directory.
- Each subdirectory in logs/slurm is associated with a rule from the Snakemake workflow.
- Job efficiency is stored in a file named like efficiency_report_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.csv, where the xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx part is the Slurm job name, which can be used with sacct (i.e. sacct --name=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
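The link between a report file and sacct can be sketched as below. The helper name `report_job_name` is hypothetical, and the loop assumes the reports sit directly under logs/slurm; adjust the path if your profile stores them elsewhere.

```shell
#!/usr/bin/env bash
# Print the Slurm job name embedded in an efficiency report file name.
report_job_name() {
    local base
    base="$(basename "$1" .csv)"
    printf '%s\n' "${base#efficiency_report_}"
}

# For each report found, print the matching sacct command.
for report in logs/slurm/efficiency_report_*.csv; do
    [ -e "$report" ] || continue   # no report yet: the glob stays unexpanded
    echo "sacct --name=$(report_job_name "$report")"
done
```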
Create your own global profile¶
You can create an alternative global profile by creating the file ~/.config/snakemake/<profile-name>/config.v8+.yaml:
- The directory <profile-name> sets the profile name to use with the --profile option.
- The file must be edited according to the profile documentation and the slurm executor documentation.
For example, you can use our slurm profile as a model and remove the slurm-efficiency-report parts to make a profile compatible with snakemake-executor-plugin-slurm < 1.6.1.
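Deriving such a profile can be sketched as follows. The profile name `slurm-noreport` and the helper `strip_efficiency_report` are hypothetical, and the sketch assumes every efficiency-report setting sits on a single line containing the string slurm-efficiency-report; review the resulting file by hand before using it.

```shell
#!/usr/bin/env bash
# Source profile installed from the Genotoul-bioinfo repository.
src="${HOME}/.config/snakemake/slurm/config.v8+.yaml"
# Hypothetical name for the derived profile.
dst_dir="${HOME}/.config/snakemake/slurm-noreport"

# Print a profile file without the lines mentioning the efficiency report.
strip_efficiency_report() {
    sed '/slurm-efficiency-report/d' "$1"
}

if [ -f "$src" ]; then
    mkdir -p "$dst_dir"
    strip_efficiency_report "$src" > "$dst_dir/config.v8+.yaml"
fi
```

The derived profile would then be selected with `snakemake --profile slurm-noreport`.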