How to use R on Genotoul-Bioinfo cluster
Objective
This tutorial describes how to run R and basic R scripts on the Genotoul-Bioinfo cluster.
Correct social behaviour expected
DO NOT run computations on the frontal servers: you would be a nuisance to other users. Please always use sbatch or srun.
This includes the Positron editor.
Before contacting the support, READ THE FAQ and Tutorials.
Prerequisite
You need to have an account. Ask for an account if needed.
Connect to cluster
For details, please consult the related FAQ page.
In short, you connect to the cluster using the ssh command. On Windows, you can also use MobaXterm instead of the ssh command.
The login address is genobioinfo.toulouse.inrae.fr.
You can copy files between the cluster and your computer using the scp command or a sftp client like Filezilla or Cyberduck.
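As a minimal sketch of the connection and copy steps above, the following shell helpers wrap ssh and scp. The function names (cluster_ssh, cluster_put, cluster_get) and the CLUSTER_USER variable are hypothetical and introduced only for illustration; only the login address comes from this page.

```shell
# Login address given in this tutorial.
CLUSTER_HOST="genobioinfo.toulouse.inrae.fr"
# Hypothetical placeholder: replace "your_login" with your own cluster login.
CLUSTER_USER="${CLUSTER_USER:-your_login}"

# Open a session on the cluster; extra ssh options (e.g. -X) are passed through.
cluster_ssh() {
    ssh "$@" "${CLUSTER_USER}@${CLUSTER_HOST}"
}

# Copy a local file to the cluster: cluster_put <local_file> <remote_dir>
cluster_put() {
    scp "$1" "${CLUSTER_USER}@${CLUSTER_HOST}:$2"
}

# Copy a file from the cluster: cluster_get <remote_file> <local_dir>
cluster_get() {
    scp "${CLUSTER_USER}@${CLUSTER_HOST}:$1" "$2"
}
```

For example, `cluster_put HelloWorld.R .` would copy the file to your home directory on the cluster, since relative remote paths are resolved from there.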
Once you are connected, you have two ways to run a script: running it in batch mode or starting an interactive session. The script must never be run on the first server you connect to. Also note that the programs provided on the cluster are not available until you have loaded the corresponding module. Module management is explained in the next section.
Enable R
On the Genotoul-Bioinfo cluster, software such as R is not available by default. You must enable R by using module commands.
Look at 'How to use'
Please check the 'How to use' documentation to know how to run R. Depending on the version (most often the recent ones), you need to load additional modules, in particular if you want to install some libraries.
In short, you will use the following commands:
- module avail: list all available modules
- module search <TEXT>: find a module by keyword. For R, the most efficient way to find the available versions is to use module search statistics
- module load <MODULE_NAME>: load a module (for instance, to load R 4.3.0, module load statistics/R/4.3.0). This command is either typed directly in your terminal or included in your script files.
- module purge: unload all previously loaded modules
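The commands above can be chained as in the following sketch. It assumes the module command is available in the current shell, as it is on the cluster, and falls back to a message elsewhere. The R_MODULE variable is introduced here only for illustration.

```shell
# Module name taken from the example in this tutorial.
R_MODULE="statistics/R/4.3.0"

if command -v module >/dev/null 2>&1; then
    module purge                 # start from a clean environment
    module search statistics     # list the R-related modules
    module load "$R_MODULE"      # enable R
    R --version | head -n 1      # check which R is now on the PATH
else
    echo "The 'module' command is not available here: run this on the cluster." >&2
fi
```

Remember that recent R versions may need additional modules, as listed in the 'How to use' documentation.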
Use R in interactive mode
To use R in console mode, you must first connect to a compute node with the srun command, then load a module enabling R, and finally launch R.
Here are the step-by-step commands to type:
srun: job 17758928 has been allocated resources
# We look at available R
# Please look at 'How to use' for additional modules
module search statistics
...
statistics/R/4.3.0: loads the statistics/R/4.3.0 environment
...
# We load R 4.3.0
module load statistics/R/4.3.0
# We run R
R
R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
plot(1:10)
Compute nodes
Please do not forget to connect to a compute node when running R. It is the line starting with srun.
Limited resources
By default, R will run with limited resources. Please take a look at the job submission page to learn how to raise those limits. The sbatch options section later in this document also explains it briefly.
Graphics
If you want to display plots, you need option -X with the ssh command AND option --x11 with the srun command. You can omit them if you don't need to plot.
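A possible end-to-end sequence for an interactive session with graphics is sketched below. The options -X and --x11 are the ones named above; the exact srun invocation is not reproduced on this page, so "--pty bash" (a standard srun option followed by the shell to start) is an assumption of this sketch, as is the SRUN_OPTS variable.

```shell
# Step 1, on your own computer: connect with X11 forwarding (option -X).
#   ssh -X <USERNAME>@genobioinfo.toulouse.inrae.fr

# Step 2, on the cluster: open an interactive shell on a compute node.
# --x11 forwards graphics; "--pty bash" is an assumption of this sketch.
SRUN_OPTS="--x11 --pty"
if command -v srun >/dev/null 2>&1; then
    # $SRUN_OPTS is deliberately unquoted so that it splits into two options.
    srun $SRUN_OPTS bash
else
    echo "srun not found: run this step on the Genotoul-Bioinfo cluster." >&2
fi

# Step 3, inside the shell opened by srun:
#   module load statistics/R/4.3.0
#   R
#   plot(1:10)    # typed at the R prompt
```

If you do not need plots, drop -X from the ssh command and --x11 from SRUN_OPTS.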
Run an R script in batch mode
To launch an R script on the cluster using SLURM:
First, write an R script. Here we create a script named HelloWorld.R:
Second, write a bash script named myscript.sh in the same directory as HelloWorld.R:
Finally, launch the script with the sbatch command:
sbatch myscript.sh
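The three steps above can be sketched as follows. The contents of both files are assumptions, since the original listings are not reproduced here: the R script is a hypothetical one-liner, and the batch script is a plausible layout in which the output file is declared on line 3. The job name helloR and the file error.out are likewise illustrative. Note that this sketch overwrites HelloWorld.R and myscript.sh in the current directory.

```shell
# Step 1: a one-line R script (hypothetical content).
cat > HelloWorld.R <<'EOF'
print("Hello world!")
EOF

# Step 2: the batch script; line 3 sets the file receiving standard output.
cat > myscript.sh <<'EOF'
#!/bin/bash
#SBATCH -J helloR
#SBATCH -o output.out
#SBATCH -e error.out

# Start from a clean environment, then enable R
module purge
module load statistics/R/4.3.0

# Run the R script
Rscript HelloWorld.R
EOF

# Step 3: submit the job (only possible where SLURM is installed).
if command -v sbatch >/dev/null 2>&1; then
    sbatch myscript.sh
else
    echo "sbatch not found: submit myscript.sh from the cluster." >&2
fi
```

Once the job has finished, `cat output.out` should show what R printed.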
No screen?
Anything that would have been displayed in the terminal is written to output.out. This file is defined on line 3 of myscript.sh. If it is not defined, the file is named something like slurm-xxxxxxx.out.
If you are interested in rendering an R Markdown file (.Rmd), please look at the Rmd files section.
sbatch options
Jobs can be launched with customized options (more memory, for instance). There are two ways to handle sbatch options:
- [RECOMMENDED] at the beginning of the bash script (myscript.sh in the previous example) with lines of the form: #SBATCH <OPTION> <VALUE>
- in the sbatch command: sbatch <OPTION1> <VALUE1> <OPTION2> <VALUE2> [...] myscript.sh
Many options are available, and the same syntax applies to both ways. To see all options, use sbatch --help.
Useful options:
| Syntax | Purpose |
|---|---|
| -J, --job-name=jobname | name of the job |
| -t, --time=HH:MM:SS | time limit (defaults to 02:00:00) |
| -c, --cpus-per-task=ncpus | number of CPUs required per task (defaults to 1) |
| --mem-per-cpu=XG | maximum amount of real memory per allocated CPU required by the job (defaults to 2G) |
| --mem=XG | alternatively, a global memory reservation for the job, independent of the number of CPUs. Mutually exclusive with --mem-per-cpu |
| -e, --error=err | file for the batch script's standard error |
| -o, --output=out | file for the batch script's standard output |
| --mail-type=BEGIN,END,FAIL | send an email when the script begins, ends or fails. The email address used to register your account is used |
| --mail-user=truc@bidule.fr | set the email address used with --mail-type |
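As an illustration of the recommended form, the sketch below writes a batch script whose header sets several of the options listed above. The file name myscript_options.sh, the job name and the requested values (4 hours, 2 CPUs, 8G) are arbitrary choices for this example, not recommendations.

```shell
# Write a batch script with options declared in #SBATCH header lines.
cat > myscript_options.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=helloR
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --output=output.out
#SBATCH --error=error.out
#SBATCH --mail-type=BEGIN,END,FAIL

module purge
module load statistics/R/4.3.0
Rscript HelloWorld.R
EOF

# The second way passes the same options on the command line instead:
#   sbatch --time=04:00:00 --cpus-per-task=2 --mem=8G myscript.sh
```

Keeping the options in the script header records, next to the code, the resources each job was run with.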
Manage job
After a job has been launched, you can monitor it with squeue -u <USERNAME> or squeue -j <JOB_ID>. You can also cancel it with scancel <JOB_ID>.
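The monitoring commands above can be combined with job submission as in this sketch. It relies on the --parsable option of sbatch, which prints the job ID (optionally followed by ";" and a cluster name) instead of the usual message. The helper parse_job_id is hypothetical, and the sketch assumes that $USER holds your cluster login and that myscript.sh exists in the current directory.

```shell
# Keep only the job ID from `sbatch --parsable` output,
# which is either "jobid" or "jobid;cluster".
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    JOB_ID=$(parse_job_id "$(sbatch --parsable myscript.sh)")
    echo "Submitted job ${JOB_ID}"

    squeue -u "$USER"       # all of your jobs
    squeue -j "$JOB_ID"     # this job only

    # Uncomment to cancel the job:
    # scancel "$JOB_ID"
else
    echo "sbatch not found: run this on the Genotoul-Bioinfo cluster." >&2
fi
```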
Install packages in your own environment
Once in an interactive R session, R packages are installed (in a personal library) using the standard install.packages command.
srun: job 17758928 has been allocated resources
module load statistics/R/4.3.0
R
...
install.packages("ggplot2")
Your personal library is usually located at the root of your personal directory (i.e. ~/R) whose allocated space is very limited. A simple solution consists in:
Some packages are already installed
A few R packages are already installed in each R version. For example, the package dplyr is already installed in the module statistics/R/4.3.0. You can check whether a package is pre-installed using the search_R_package command, as in this example.
Please, wait for output...
"dplyr 1.0.10" est installé dans statistics/R/3.4.3
...
"dplyr 1.1.4" est installé dans statistics/R/4.4.0
If you would like an additional package to be pre-installed in a given R version, you can request it from support.