Gaëlle Lefort, Alyssa Imbert, Nathalie Vialaneix, Genotoul-Bioinfo

CC BY-NC-SA

How to use R on Genotoul-Bioinfo cluster

Objective

This tutorial aims at describing how to run R and basic R scripts on the Genotoul-Bioinfo cluster.

Correct social behaviour expected

DO NOT run computations on the front-end (login) servers: doing so disturbs other users. Always use sbatch or srun. This also applies to the Positron editor.

Before contacting support, READ THE FAQ and the tutorials.

Prerequisite

You need to have an account. Ask for an account if needed.

Connect to cluster

For details, please consult the related FAQ page.

In short, you connect to the cluster using the ssh command. On Windows, you can also use MobaXterm instead of the ssh command. The login address is genobioinfo.toulouse.inrae.fr.

# In this command, replace <username> by your real username on the cluster
ssh -X <username>@genobioinfo.toulouse.inrae.fr

You can copy files between the cluster and your computer using the scp command or a sftp client like Filezilla or Cyberduck.
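As an illustration of the scp usage mentioned above, here is a minimal sketch of two wrapper functions. The function names, the username `jdoe` and the remote paths are hypothetical; only the login address comes from this tutorial.

```shell
# Login address given in this tutorial
CLUSTER_HOST="genobioinfo.toulouse.inrae.fr"

# Copy a local file to the cluster.
# Arguments: <username> <local_file> <remote_path>
upload_to_cluster() {
    scp "$2" "$1@${CLUSTER_HOST}:$3"
}

# Copy a file from the cluster to the local machine.
# Arguments: <username> <remote_file> <local_path>
download_from_cluster() {
    scp "$1@${CLUSTER_HOST}:$2" "$3"
}
```

These functions are meant to be run from your own computer, not from the cluster, for instance `upload_to_cluster jdoe HelloWorld.R work/` followed later by `download_from_cluster jdoe work/output.out .`.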

Once you are connected, you have two solutions to run a script: running it in batch mode or starting an interactive session. The script must never be run on the first server you connect to. Also, be aware that the programs provided on the cluster are not available until you have loaded the corresponding module. How to manage modules is explained in the next section.

Enable `R`

On the Genotoul-Bioinfo cluster, software like R is not available by default. You must enable R by using module commands.

Look at 'How to use'

Please check the 'How to use' documentation to know how to run R. Depending on the version (most of the time for recent ones), you may need to load additional modules, in particular if you want to install some libraries.

In short, you will use the following commands:

module avail: list all available modules
module search <TEXT>: find modules matching a keyword. For R, the most efficient way to find the available versions is module search statistics
module load <MODULE_NAME>: load a module (for instance, to load R-4.3.0, module load statistics/R/4.3.0). This command is used either directly in your terminal or included in your script files.
module purge: unload all previously loaded modules
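The purge-then-load sequence described above can be wrapped in a small function, sketched here. The function name `load_r_module` is hypothetical, and the default version 4.3.0 is simply the one used as an example in this tutorial; check the 'How to use' documentation for any additional modules a given version needs.

```shell
# Unload everything, then load one R version.
# Argument: <R version> (defaults to 4.3.0, the example used in this tutorial)
load_r_module() {
    local version="${1:-4.3.0}"
    # Fail with a clear message when 'module' is not defined in this shell
    if ! command -v module >/dev/null 2>&1; then
        echo "The 'module' command is not available in this shell" >&2
        return 1
    fi
    module purge
    module load "statistics/R/${version}"
}
```

Starting from a purged environment avoids surprises caused by modules loaded earlier in the same session.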

Use `R` in interactive mode

To use R in console mode, you must first connect to a compute node with the srun command, then load a module enabling R, and finally launch R.

Here are the step-by-step commands to type:

# We connect to the cluster (if not already done, else skip it)
ssh -X <username>@genobioinfo.toulouse.inrae.fr
# We connect to a compute node with an interactive session
srun --x11 --pty bash
srun: job 17758928 queued and waiting for resources
srun: job 17758928 has been allocated resources
# We look at available R
# Please look at 'How to use' for additional modules
module search statistics
...
statistics/R/4.3.0: loads the statistics/R/4.3.0 environment
...
# We load R 4.3.0
module load statistics/R/4.3.0
# We run R
R
R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

...

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
plot(1:10)

Compute nodes

Please do not forget to connect to a compute node when running R. It is the line starting with srun.
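One way to avoid launching R on the login server by mistake is to test whether the shell is running inside a SLURM job. The sketch below assumes that `SLURM_JOB_ID` is set inside srun and sbatch sessions and is not set on the login server; the function names are hypothetical.

```shell
# Succeeds only when the current shell runs inside a SLURM job
on_compute_node() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Run the given command only inside a SLURM job, otherwise refuse.
# Arguments: the command to run and its arguments
run_on_compute_node() {
    if ! on_compute_node; then
        echo "Not inside a SLURM job: use 'srun --pty bash' or sbatch first" >&2
        return 1
    fi
    "$@"
}
```

For instance, `run_on_compute_node Rscript HelloWorld.R` prints the warning on the login server and runs the script once you are inside an interactive session.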

Limited resources

By default, R runs with limited resources. Please take a look at the job submission page to learn how to raise those limits. The sbatch options section later in this document also explains it briefly.

Graphics

If you want to display plots, you need option -X with the ssh command AND option --x11 with the srun command. You can omit them if you don't need to plot.

Run an `R` script in batch mode

To launch an R script on the cluster using SLURM:

First, write an R script. Here we create a script named HelloWorld.R:

HelloWorld.R
print("Hello world!")

Second, write a bash script in the same directory as HelloWorld.R:

myscript.sh
#!/bin/bash
#SBATCH -J launchRscript
#SBATCH -o output.out

# Purge all previously loaded modules
module purge

# Load the R module
module load statistics/R/4.3.0

# The command lines that I want to run on the cluster
Rscript HelloWorld.R

Finally, launch the script with the sbatch command:

sbatch myscript.sh
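When the job ID is needed afterwards (to monitor or cancel the job), it can be captured at submission time. The sketch below relies on the `--parsable` option of sbatch, which prints only the job ID, optionally followed by `;` and the cluster name. The function name `submit_job` is hypothetical.

```shell
# Submit a batch script and print only the numeric job ID.
# Argument: <batch script>
submit_job() {
    local script="$1"
    local raw_id
    # With --parsable, sbatch prints "jobid" or "jobid;cluster"
    raw_id=$(sbatch --parsable "$script") || return 1
    # Keep only the part before the first ';'
    echo "${raw_id%%;*}"
}
```

A possible use is `jobid=$(submit_job myscript.sh)`, after which `$jobid` can be passed to other SLURM commands.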

No screen?

Anything that would have been displayed in the terminal is written to output.out. We defined this file on line 3 of myscript.sh (the #SBATCH -o output.out line). If it is not defined, the file will be named something like slurm-xxxxxxx.out.

If you are interested in rendering an R Markdown file (.Rmd), please look at the Rmd files section.

`sbatch` options

Jobs can be launched with customized options (more memory, for instance). There are two ways to handle sbatch options:

[RECOMMENDED] at the beginning of the bash script (myscript.sh in the previous example) with lines of the form:
```
#SBATCH <OPTION> <VALUE>
```
in the sbatch command: sbatch <OPTION1> <VALUE1> <OPTION2> <VALUE2> [...] myscript.sh

Many options are available, and the same syntax applies to both ways. To see all options, use sbatch --help.

Useful options:

| Syntax | Purpose |
| --- | --- |
| `-J`, `--job-name=jobname` | name of the job |
| `-t`, `--time=HH:MM:SS` | time limit (defaults to 02:00:00) |
| `-c`, `--cpus-per-task=ncpus` | number of CPUs required per task (defaults to 1) |
| `--mem-per-cpu=XG` | maximum amount of real memory per allocated CPU required by the job (defaults to 2G) |
| `--mem=XG` | total memory reserved for the job, independent of the number of CPUs; mutually exclusive with `--mem-per-cpu` |
| `-e`, `--error=err` | file for the batch script's standard error |
| `-o`, `--output=out` | file for the batch script's standard output |
| `--mail-type=BEGIN,END,FAIL` | send an email when the job begins, ends or fails; by default, the email address used to register your account is used |
| `--mail-user=truc@bidule.fr` | email address to use with `--mail-type` |
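To show how several of these options combine in the recommended form, the sketch below writes a variant of myscript.sh and checks its syntax. The file name `myscript_with_options.sh` and all values (1 hour, 4 CPUs, 8G) are illustrative assumptions, not recommendations. `%j` is the sbatch filename pattern replaced by the job ID, and `SLURM_CPUS_PER_TASK` is the variable SLURM sets when `-c` is given.

```shell
# Write a batch script whose header carries the sbatch options
cat > myscript_with_options.sh <<'EOF'
#!/bin/bash
#SBATCH -J launchRscript
#SBATCH -o output_%j.out
#SBATCH -e error_%j.err
#SBATCH -t 01:00:00
#SBATCH -c 4
#SBATCH --mem=8G

# Purge all previously loaded modules, then load R
module purge
module load statistics/R/4.3.0

# Number of CPUs granted by SLURM (falls back to 1 if the variable is unset)
NCPUS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${NCPUS} CPU(s)"

Rscript HelloWorld.R
EOF

# Check that the script parses without executing it
bash -n myscript_with_options.sh && echo "syntax OK"
```

The script would then be submitted with `sbatch myscript_with_options.sh`. Options given on the sbatch command line take precedence over the `#SBATCH` lines in the script.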

Manage job

After a job has been launched, you can monitor it with squeue -u <USERNAME> or squeue -j <JOB_ID>. You can also cancel it with scancel <JOB_ID>.
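Monitoring can also be scripted. The sketch below polls squeue until the job no longer appears in the queue, using the `-h` (`--noheader`) option so that an empty output means the job is gone. The function name `wait_for_job` and the 30-second default interval are assumptions.

```shell
# Block until the given job has left the queue.
# Arguments: <job id> [polling interval in seconds, default 30]
wait_for_job() {
    local jobid="$1"
    local interval="${2:-30}"
    # squeue -h prints one line per job that is still pending or running
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job ${jobid} is no longer in the queue"
}
```

Leaving the queue only means the job has ended, not that it succeeded, so the output and error files should still be checked.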

Install packages in your own environment

Once in an interactive R session, R packages are installed (in a personal library) using the standard install.packages command line.

srun --pty bash
srun: job 17758928 queued and waiting for resources
srun: job 17758928 has been allocated resources
module load statistics/R/4.3.0
R
...
install.packages("ggplot2")

Your personal library is usually located at the root of your home directory (i.e. ~/R), where the allocated space is very limited. A simple solution is to store the library elsewhere and link to it:

# Create a directory named 'R' elsewhere (here in ~/work)
mkdir ~/work/R
# Create a symbolic link to this new directory
ln -s ~/work/R ~/R
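If ~/R already exists as a regular directory (for instance because packages were installed before), `ln -s` would create the link inside it instead of replacing it. The sketch below handles that case by first copying the existing content to the new location. The function name `relocate_r_library` is hypothetical, and the default paths are the ones used in this tutorial.

```shell
# Move the personal R library to a larger storage area and link to it.
# Arguments: [link path, default ~/R] [target directory, default ~/work/R]
relocate_r_library() {
    local link="${1:-$HOME/R}"
    local target="${2:-$HOME/work/R}"
    mkdir -p "$target"
    # Nothing to do if the link is already in place
    if [ -L "$link" ]; then
        echo "$link is already a symbolic link"
        return 0
    fi
    # Preserve packages that were already installed in the old location
    if [ -d "$link" ]; then
        cp -a "$link"/. "$target"/ || return 1
        rm -rf "$link"
    fi
    ln -s "$target" "$link"
}
```

Calling `relocate_r_library` without arguments applies the tutorial's layout. The old directory is removed only after the copy has succeeded.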

Some packages are already installed

A few R packages are already installed in each R version. For example, the package dplyr is already installed in the module statistics/R/4.3.0. You can check whether a package is pre-installed using the command search_R_package, as in this example:

search_R_package dplyr
Please, wait for output...

"dplyr 1.0.10" est installé dans statistics/R/3.4.3

...

"dplyr 1.1.4" est installé dans statistics/R/4.4.0

If you would like an additional package to be pre-installed in a given R version, you can request it from support.
