Gaëlle Lefort, Alyssa Imbert, Julien Henry, Philippe Bordron, Nathalie Vialaneix

How to use R on the Bioinformatics cluster

Correct social behaviour expected

DO NOT run computations on the frontal (login) servers: doing so slows them down for all other users. Always use sbatch or srun.

Before contacting support, READ THE FAQ.

Objective

This tutorial aims at describing how to run R scripts and compile RMarkdown files on the Toulouse Bioinformatics cluster.

To do so, you need to have an account. Ask for an account if needed. You can then connect to the cluster using the ssh command on Linux and Mac OS X and using Mobaxterm on Windows. Similarly, you can copy files between the cluster and your computer using the scp command on Linux and Mac OS X and using OpenSSH on Windows. The login address is genobioinfo.toulouse.inrae.fr.

Once you are connected, there are two ways to run a script: in batch mode or in an interactive session. A script must never be run on the first server you connect to (the frontal server). Also note that the programs provided on the cluster are not available until you have loaded the corresponding module. Module management is explained in the next section.

Use of modules

All programs are made available by loading the corresponding modules. These are the main useful commands to work with modules:

  • module avail: list all available modules
  • search_module <TEXT>: find the modules matching a keyword
  • module load <MODULE_NAME>: load a module (for instance, to load R: module load statistics/R/4.3.0). This command is either used directly (in interactive mode) or included in the file that is used to run your R script in batch mode (see below)
  • module purge: unload all previously loaded modules
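
As a rough illustration of how these commands fit together, the sketch below wraps the purge-then-load sequence in a small shell function. The function name load_r_module is hypothetical (it is not a cluster command), the default module name is simply the one used in this tutorial, and module list (which prints the currently loaded modules) is added only as a sanity check.

```shell
# Hypothetical helper (not provided by the cluster): start from a clean
# environment, load one module, then show what is loaded.
load_r_module() {
    r_module="${1:-statistics/R/4.3.0}"
    # The module command only exists where the module system is installed
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found: are you connected to the cluster?" >&2
        return 1
    fi
    module purge
    module load "$r_module"
    module list
}
```

Calling load_r_module without argument loads statistics/R/4.3.0; another module name can be given as first argument.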

Run an R script in batch mode

To launch an R script on the Slurm cluster:

First, write an R script:

HelloWorld.R
print("Hello world!")

Second, write a bash script:

myscript.sh
#!/bin/bash
#SBATCH -J launchRscript
#SBATCH -o output.out

# Purge all previously loaded modules
module purge

# Load the R module
module load statistics/R/4.3.0

# The command lines that I want to run on the cluster
Rscript HelloWorld.R

Finally, launch the script with the sbatch command:

sbatch myscript.sh

The scripts myscript.sh and HelloWorld.R are assumed to be located in the directory from which the sbatch command is launched. For .Rmd files, note that a document cannot be compiled unless the .Rmd file is in a writable directory.
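
To look at the result without polling the queue, one option is to block until the job ends and then print its output file. The sketch below relies on the --wait option of sbatch; the function name submit_and_show is hypothetical, and the output file name is assumed to match the "#SBATCH -o" line of the batch script.

```shell
# Hypothetical helper: submit a batch script, wait for the job to end,
# then print the file named by its "#SBATCH -o" line.
submit_and_show() {
    batch_script="$1"
    output_file="$2"
    # --wait makes sbatch return only once the job has terminated;
    # its "Submitted batch job ..." message is sent to stderr
    sbatch --wait "$batch_script" >&2 || return 1
    cat "$output_file"
}
```

With the files above, submit_and_show myscript.sh output.out should end by printing the line produced by HelloWorld.R.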

sbatch options

Jobs can be launched with customized options (more memory, for instance). There are two ways to handle sbatch options:

  • [RECOMMENDED] at the beginning of the bash script with lines of the form:
    #SBATCH <OPTION> <VALUE>
    
  • in the sbatch command: sbatch <OPTION1> <VALUE1> <OPTION2> <VALUE2> [...] myscript.sh

Many options are available. To see all options use sbatch --help. Useful options:

  • -J, --job-name=jobname: name of job
  • -e, --error=err: file for batch script's standard error
  • -o, --output=out: file for batch script's standard output
  • --mail-type=BEGIN,END,FAIL: send an email when the job begins, ends, or fails (the default address is your user email; it can be changed with --mail-user=truc@bidule.fr, to be used with care)
  • -t, --time=HH:MM:SS: time limit (default to 04:00:00)
  • --mem=XG: to change memory reservation (default to 4G)
  • -c, --cpus-per-task=ncpus: number of cpus required per task (default to 1)
  • --mem-per-cpu=XG: maximum amount of real memory per allocated cpu required by the job
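
For instance, a batch script reserving more resources than the defaults might start as follows. The job name, file names, and values are placeholders chosen for illustration. SLURM_JOB_ID and SLURM_CPUS_PER_TASK are variables that Slurm sets inside the job (the latter when -c is given); the fallbacks are only there so that the script also runs outside Slurm.

```shell
#!/bin/bash
#SBATCH -J myjob
#SBATCH -o myjob.out
#SBATCH -e myjob.err
#SBATCH -t 01:00:00
#SBATCH --mem=8G
#SBATCH -c 2

# Record what was reserved (handy when reading myjob.out afterwards)
ncpus="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-not-in-slurm} running with $ncpus CPU(s)"
```

Keeping the options in the script, rather than on the sbatch command line, means the reservation is stored alongside the commands it was made for.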

Job management

After a job has been launched, you can monitor it with squeue -u <USERNAME> or squeue -j <JOB_ID> and also cancel it with scancel <JOB_ID>.
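
When scripting around these commands, it helps to capture the job ID at submission time. A minimal sketch, assuming the --parsable option of sbatch (which prints the job ID, optionally followed by ";" and a cluster name); the function name submit_job is hypothetical.

```shell
# Hypothetical helper: submit a script and print only the job ID.
submit_job() {
    out="$(sbatch --parsable "$@")" || return 1
    # Drop the optional ";cluster" suffix
    echo "${out%%;*}"
}
```

On the cluster, job_id="$(submit_job myscript.sh)" can then be followed by squeue -j "$job_id" to monitor the job, or scancel "$job_id" to cancel it.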

Use R in interactive mode

To use R in console mode, run srun --pty bash to open a shell on a compute node. Then run module load statistics/R/4.3.0 (or another available R version) and R to launch R.

srun --pty bash
srun: job 17758928 queued and waiting for resources
srun: job 17758928 has been allocated resources
module load statistics/R/4.3.0
R
R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Note

srun can be run with the same options as sbatch (CPU and memory reservations).

X11 sessions

X11 sessions are useful to display plots directly in an interactive session. Before using them, generate an SSH key on the cluster with ssh-keygen (unless you already have one) and add it to the authorized_keys file:

ssh-keygen
...
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
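
Running the commands above twice would append the same key twice. The sketch below is one way to make the step repeatable, assuming an RSA key at the default path (id_rsa in ~/.ssh), as in the commands above. The function name authorize_own_key is hypothetical, and the chmod calls reflect the usual permission requirements of OpenSSH rather than anything specific to this cluster.

```shell
# Hypothetical helper: create an RSA key if none exists, then add the
# public key to authorized_keys unless it is already listed there.
authorize_own_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    if [ ! -f "$key" ]; then
        # Interactive: ssh-keygen asks for a passphrase
        ssh-keygen -t rsa -f "$key" || return 1
    fi
    touch "$ssh_dir/authorized_keys"
    # -x: whole-line match, -F: fixed string (no regular expression)
    if ! grep -qxF -e "$(cat "$key.pub")" "$ssh_dir/authorized_keys"; then
        cat "$key.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling authorize_own_key without argument works on ~/.ssh; a different directory can be given as first argument.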

The interactive session is then launched by:

  1. Logging on the cluster with ssh -X <USERNAME>@genobioinfo.toulouse.inrae.fr
  2. Running an interactive session with srun --x11 --pty bash
srun --x11 --pty bash
srun: job 17758928 queued and waiting for resources
srun: job 17758928 has been allocated resources
module load statistics/R/4.3.0
R
R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
plot(1:10)

R in a parallel environment

To use R in a parallel environment, the -c (or --cpus-per-task) option of sbatch or srun is needed. In the R script, the number of cores must be set to the SAME value.

Several packages, such as doParallel, BiocParallel, or future, provide parallel computing in R. The following examples use doParallel and BiocParallel with 2 parallel workers.

First, write an R script:

  • With doParallel package:
    TestParallel.R
    library(doParallel)
    # specify the number of cores with makeCluster
    cl <- makeCluster(2)
    registerDoParallel(cl)
    
    foreach(i=1:3) %dopar% sqrt(i)

    # release the workers once the computation is done
    stopCluster(cl)
    
  • or, with BiocParallel package:
    TestParallel.R
    library(BiocParallel)
    
    # specify the number of cores with workers = 2
    bplapply(1:10, print, BPPARAM = MulticoreParam(workers = 2))
    

Second, write a bash script:

myscript.sh
#! /bin/bash
#SBATCH -J launchRscript
#SBATCH -o output.out
#SBATCH -c 2

#Purge any previous modules
module purge

#Load the application
module load statistics/R/4.3.0

# My command lines I want to run on the cluster
Rscript TestParallel.R

Finally, launch the script with the sbatch command:

sbatch myscript.sh
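
One way to keep the two values in sync is to let the batch script pass the reservation to R instead of writing it in two places. The sketch below is a variant of myscript.sh under that assumption: SLURM_CPUS_PER_TASK is set by Slurm inside the job when -c is given, and TestParallel.R would have to be adapted to read its first argument (for example with commandArgs(trailingOnly=TRUE)) in place of the literal 2.

```shell
#!/bin/bash
#SBATCH -J launchRscript
#SBATCH -o output.out
#SBATCH -c 2

# Number of cores reserved by Slurm; falls back to 1 outside a job
ncores="${SLURM_CPUS_PER_TASK:-1}"

# Load R only where the module system exists (i.e. on the cluster)
if command -v module >/dev/null 2>&1; then
    module purge
    module load statistics/R/4.3.0
fi

# Pass the core count as first argument of the (adapted) R script
if command -v Rscript >/dev/null 2>&1 && [ -f TestParallel.R ]; then
    Rscript TestParallel.R "$ncores"
else
    echo "Would run: Rscript TestParallel.R $ncores"
fi
```

Changing the reservation then only requires editing the "#SBATCH -c" line.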

Arguments in a script

External arguments can be passed to an R script. The basic method is described below but the packages argparser or optparse provide ways to handle external arguments à la Python.

First, write an R script:

HelloWorld.R
args <- commandArgs(trailingOnly=TRUE)

print(args[1])

Second, write a bash script:

myscript.sh
#! /bin/bash
#SBATCH -J launchRscript
#SBATCH -o output.out

#Purge any previous modules
module purge

#Load the application
module load statistics/R/4.3.0

# My command lines I want to run on the cluster
Rscript --vanilla HelloWorld.R "Hi!"

Finally, launch the script with the sbatch command:

sbatch myscript.sh
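
The argument does not have to be written in the bash script. As a sketch, relying on the fact that sbatch forwards anything written after the script name to the script itself (as positional parameters), myscript.sh could relay its first argument to R. The default value "Hi!" is only a fallback chosen for this example.

```shell
#!/bin/bash
#SBATCH -J launchRscript
#SBATCH -o output.out

# First argument given after the script name on the sbatch command line
greeting="${1:-Hi!}"

# Load R only where the module system exists (i.e. on the cluster)
if command -v module >/dev/null 2>&1; then
    module purge
    module load statistics/R/4.3.0
fi

# Relay the argument to the R script
if command -v Rscript >/dev/null 2>&1 && [ -f HelloWorld.R ]; then
    Rscript --vanilla HelloWorld.R "$greeting"
else
    echo "Would run: Rscript --vanilla HelloWorld.R \"$greeting\""
fi
```

The job would then be submitted with, for instance, sbatch myscript.sh "Bonjour!".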

Install packages in your own environment

Once in an interactive R session, R packages are installed (in a personal library) using the standard install.packages command line.

srun --pty bash
srun: job 17758928 queued and waiting for resources
srun: job 17758928 has been allocated resources
module load statistics/R/4.3.0
R
R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
install.packages("ggplot2")

Your personal library is usually located at the root of your personal directory (i.e. ~/R), where the allocated space is very limited. A simple solution is to store the library elsewhere and to link to it:

# Create a directory named 'R' elsewhere (here in ~/work)
mkdir ~/work/R
# Create a symbolic link to this new directory
ln -sr ~/work/R ~/R
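
If ~/R already exists as a real directory, the ln command above creates the link inside it instead of replacing it. A slightly more defensive sketch follows; the function name relocate_r_library is hypothetical, and it deliberately refuses to touch an existing directory so that already installed packages are never lost.

```shell
# Hypothetical helper: make "$2" a symbolic link to the directory "$1".
relocate_r_library() {
    target="$1"   # new location, e.g. ~/work/R
    link="$2"     # where R looks by default, e.g. ~/R
    mkdir -p "$target" || return 1
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link exists and is not a symbolic link: move its content to $target first" >&2
        return 1
    fi
    # -n: replace an existing link instead of following it
    ln -sfn "$target" "$link"
}
```

With the paths of this tutorial, the call would be relocate_r_library ~/work/R ~/R.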

Some packages are already installed

A few R packages are already installed in each R module. For example, the package dplyr is already installed in the module statistics/R/4.3.0. You can check whether a package is pre-installed with the command search_R_package, as in this example:

search_R_package dplyr
Please, wait for output...

"dplyr 1.0.10" est installé dans statistics/R/3.4.3

...

"dplyr 1.1.4" est installé dans statistics/R/4.4.0

If you would like an additional package to be pre-installed in a given R version, you can request it from support.

Create and compile .Rmd (RMarkdown) files on the cluster (batch mode)

To compile a .Rmd file, two packages are needed: rmarkdown and knitr. You also need to load the module tools/Pandoc/3.1.2. As for an R script, you can pass external arguments to a .Rmd document.

First, write a .Rmd script called MyDocument.Rmd with parameters in the header:

MyDocument.Rmd
---
title: My Document
output: html_document
params:
    text: "Hi!"
---

What is your text?
```{r}
print(params$text)
```

Second, write an R script to pass parameters:

TestRmd.R
rmarkdown::render("MyDocument.Rmd", 
                  params = list(text = "Hola!"))

Third, write a bash script:

myscript.sh
#!/bin/bash
#SBATCH -J launchRscript
#SBATCH -o output.out

module purge
module load statistics/R/4.3.0
module load tools/Pandoc/3.1.2

Rscript --vanilla TestRmd.R

Finally, launch the script with the sbatch command:

sbatch myscript.sh
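
For a quick one-off compilation, the intermediate TestRmd.R file can be avoided by giving the R expression to Rscript with -e. This is a sketch, not the layout recommended above: the function name render_rmd is hypothetical, and it assumes that the document declares a parameter named text, as MyDocument.Rmd does.

```shell
# Hypothetical helper: compile a .Rmd file, setting its 'text' parameter.
# $1: path to the .Rmd file, $2: value given to params$text
render_rmd() {
    # Arguments placed after the expression reach R through commandArgs()
    Rscript --vanilla \
        -e 'a <- commandArgs(trailingOnly = TRUE); rmarkdown::render(a[1], params = list(text = a[2]))' \
        "$1" "$2"
}
```

Inside myscript.sh, after the two module load lines, the call would be render_rmd MyDocument.Rmd "Hola!".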

Access R through conda

On the cluster, conda is a way to get additional versions of R. It is available through a module named devel/Miniforge/Miniforge3. The following commands show how to create a conda environment with R 4.2.0.

module load devel/Miniforge/Miniforge3
# We search for which R versions are available
conda search -c conda-forge r-base
# We create a conda env at ~/work/envs/r-4.2
# You can choose another path after '-p'
conda create -c conda-forge -p ~/work/envs/r-4.2 r-base=4.2.0
# We make R available by loading the env
# /!\ The command differs from the usual 'conda activate' command /!\
source activate ~/work/envs/r-4.2
# You can then launch R
R

When finished, you can unload the env this way:

conda deactivate