In order to run a simple R script on one core, you need to create two files. The first is the R script itself (with the ''%%.R%%'' extension), here called ''%%R_example1.R%%'', with the following contents:
  
<code rsplus R_example1.R>
# Simple t-test between two equally large groups.
# Let us generate some data.
apples <- rnorm(20, mean = 1.5, sd = .1)
pears <- rnorm(20, mean = 1.6, sd = .1)
print(t.test(pears, apples))
</code>
  
The second file is a jobscript that runs the R script on the cluster; the relevant part of it looks like this:

<code bash>
#SBATCH --mem=1000
  
module purge
module load R/4.2.1-foss-2022a

Rscript R_example1.R
</code>
The final output should look like:
  
<code rsplus>
    Welch Two Sample t-test
  
</code>

==== Multiple CPUs ====

For a job that uses multiple CPU cores on a single node, the relevant part of the jobscript looks like this:

<code bash>
#SBATCH --mem=10GB
  
module purge
module load R/4.2.1-foss-2022a

Rscript parallel.R
</code>
  
The file containing the R code is named ''%%parallel.R%%''. Note that the packages ''%%parallel%%'' and ''%%snow%%'' are used here, but others may be available as well. If additional packages need to be installed, see [[habrok:software_environment:installation_of_extra_applications_or_libraries#r|this page]].
  
<code rsplus parallel.R>
library("snow")
library("parallel")

cpu <- as.integer(Sys.getenv("SLURM_CPUS_ON_NODE", 1)) # Number of cores requested (use 1 core if running outside a job).
hosts <- rep("localhost", cpu)
cl <- makeCluster(hosts, type = "SOCK")

# Create random matrices.
n <- 5000
A <- matrix(rnorm(n^2), n)
B <- matrix(rnorm(n^2), n)

# Single core time of the matrix multiplication.
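print(system.time(A %*% B))

# Parallel time: one way to distribute the multiplication over the
# workers is snow's parMM.
print(system.time(parMM(cl, A, B)))

stopCluster(cl)
</code>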

==== Multiple nodes ====

In order to use multiple nodes with R, you can make use of an MPI cluster in your R code. Due to the way our MPI libraries are installed, it is not possible to use ''%%makeCluster%%'' or ''%%makeMPIcluster%%'' for this, though. The correct way is to use the ''%%getMPIcluster%%'' function, as shown in the following example:
  
<code rsplus>
library("snow")
library("parallel")

The corresponding jobscript starts the script through the ''%%RMPISNOW%%'' wrapper shipped with ''%%snow%%''; the relevant part of it is:

<code bash>
#SBATCH --mem=10GB
  
module purge
module load R/4.2.1-foss-2022a

srun ${EBROOTR}/lib64/R/library/snow/RMPISNOW < parallel.R
</code>

==== GPU ====
  
In order to let R use the GPUs of the Hábrók cluster, we first have to install the R library ''gpuR''. Unfortunately, this library has been removed from CRAN, so we need to install it from GitHub. To do this, we first start an interactive job on one of the GPU nodes with an NVIDIA V100 GPU. You can similarly install it from a node with an NVIDIA K40 GPU, but we have fewer of those. This is how you start an interactive job from the command line on the login node:
<code bash>
srun --time=00:30:00 --partition=gpushort --gres=gpu:v100:1 --cpus-per-task=12 --pty bash -i
</code>

Within this session, installing ''gpuR'' is rather straightforward:
  
<code rsplus>
library(devtools)
install_github("cdeterman/gpuR")
</code>

In this example we use an R script that does a simple GPU matrix multiplication and a normal CPU matrix multiplication.
  
<code rsplus gpu.R>
library("gpuR")
ORDER <- 10000
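
# A minimal sketch of the comparison: multiply two ORDER x ORDER matrices
# on the CPU and on the GPU and time both.
A <- matrix(rnorm(ORDER^2), nrow = ORDER)
B <- matrix(rnorm(ORDER^2), nrow = ORDER)

# CPU timing.
print(system.time(A %*% B))

# GPU timing using gpuR's gpuMatrix class.
gpuA <- gpuMatrix(A, type = "double")
gpuB <- gpuMatrix(B, type = "double")
print(system.time(gpuA %*% gpuB))
</code>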

This script is run from a jobscript called ''%%gpuExampleR.sh%%'' that requests a GPU.

Now we can submit the job using ''%%sbatch gpuExampleR.sh%%''.\\
In this example the matrix multiplication is slower on the GPU when small matrices are used (100x100); however, as the matrices get larger, the GPU scales better, as can be seen from the output.

===== Installing additional R libraries =====

R provides the ''install.packages'' command to install additional libraries/packages. Before you start submitting jobs, you can use it to install the packages that you need in the following way:

  * log in and load the R module that you would like to use, e.g.: ''module load R/4.2.1-foss-2022a''
  * launch R by running the command: ''R''
  * run the appropriate ''install.packages'' command for all packages that you want to install, as shown in the example below. The first time you do this, R will ask your permission to create a personal library directory in your home directory.
  * quit using ''q()''
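
For example (the package names here are only placeholders, install whichever packages you need):

<code rsplus>
# Installs the listed packages into your personal library.
install.packages(c("ggplot2", "data.table"))
</code>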

Now the packages will be available for any job that you run. Note that you will have to do this again when you switch to a different R version.
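
You can check which library directories your R session uses, and thus whether your personal library is picked up, with:

<code rsplus>
# Lists the library search path; your personal library should appear
# alongside the library of the loaded R module.
.libPaths()
</code>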

==== Login node limits ====

Note that if you are trying to install something on the login node, you may run into limits on the amount of memory you can use there. See [[:habrok:additional_information:faq#i_am_running_something_on_the_login_node_and_the_process_gets_killed_why_does_this_happen|this question]] in the FAQ for more information.
  
===== Useful packages =====
  
One of our users has developed an R package to send function calls as jobs to Slurm via SSH. You can find the documentation and installation instructions on [[https://github.com/mschubert/clustermq|this]] GitHub page.
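
As a rough sketch of what using it looks like (the function and inputs are made-up examples; see the linked documentation for the real setup):

<code rsplus>
library(clustermq)

# Evaluate a function over a vector of inputs, with the work submitted
# to Slurm as jobs.
fx <- function(x) x * 2
Q(fx, x = 1:10, n_jobs = 2)
</code>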