habrok:examples:llms — last modified 2026/02/26 13:23 by fokke
===== Running LLMs =====
  
If you want to run a Large Language Model (LLM) on Habrok, here's one possible and relatively easy way to do it. Note that the versions are recent as of 26 February 2026.
  
==== Installation ====
  
1. Log in with your account on Habrok on an interactive node for the installation procedure.
<code>ssh pnumber@interactive1.hb.hpc.rug.nl</code>
  
2. Since the vllm installation packages require a newer glibc than our operating system provides, we will switch to the EESSI software stack. This provides a compatibility layer with a newer glibc.
<code>module load EESSI/2025.06</code>

3. Load the Python module in the version you would like to use:
<code>module load Python/3.13.5-GCCcore-14.3.0</code>
  
4. Create a virtual environment (only once):
<code>python3 -m venv .env</code>

5. Activate the virtual environment:
<code>source .env/bin/activate</code>
  
6. Upgrade ''pip'' and ''wheel'' (optional):
<code>pip install --upgrade pip wheel</code>
  
7. Install ''vllm'' (you can also specify a version):
<code>pip install vllm</code>
This might take a while the first time.
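As an optional sanity check after step 7 (a suggestion, not part of the official procedure), you can confirm from Python that the package is importable inside the activated venv:

```python
import importlib.util


def is_installed(name: str) -> bool:
    """Return True if the named package can be found by the current interpreter."""
    return importlib.util.find_spec(name) is not None


# Inside the activated .env this should print True once vllm is installed
print(is_installed("vllm"))
```

If this prints ''False'', double-check that the venv is activated and that the ''pip install'' step completed without errors.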
  
==== Running through an interactive job ====

1. Start an interactive job on an A100 node (single GPU) to be able to run the software:
<code>srun --nodes=1 --ntasks=1 --partition=gpushort --mem=120G --time=04:00:00 --gres=gpu:a100:1 --pty bash</code>

2. Switch to the EESSI software stack:
<code>module load EESSI/2025.06</code>

3. Load the Python module you used for installation:
<code>module load Python/3.13.5-GCCcore-14.3.0</code>

4. Activate the venv you created earlier:
<code>source .env/bin/activate</code>

5. Run ''vllm'' with the appropriate parameters (these are some examples):
<code>export HF_HOME=/tmp && vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 --download-dir /tmp/models --max-model-len 1024 --gpu-memory-utilization 0.95 --port 8192</code>
Explanations of some of the parameters: