habrok:examples:llms — last modified 2026/02/26 13:23 by fokke (previous revision 2025/03/05 12:26 by camarocico)
===== Running LLMs =====
  
If you want to run a Large Language Model (LLM) on Habrok, here's one possible and relatively easy way to do it. Note that the versions are current as of 26 February 2026.
  
==== Installation ====
  
1. Log in with your account on Habrok on an interactive node for the installation procedure.
<code>ssh pnumber@interactive1.hb.hpc.rug.nl</code>

2. Since the ''vllm'' installation packages require a newer glibc than our operating system provides, we will switch to the EESSI software stack, which provides a compatibility layer with a newer glibc.
<code>module load EESSI/2025.06</code>
  
3. Load the Python module in the version you would like to use:
<code>module load Python/3.13.5-GCCcore-14.3.0</code>
  
4. Create a virtual environment (only once):
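Assuming the environment lives in a directory called ''.env'' (which matches the activation command in the next step), the creation command would be:

```shell
# Create a virtual environment in the directory .env (one-time setup);
# it uses the Python version loaded via the module above
python3 -m venv .env
```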
5. Activate the virtual environment:
<code>source .env/bin/activate</code>
  
6. Upgrade ''pip'' and ''wheel'' (optional):
<code>pip install --upgrade pip wheel</code>
  
7. Install ''vllm'' (you can also specify a version):
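The basic install command, assuming no version pin, is:

```shell
# Install vllm into the active virtual environment;
# append ==<version> to pin a specific release
pip install vllm
```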
This might take a while the first time.
  
==== Running through an interactive job ====

1. Start an interactive job on an A100 node (single GPU) to be able to run the software:
<code>srun --nodes=1 --ntasks=1 --partition=gpushort --mem=120G --time=04:00:00 --gres=gpu:a100:1 --pty bash</code>

2. Switch to the EESSI software stack:
<code>module load EESSI/2025.06</code>

3. Load the Python module you used for installation:
<code>module load Python/3.13.5-GCCcore-14.3.0</code>

4. Activate the venv you created earlier:
<code>source .env/bin/activate</code>

5. Run ''vllm'' with the appropriate parameters (these are some examples):
<code>export HF_HOME=/tmp && vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 --download-dir /tmp/models --max-model-len 1024 --gpu-memory-utilization 0.95 --port 8192</code>
Explanations of some of the parameters:
  ]
}</code>
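''vllm serve'' exposes an OpenAI-compatible HTTP API on the chosen port. A minimal query from the same node could look like the sketch below; the model name must match the one passed to ''vllm serve'', and the server must already be running:

```shell
# Send a completion request to the vllm server started above
# (assumes it is listening on localhost:8192)
curl http://localhost:8192/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",
        "prompt": "Tell me something about Groningen",
        "max_tokens": 64
      }'
```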
==== Running Ollama in a jobscript ====

The following code can be used in a jobscript to run an Ollama model:

<code>
# Load the Ollama module
# GPU node
module load ollama/0.6.0-GCCcore-12.3.0-CUDA-12.1.1
# CPU node
# module load ollama/0.6.0-GCCcore-12.3.0

# Use /scratch for storing models
export OLLAMA_MODELS=/scratch/$USER/ollama/models

# Start the Ollama server in the background, log all its output to ollama-serve.log
ollama serve >& ollama-serve.log &
# Wait a few seconds to make sure that the server has started
sleep 5

# Run the model
echo "Tell me something about Groningen" | ollama run deepseek-r1:14b

# Kill the server process
pkill -u $USER ollama
</code>
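To submit this as a batch job rather than typing it into an interactive shell, wrap it in a jobscript with the usual SLURM directives. A sketch, reusing the resource requests from the interactive ''srun'' example above (adjust the partition, memory, and time to your needs):

```shell
#!/bin/bash
#SBATCH --job-name=ollama-run
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=gpushort
#SBATCH --gres=gpu:a100:1
#SBATCH --mem=120G
#SBATCH --time=04:00:00

# Load the Ollama module (GPU build)
module load ollama/0.6.0-GCCcore-12.3.0-CUDA-12.1.1

# Use /scratch for storing models
export OLLAMA_MODELS=/scratch/$USER/ollama/models

# Start the server in the background, wait for it to come up,
# run the model, then shut the server down
ollama serve >& ollama-serve.log &
sleep 5
echo "Tell me something about Groningen" | ollama run deepseek-r1:14b
pkill -u $USER ollama
```

Submit it with ''sbatch jobscript.sh''.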