  
If you want to run a Large Language Model (LLM) on Habrok, here's one possible and relatively easy way to do it. Note that the versions are recent as of 26 February 2026.

==== Installation ====
  
1. Login with your account on Habrok on an interactive node for the installation procedure.
This might take a bit the first time.
  
==== Running through an interactive job ====

1. Start an interactive job on an A100 node (single GPU) to be able to run the software:
<code>srun --nodes=1 --ntasks=1 --partition=gpushort --mem=120G --time=04:00:00 --gres=gpu:a100:1 --pty bash</code>
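Once the interactive job starts, a quick sanity check (not part of the original steps) is to confirm that a GPU was actually allocated to your shell:

<code>nvidia-smi</code>

This should list a single A100; if the command reports no devices, the job did not receive the requested ''--gres=gpu:a100:1'' allocation.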
  
2. Switch to the EESSI software stack:
<code>module load EESSI/2025.06</code>

3. Load the Python module you used for installation:
<code>module load Python/3.13.5-GCCcore-14.3.0</code>

4. Activate the venv you created earlier:
<code>source .env/bin/activate</code>

5. Run ''vllm'' with the appropriate parameters (these are some examples):
<code>export HF_HOME=/tmp && vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 --download-dir /tmp/models --max-model-len 1024 --gpu-memory-utilization 0.95 --port 8192</code>
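Once the server reports it is ready, you can send it a test request from a second shell on the same node. This is a minimal sketch: the model name and port must match whatever you passed to ''vllm serve'' above.

<code>
curl http://localhost:8192/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16", "prompt": "The capital of the Netherlands is", "max_tokens": 16}'
</code>

''vllm serve'' exposes an OpenAI-compatible API, so any OpenAI-style client library can talk to it in the same way.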
Explanations of some of the parameters: