habrok:examples:llms [2026/02/26 13:23] (current) – fokke
<code>srun --nodes=1 --ntasks=1 --partition=gpushort --mem=120G --time=04:00:00 --gres=gpu:a100:1 --pty bash</code>

2. Switch to the EESSI software stack
<code>module load EESSI/2025.06</code>

3. Load the Python module you used for installation
<code>module load Python/3.13.5-GCCcore-14.3.0</code>

4. Activate the venv you created earlier:
<code>source .env/bin/activate</code>

5. Run ''vllm'' with the appropriate parameters (these are some examples):
<code>export HF_HOME=/tmp && vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 --download-dir /tmp/models --max-model-len 1024 --gpu-memory-utilization 0.95 --port 8192</code>
Explanations of some of the parameters: