===== Running LLMs =====
  
If you want to run a Large Language Model (LLM) on Habrok, here's one possible and relatively easy way to do it.
  
1. Login with your account on Habrok.
<code>ssh pnumber@login1.hb.hpc.rug.nl</code>
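
If you log in often, you can optionally add a host alias to your SSH configuration, so that ''ssh habrok'' is enough (a convenience sketch; the alias name is arbitrary and ''pnumber'' stands for your own account):
<code>cat >> ~/.ssh/config <<'EOF'
# hypothetical alias for the Habrok login node
Host habrok
    HostName login1.hb.hpc.rug.nl
    User pnumber
EOF</code>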

2. Start an interactive job on an A100 node (single GPU):
<code>srun --nodes=1 --ntasks=1 --partition=gpushort --mem=120G --time=04:00:00 --gres=gpu:a100:1 --pty bash</code>
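
Once the job starts, you get a shell on the GPU node itself; you can optionally verify that a GPU was allocated before continuing:
<code>nvidia-smi</code>
This should list a single A100 GPU.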

3. Load the Python and CUDA modules:
<code>module load Python/3.11.5-GCCcore-13.2.0 CUDA/12.1.1</code>
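
You can optionally confirm which versions are now on your path (a quick sanity check, assuming the CUDA module provides ''nvcc''):
<code>python3 --version && nvcc --version</code>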

4. Create a virtual environment (only once):
<code>python3 -m venv .env</code>
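
The venv is created in ''.env'' under your current directory; the name and location are just examples, so you could equally well put it somewhere else:
<code>python3 -m venv ~/venvs/vllm</code>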

5. Activate the venv:
<code>source .env/bin/activate</code>

6. Upgrade ''pip'' (optional):
<code>pip install --upgrade pip</code>

7. Install ''vllm'' (you can also specify a version):
<code>pip install vllm</code>
This might take a while the first time.
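
To check that the installation succeeded, you can optionally print the installed version:
<code>python -c "import vllm; print(vllm.__version__)"</code>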

8. Run ''vllm'' with the appropriate parameters (these are some examples):
<code>export HF_HOME=/tmp && vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 --download-dir /tmp/models --max-model-len 1024 --gpu-memory-utilization 0.95 --port 8192</code>
Explanations of some of the parameters:
  * ''HF_HOME'': since the models can be large, this downloads them to the local disk of the particular GPU node that the model is running on
  * The model here is ''neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16''; other models should also work, although some models require a GPU compute capability that might not be available on Habrok
  * ''download-dir'': this may be the same as ''HF_HOME''
  * ''port'': you can specify whatever port you want, as long as it is free on the node (see the check below)
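
Before starting the server, you can optionally check that the chosen port is free on the node (a sketch, assuming the usual ''ss'' tool from iproute2 is available):
<code>ss -tln | grep 8192 || echo "port 8192 is free"</code>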

Once ''vllm'' is up and running, take note of the node it is running on (e.g. ''a100gpu6''), and then forward the appropriate port to your local machine:
<code>ssh -NL 8192:a100gpu6:8192 pnumber@login1.hb.hpc.rug.nl</code>
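
Note that this command is run on your own machine, not on Habrok, and has to stay open for as long as you use the model. The local and remote ports also do not have to match; for example, with a hypothetical local port 8000:
<code>ssh -NL 8000:a100gpu6:8192 pnumber@login1.hb.hpc.rug.nl</code>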

You can then test that it is working with:
<code>curl -X GET localhost:8192/v1/models</code>

and you should get something like:

<code>{"object":"list","data":[{"id":"neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16","object":"model","created":1729006332,"owned_by":"vllm","root":"neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16","parent":null,"max_model_len":1024,"permission":[{"id":"modelperm-13c3464597dc45dd9b661847a0343f39","object":"model_permission","created":1729006332,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}</code>

or you can go to ''http://localhost:8192/v1/models'' in a browser and get the same JSON:
  
<code>{
  "object": "list",
  "data": [
    {
      "id": "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",
      "object": "model",
      "created": 1729006479,
      "owned_by": "vllm",
      "root": "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",
      "parent": null,
      "max_model_len": 1024,
      "permission": [
        {
          "id": "modelperm-5c65faf9419446fb94c80c2d669056c4",
          "object": "model_permission",
          "created": 1729006479,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}</code>
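
Once the model shows up, you can also send it a test request through the OpenAI-compatible chat endpoint that ''vllm'' exposes (a minimal sketch; adjust the model name and port to your own setup):
<code>curl localhost:8192/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16", "messages": [{"role": "user", "content": "Say hello!"}], "max_tokens": 64}'</code>
The generated reply appears in ''choices[0].message.content'' of the returned JSON.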