===== Running LLMs =====

If you want to run a Large Language Model (LLM) on Habrok, here's one possible and relatively easy way to do it.

1. Log in with your account on Habrok.
<code>ssh pnumber@login1.hb.hpc.rug.nl</code>

2. Start an interactive job on an A100 node (single GPU):
<code>srun --nodes=1 --ntasks=1 --partition=gpushort --mem=120G --time=04:00:00 --gres=gpu:a100:1 --pty bash</code>

3. Load the Python and CUDA modules:
<code>module load Python/3.11.5-GCCcore-13.2.0 CUDA/12.1.1</code>

4. Create a virtual environment (only once):
<code>python3 -m venv .env</code>

5. Activate the venv:
<code>source .env/bin/activate</code>

6. Upgrade ''pip'' (optional):
<code>pip install --upgrade pip</code>

7. Install ''vllm'' (you can also specify a version):
<code>pip install vllm</code>
This might take a while the first time.

8. Run ''vllm'' with the appropriate parameters, for example:
<code>export HF_HOME=/tmp && vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 --download-dir /tmp/models --max-model-len 1024 --gpu-memory-utilization 0.95 --port 8192</code>

Explanations of some of the parameters:
  * ''HF_HOME'': since the models can be large, this downloads them to the local disk of the particular GPU node the model is running on
  * The model here is ''neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16''; other models should also work, although some require a GPU compute capability that may not be available on Habrok
  * ''download-dir'': this may be the same as ''HF_HOME''
  * ''port'': you can specify whatever port you want
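
Steps 2 to 8 above can also be combined into a single Slurm batch script and submitted with ''sbatch'' instead of running interactively. This is only a sketch: it reuses the same partition, resources, model, and port as above, and assumes the venv from step 4 already exists in your home directory; adjust as needed.
<code bash>
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=gpushort
#SBATCH --mem=120G
#SBATCH --time=04:00:00
#SBATCH --gres=gpu:a100:1

# Load the same modules as in the interactive session
module load Python/3.11.5-GCCcore-13.2.0 CUDA/12.1.1

# Activate the existing virtual environment
source .env/bin/activate

# Download models to the node-local disk and serve on port 8192
export HF_HOME=/tmp
vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 \
    --download-dir /tmp/models --max-model-len 1024 \
    --gpu-memory-utilization 0.95 --port 8192
</code>
The port forwarding below works the same way; check which node the job landed on (e.g. with ''squeue'') instead of reading it from the interactive prompt.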

Once ''vllm'' is up and running, take note of the node it is running on (e.g. ''a100gpu6''), and then forward the appropriate port to your local machine:
<code>ssh -NL 8192:a100gpu6:8192 pnumber@login1.hb.hpc.rug.nl</code>

You can then test that it is working with:
<code>curl -X GET localhost:8192/v1/models</code>

and you should get something like:

<code>{"object":"list","data":[{"id":"neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16","object":"model","created":1729006332,"owned_by":"vllm","root":"neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16","parent":null,"max_model_len":1024,"permission":[{"id":"modelperm-13c3464597dc45dd9b661847a0343f39","object":"model_permission","created":1729006332,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}</code>

or you can go to ''http://localhost:8192/v1/models'' and get the same ''json'':

<code>{
  "object": "list",
  "data": [
    {
      "id": "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",
      "object": "model",
      "created": 1729006479,
      "owned_by": "vllm",
      "root": "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16",
      "parent": null,
      "max_model_len": 1024,
      "permission": [
        {
          "id": "modelperm-5c65faf9419446fb94c80c2d669056c4",
          "object": "model_permission",
          "created": 1729006479,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}</code>
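
Since ''vllm'' serves an OpenAI-compatible API, you can also send prompts to the forwarded endpoint programmatically. Here is a minimal Python sketch using only the standard library; it assumes the SSH tunnel above is active, and the helper names (''build_payload'', ''complete'') are our own, not part of any library:
<code python>
import json
import urllib.request

# The port forwarded by the ssh command above
BASE_URL = "http://localhost:8192"
MODEL = "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16"

def build_payload(prompt, max_tokens=64):
    """Build an OpenAI-style /v1/completions request body."""
    return {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}

def complete(prompt, max_tokens=64):
    """POST the prompt to the vllm server and return the generated text."""
    req = urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(build_payload(prompt, max_tokens)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

# Example (with the tunnel and server running):
# print(complete("The capital of the Netherlands is"))
</code>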