RAG (Retrieval-Augmented Generation)

Hardware

GeForce RTX 5090 (32GB VRAM)
AMD 9
96 GB RAM

Software

Ubuntu 24
nvidia-smi

System Services

jupyterlab.service
vllm.service

Starting/Stopping

systemctl status [service-name]
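
The status command above follows the usual systemctl pattern, and start, stop, and restart take the same form. As a minimal sketch, the helper below wraps those four actions. The names svc and DRY_RUN are invented for this example and are not part of the original setup. With DRY_RUN=1 it prints the command instead of running it.

```shell
# Hypothetical helper (not part of the original setup): run one systemctl
# action against a service such as vllm.service or jupyterlab.service.
svc() {
    action="${1:-}"
    name="${2:-}"
    case "$action" in
        start|stop|restart|status) ;;
        *) echo "usage: svc {start|stop|restart|status} <service-name>" >&2
           return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show the command without touching the service
        echo "sudo systemctl $action $name"
    else
        sudo systemctl "$action" "$name"
    fi
}
```

For example, svc restart vllm.service restarts the vLLM server, and DRY_RUN=1 svc status jupyterlab.service only prints the command it would run.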

Models

llama3-8b-instruct

Setup

vLLM Service Setup

Configuration

Create a system user:

    sudo useradd --system --create-home --home-dir /opt/vllm --shell /usr/sbin/nologin vllm

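This account cannot log in directly because its shell is /usr/sbin/nologin. To work as this user:

Add your local user to the vllm group (takes effect at your next login):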
```
sudo usermod -aG vllm $USER
```
Become the vLLM user (for setup):
```
sudo -u vllm -s /bin/bash
```

Application directory:
```
/opt/vllm/
```
Service unit file:
```
/etc/systemd/system/vllm.service
```

    [Unit]
    Description=vLLM OpenAI-compatible server
    After=network.target

    [Service]
    Type=simple
    User=vllm
    Group=vllm
    WorkingDirectory=/opt/vllm
    ExecStart=/opt/vllm/start_vllm.sh
    Restart=on-failure
    RestartSec=5

    LimitNOFILE=65535

    [Install]
    WantedBy=multi-user.target
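
The unit above runs /opt/vllm/start_vllm.sh, which these notes do not show. The following is only a sketch of what such a script might contain. The virtualenv path, the Hugging Face model ID, the host, the port, and the memory fraction are all assumptions to adjust for the actual install.

```shell
#!/usr/bin/env bash
# Hypothetical /opt/vllm/start_vllm.sh; every path and value below is an assumption.
set -euo pipefail

VENV_DIR="/opt/vllm/venv"                     # assumed virtualenv location
MODEL="meta-llama/Meta-Llama-3-8B-Instruct"   # assumed Hugging Face ID for llama3-8b-instruct
HOST="127.0.0.1"                              # listen on localhost only
PORT="8000"                                   # vLLM's default port

# exec replaces the shell, so systemd (Type=simple) tracks the server process directly
exec "$VENV_DIR/bin/vllm" serve "$MODEL" \
    --host "$HOST" \
    --port "$PORT" \
    --served-model-name llama3-8b-instruct \
    --gpu-memory-utilization 0.90
```

The script must be executable by the vllm user (chmod +x). After creating or changing the unit file, run sudo systemctl daemon-reload, then sudo systemctl enable --now vllm.service to start the service and enable it at boot.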
