RAG (Retrieval-Augmented Generation)
2025-11-30
Hardware
- GeForce RTX 5090 (32GB VRAM)
- AMD 9
- 96 GB RAM
Software
- Ubuntu 24
- nvidia-smi
System Services
- jupyterlab.service
- vllm.service
Starting/Stopping
systemctl status [service-name]
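The heading mentions starting and stopping, but only the status command is listed. A minimal sketch of the other actions follows. The `svc` helper and its `DRY_RUN` switch are hypothetical conveniences, not part of the original notes; the underlying `systemctl` subcommands are standard.

```shell
# Hypothetical wrapper around systemctl for the units listed above.
# With DRY_RUN=1 it prints the command instead of running it.
svc() {
  if [ "$#" -ne 2 ]; then
    echo "usage: svc {start|stop|restart|status} <unit>" >&2
    return 2
  fi
  case "$1" in
    start|stop|restart|status) ;;
    *) echo "svc: unsupported action: $1" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo systemctl $1 $2"
  else
    sudo systemctl "$1" "$2"
  fi
}
```

For example, `svc restart vllm.service` restarts the vLLM server, and `journalctl -u vllm.service -f` follows its log output.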
Models
Setup
vLLM Service Setup
Configuration
- Create system user:
  sudo useradd --system --create-home --home-dir /opt/vllm --shell /usr/sbin/nologin vllm
  To log in as this user:
  - Add local user to the group:
    sudo usermod -aG vllm $USER
  - Become the vLLM user (for setup):
    sudo -u vllm -s /bin/bash
- Application directory:
  /opt/vllm/
- Service:
  /etc/systemd/system/vllm.service
[Unit]
Description=vLLM OpenAI-compatible server
After=network.target
[Service]
Type=simple
User=vllm
Group=vllm
WorkingDirectory=/opt/vllm
ExecStart=/opt/vllm/start_vllm.sh
Restart=on-failure
RestartSec=5
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target
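The unit's ExecStart points at /opt/vllm/start_vllm.sh, whose contents are not shown in these notes. The following is only a sketch of what such a launcher might look like. The virtualenv path, the `VLLM_*` variable names, and all default values are assumptions; `vllm serve` with `--host`, `--port`, and `--gpu-memory-utilization` is the standard vLLM command line.

```shell
#!/usr/bin/env bash
# Sketch only: the real /opt/vllm/start_vllm.sh is not shown in these notes.
set -eu

# Assumed defaults (placeholders, not taken from the notes).
VLLM_BIN="${VLLM_BIN:-/opt/vllm/venv/bin/vllm}"  # hypothetical virtualenv path
VLLM_HOST="${VLLM_HOST:-127.0.0.1}"
VLLM_PORT="${VLLM_PORT:-8000}"
VLLM_GPU_UTIL="${VLLM_GPU_UTIL:-0.90}"

# Replace this shell with the vLLM server so systemd tracks the server process.
start_vllm() {
  exec "$VLLM_BIN" serve "$1" \
    --host "$VLLM_HOST" \
    --port "$VLLM_PORT" \
    --gpu-memory-utilization "$VLLM_GPU_UTIL"
}

# Launch only when a model is configured, e.g. through an
# Environment=VLLM_MODEL=... line in the [Service] section (an assumption).
if [ -n "${VLLM_MODEL:-}" ]; then
  start_vllm "$VLLM_MODEL"
else
  echo "VLLM_MODEL is not set; nothing to launch" >&2
fi
```

The script must be executable by the vllm user (chmod +x). After creating or editing the unit file, `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now vllm.service` loads the unit, starts it, and enables it at boot.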