RAG (Retrieval-Augmented Generation)

2025-11-30

Hardware

  • GeForce RTX 5090 (32GB VRAM)
  • AMD 9
  • 96 GB RAM

Software

  • Ubuntu 24
  • nvidia-smi

System Services

  • jupyterlab.service
  • vllm.service

Starting/Stopping

systemctl status [service-name]
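The heading covers starting and stopping, while the line above only shows `status`. Here is a minimal sketch of a wrapper that applies one action to both services listed under System Services. The helper names `run` and `svc` and the `DRY_RUN` switch are hypothetical and not part of the original setup. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original setup.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
SERVICES="jupyterlab.service vllm.service"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# svc ACTION: apply one systemctl action to every service in SERVICES.
svc() {
  case "$1" in
    start|stop|restart|status) ;;
    *) echo "usage: svc start|stop|restart|status" >&2; return 2 ;;
  esac
  for s in $SERVICES; do
    run sudo systemctl "$1" "$s"
  done
}
```

After sourcing the file, `svc restart` prints the two restart commands, and `DRY_RUN=0 svc restart` actually runs them. To follow a service's log output, `journalctl -u vllm.service -f` is usually the quickest route.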

Models

llama3-8b-instruct

Setup

vLLM Service Setup

Configuration

  • create system user

    sudo useradd --system --create-home --home-dir /opt/vllm --shell /usr/sbin/nologin vllm
    

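    To work as this user (the account has no login shell, so switch to it with sudo):

    1. Add your local user to the vllm group (takes effect at your next login):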

      sudo usermod -aG vllm $USER
      
    2. Become the vLLM user (for setup):

      sudo -u vllm -s /bin/bash
      
  • application directory

    /opt/vllm/
    
  • service

    /etc/systemd/system/vllm.service
    
    [Unit]
    Description=vLLM OpenAI-compatible server
    After=network.target

    [Service]
    Type=simple
    User=vllm
    Group=vllm
    WorkingDirectory=/opt/vllm
    ExecStart=/opt/vllm/start_vllm.sh
    Restart=on-failure
    RestartSec=5

    LimitNOFILE=65535

    [Install]
    WantedBy=multi-user.target
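Before enabling the unit, it can help to confirm that everything it refers to actually exists. The sketch below checks the user, directory, start script, and unit file named in the configuration above. The `check` and `preflight` function names are hypothetical, and the contents of `start_vllm.sh` are not assumed here, only that the file is executable.

```shell
#!/bin/sh
# Hypothetical preflight helper; the paths and user name come from the unit file above.

# check DESCRIPTION CMD...: run CMD quietly and report ok or MISSING.
check() {
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $desc"
  else
    echo "MISSING: $desc"
    return 1
  fi
}

# preflight: return non-zero if any prerequisite is missing.
preflight() {
  rc=0
  check "user vllm" id vllm || rc=1
  check "directory /opt/vllm" test -d /opt/vllm || rc=1
  check "executable /opt/vllm/start_vllm.sh" test -x /opt/vllm/start_vllm.sh || rc=1
  check "unit /etc/systemd/system/vllm.service" test -f /etc/systemd/system/vllm.service || rc=1
  return "$rc"
}

# Once preflight passes, reload systemd and enable the service:
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now vllm.service
```

Calling `preflight` prints one line per prerequisite. The two commented commands at the end make systemd pick up the new unit file, then start the service and enable it at boot.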