You’ve got Ollama running. You’re pulling models, chatting through curl commands in your terminal. It works, but let’s be honest — typing JSON into a terminal isn’t exactly the ChatGPT experience.

Open WebUI (formerly Ollama WebUI) gives you a polished, feature-rich chat interface for your local models. It looks and feels like ChatGPT, but everything runs on your hardware. Conversations stay private. Models run offline. And it’s packed with features that even OpenAI’s interface doesn’t have.

In this guide, we’ll get Open WebUI running with Docker, connect it to Ollama, and explore the features that make it the best self-hosted AI chat interface in 2026.

Why Open WebUI?

If you’re already running Ollama (check our Ollama setup guide if you’re not), Open WebUI adds:

  • Beautiful chat interface — Multiple conversations, markdown rendering, code highlighting
  • Multi-model support — Switch between models mid-conversation
  • RAG (Retrieval Augmented Generation) — Upload documents and chat with them
  • Image generation — Integrate with Stable Diffusion or DALL-E
  • User management — Multiple accounts with role-based access
  • Model customization — Create custom personas with system prompts
  • Chat history — Full searchable conversation history
  • Voice input/output — Speech-to-text and text-to-speech
  • API compatibility — Works with Ollama, OpenAI, and any OpenAI-compatible API

Prerequisites

Before starting, make sure you have:

  • A server running Linux with Docker and Docker Compose
  • Ollama installed and running (or any OpenAI-compatible API)
  • At least 1GB free RAM for Open WebUI itself (plus whatever your models need)
  • A modern web browser

Don’t have Ollama yet? Follow our Ollama guide first, then come back here.

Quick Start: Open WebUI + Ollama

If Ollama is already running on the same machine:

mkdir -p ~/docker/open-webui
cd ~/docker/open-webui

Create docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open-webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open-webui_data:

Start it up:

docker compose up -d

Open http://your-server-ip:3000 and you’ll see the registration page. The first account you create becomes the admin.

All-in-One: Open WebUI + Ollama Together

If you want both services managed in a single compose file:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open-webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama_data:
  open-webui_data:

This is the cleanest setup — both containers on the same Docker network, no host networking tricks needed.

GPU Acceleration

If your server has an NVIDIA GPU, add the GPU runtime to the Ollama container:

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Make sure you have the NVIDIA Container Toolkit installed on the host.

For AMD GPUs, use the ROCm Ollama image instead:

    image: ollama/ollama:rocm

First-Time Setup

1. Create Your Admin Account

The first user to register becomes the admin. Navigate to http://your-server-ip:3000 and create an account. After that, you can control whether new registrations are allowed in Admin Panel > Settings.

2. Pull Your First Model

Open WebUI can pull models directly from the interface. Go to Admin Panel > Models and pull a model:

  • llama3.2:3b — Fast, good for general chat (2GB)
  • llama3.1:8b — Better quality, still fast on modern hardware (4.7GB)
  • mistral:7b — Excellent coding and reasoning (4.1GB)
  • gemma2:9b — Google’s model, great for creative writing (5.4GB)
  • codellama:13b — Specialized for code generation (7.4GB)

Or pull from the terminal:

docker exec ollama ollama pull llama3.2:3b

3. Start Chatting

Click New Chat, select your model from the dropdown, and start typing. The interface supports:

  • Markdown rendering — Tables, code blocks, lists all render properly
  • Code highlighting — Syntax-highlighted code with copy buttons
  • LaTeX — Mathematical equations render correctly
  • Streaming — Token-by-token response streaming
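Behind that streaming toggle, Ollama's `/api/generate` endpoint emits newline-delimited JSON: each line carries a partial `response` string, and the final line sets `done` to true. A minimal stdlib-only sketch of how a client reassembles the stream (the sample lines mimic that wire format; in practice they arrive over HTTP):

```python
import json

def collect_stream(lines):
    """Accumulate a streamed Ollama /api/generate response.

    Each line is a JSON object with a partial "response" string;
    the final object has "done": true.
    """
    text = []
    for line in lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Simulated stream (real lines arrive incrementally over HTTP)
stream = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "!", "done": true}',
]
print(collect_stream(stream))  # Hello, world!
```

Open WebUI renders each partial `response` as it lands, which is what produces the token-by-token effect in the chat window.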

Key Features Deep Dive

RAG: Chat With Your Documents

One of Open WebUI’s standout features is document-based chat. Upload PDFs, text files, or entire folders and ask questions about their contents.

  1. Click the + icon in the chat input
  2. Upload a document (PDF, TXT, DOCX, CSV)
  3. The document gets chunked and embedded automatically
  4. Ask questions — the model references your uploaded content
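The chunk-and-embed step can be sketched in miniature. This toy version uses bag-of-words overlap as a stand-in "embedding" so it stays self-contained; a real deployment uses a transformer embedding model and cosine similarity over dense vectors, but the retrieval shape is the same:

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split text into fixed-size word chunks (real RAG pipelines
    add overlap and respect sentence boundaries)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy bag-of-words 'embedding'; production uses a model such as
    sentence-transformers/all-MiniLM-L6-v2."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks):
    """Return the chunk most similar to the question."""
    q = embed(question)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

doc = ("Backups run nightly at 2am to the NAS. "
       "The VPN uses WireGuard on port 51820. "
       "Monitoring is handled by Grafana and Prometheus.")
chunks = chunk(doc, size=8)
print(retrieve("What port does the VPN use?", chunks))
```

The retrieved chunk is then prepended to your question as context, so the model answers from your document rather than from its training data.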

This is incredibly useful for:

  • Querying technical documentation
  • Analyzing research papers
  • Searching through meeting notes
  • Reviewing contracts or legal documents

Custom Model Personas

Create specialized assistants with custom system prompts:

  1. Go to Workspace > Models
  2. Click Create a Model
  3. Set a name, description, and system prompt
  4. Choose the base model

Example personas:

  • Code Reviewer — “You are a senior software engineer. Review code for bugs, security issues, and best practices.”
  • Writing Editor — “You are a professional editor. Improve clarity, grammar, and flow while preserving the author’s voice.”
  • Homelab Advisor — “You are a self-hosting expert. Help users choose and configure services for their home server.”

Multi-Model Conversations

Open WebUI lets you compare models side by side. In a conversation, you can:

  • Switch models between messages
  • Regenerate a response with a different model
  • See which model generated each response

This is great for evaluating which model works best for your use case.

Web Search Integration

Enable web search to let your models access current information:

  1. Go to Admin Panel > Settings > Web Search
  2. Configure a search provider (SearXNG recommended for self-hosters)
  3. Models can now search the web when they need current information

For a fully self-hosted setup, run SearXNG alongside Open WebUI:

  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    restart: unless-stopped
    ports:
      - "8888:8080"
    volumes:
      - ./searxng:/etc/searxng

Then point Open WebUI’s web search at SearXNG — in the admin panel, the SearXNG query URL is typically http://searxng:8080/search?q=&lt;query&gt;.
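One caveat: Open WebUI reads SearXNG results as JSON, and SearXNG does not serve JSON by default. A minimal addition to ./searxng/settings.yml (key names follow SearXNG's standard settings schema) enables it:

```yaml
search:
  formats:
    - html
    - json
```

Restart the searxng container after editing the file so the setting takes effect.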

Connecting to OpenAI (Optional)

Open WebUI can also connect to OpenAI’s API, giving you GPT-4 and other cloud models alongside your local ones:

  1. Go to Admin Panel > Settings > Connections
  2. Add your OpenAI API key
  3. Both local (Ollama) and cloud (OpenAI) models appear in the model selector

This gives you the best of both worlds — private local models for sensitive work, cloud models when you need maximum capability.
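Mixing backends works because both speak the same chat-completions wire format. A sketch of the request any OpenAI-compatible client builds — the model names are examples, and Ollama exposes its own OpenAI-compatible endpoint under /v1:

```python
import json

def chat_request(base_url, model, user_message, api_key=None):
    """Build an OpenAI-style chat-completions request.

    The same payload shape works against api.openai.com and against
    Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1).
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": True,  # token-by-token streaming, as in the chat UI
    }
    return f"{base_url}/chat/completions", headers, json.dumps(payload)

# Local model via Ollama's OpenAI-compatible API (no key needed):
url, headers, body = chat_request("http://localhost:11434/v1",
                                  "llama3.2:3b", "Hello!")
```

Swap the base URL for https://api.openai.com/v1 and add an API key, and the identical payload targets a cloud model — which is all Open WebUI's model selector is really doing.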

Running Behind a Reverse Proxy

Caddy

chat.yourdomain.com {
    reverse_proxy open-webui:8080
}

Nginx

server {
    listen 80;
    server_name chat.yourdomain.com;

    client_max_body_size 50M;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for streaming
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Important: The WebSocket headers are required for response streaming. Without them, you’ll get responses all at once instead of token-by-token.

Environment Variables Reference

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama API endpoint |
| OPENAI_API_KEY | (unset) | OpenAI API key (optional) |
| WEBUI_AUTH | true | Enable authentication |
| WEBUI_NAME | Open WebUI | Custom instance name |
| ENABLE_SIGNUP | true | Allow new user registration |
| DEFAULT_MODELS | (unset) | Default model for new chats |
| ENABLE_RAG_WEB_SEARCH | false | Enable web search in RAG |
| RAG_EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Embedding model for RAG |

Performance Tips

Memory Management

Open WebUI itself uses about 500MB-1GB RAM. The real memory consumer is Ollama and the models:

  • 3B models — 2-4GB RAM
  • 7-8B models — 4-8GB RAM
  • 13B models — 8-16GB RAM
  • 70B models — 40GB+ RAM (GPU recommended)
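Those ranges follow a rough rule of thumb: weight memory ≈ parameter count × bytes per weight, plus overhead for the KV cache and runtime buffers. A quick estimator — the 1.2 overhead factor is a ballpark assumption, not a spec:

```python
def model_ram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough RAM estimate: params x bytes-per-weight x overhead.

    Q4 quantization stores ~4 bits per weight; FP16 stores 16.
    The overhead factor covers KV cache and runtime buffers (ballpark).
    """
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total * overhead / 1e9

print(round(model_ram_gb(8), 1))    # ~8B model at Q4 quantization
print(round(model_ram_gb(70), 1))   # ~70B model at Q4 quantization
```

An 8B model at Q4 lands around 5GB and a 70B model around 42GB, which matches the ranges above; at FP16 (`bits_per_weight=16`) the same models need roughly four times as much.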

If memory is tight, configure Ollama to unload models after a timeout:

OLLAMA_KEEP_ALIVE=5m  # Unload model after 5 minutes of inactivity
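In the compose setups above, this belongs in the Ollama service's environment block (5m is just an example value):

```yaml
  ollama:
    image: ollama/ollama:latest
    environment:
      - OLLAMA_KEEP_ALIVE=5m
```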

Faster Responses

  1. Use quantized models (Q4_K_M is a good balance of speed and quality)
  2. Enable GPU acceleration if available
  3. Use smaller models for simple tasks (3B for quick questions, 8B+ for complex reasoning)
  4. Set OLLAMA_NUM_PARALLEL=2 for concurrent users

Updating

cd ~/docker/open-webui
docker compose pull
docker compose up -d

Open WebUI updates frequently with new features. Your conversations and settings persist in the volume.

Troubleshooting

“Could not connect to Ollama”

  • Verify Ollama is running: curl http://localhost:11434/api/tags
  • Check the OLLAMA_BASE_URL environment variable
  • If using Docker, make sure the extra_hosts mapping is correct

Slow responses

  • Check available RAM — if the system is swapping, models will crawl
  • Try a smaller model
  • Monitor with docker stats to see resource usage

Upload failures

  • Increase client_max_body_size in your reverse proxy
  • Check disk space in the Docker volume

Models not appearing

  • Pull models through the admin panel or CLI
  • Restart Open WebUI after pulling new models: docker compose restart open-webui

Conclusion

Open WebUI transforms your local LLM setup from a developer tool into something anyone can use. The interface is polished, the features are extensive (RAG, web search, personas, multi-model), and everything stays on your hardware.

Pair it with Ollama and a decent GPU, and you’ve got a private, capable AI assistant that rivals the cloud services — without the subscription fees or privacy concerns.

The AI category is the fastest-growing area in self-hosting, and Open WebUI is at the center of it. If you’re only going to self-host one AI tool, make it this one.

Useful links: