You’ve got Ollama running. You’re pulling models, chatting through curl commands in your terminal. It works, but let’s be honest — typing JSON into a terminal isn’t exactly the ChatGPT experience.

Open WebUI (formerly Ollama WebUI) gives you a polished, feature-rich chat interface for your local models. It looks and feels like ChatGPT, but everything runs on your hardware. Conversations stay private. Models run offline. And it’s packed with features that even OpenAI’s interface doesn’t have.

In this guide, we’ll get Open WebUI running with Docker, connect it to Ollama, and explore the features that make it the best self-hosted AI chat interface in 2026.

Why Open WebUI?

If you’re already running Ollama (check our Ollama setup guide if you’re not), Open WebUI adds:

  • Beautiful chat interface — Multiple conversations, markdown rendering, code highlighting
  • Multi-model support — Switch between models mid-conversation
  • RAG (Retrieval Augmented Generation) — Upload documents and chat with them
  • Image generation — Integrate with Stable Diffusion or DALL-E
  • User management — Multiple accounts with role-based access
  • Model customization — Create custom personas with system prompts
  • Chat history — Full searchable conversation history
  • Voice input/output — Speech-to-text and text-to-speech
  • API compatibility — Works with Ollama, OpenAI, and any OpenAI-compatible API

Prerequisites

Before starting, make sure you have:

  • A server running Linux with Docker and Docker Compose
  • Ollama installed and running (or any OpenAI-compatible API)
  • At least 1GB free RAM for Open WebUI itself (plus whatever your models need)
  • A modern web browser

Don’t have Ollama yet? Follow our Ollama guide first, then come back here.

Quick Start: Open WebUI + Ollama

If Ollama is already running on the same machine:

mkdir -p ~/docker/open-webui
cd ~/docker/open-webui

Create docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open-webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open-webui_data:

Start it up:

docker compose up -d

Open http://your-server-ip:3000 and you’ll see the registration page. The first account you create becomes the admin.

All-in-One: Open WebUI + Ollama Together

If you want both services managed in a single compose file:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open-webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama_data:
  open-webui_data:

This is the cleanest setup — both containers on the same Docker network, no host networking tricks needed.

GPU Acceleration

If your server has an NVIDIA GPU, add the GPU runtime to the Ollama container:

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Make sure you have the NVIDIA Container Toolkit installed on the host.

For AMD GPUs, use the ROCm Ollama image instead:

    image: ollama/ollama:rocm

First-Time Setup

1. Create Your Admin Account

The first user to register becomes the admin. Navigate to http://your-server-ip:3000 and create an account. After that, you can control whether new registrations are allowed in Admin Panel > Settings.

2. Pull Your First Model

Open WebUI can pull models directly from the interface. Go to Admin Panel > Models and pull a model:

  • llama3.2:3b — Fast, good for general chat (2GB)
  • llama3.1:8b — Better quality, still fast on modern hardware (4.7GB)
  • mistral:7b — Excellent coding and reasoning (4.1GB)
  • gemma2:9b — Google’s model, great for creative writing (5.4GB)
  • codellama:13b — Specialized for code generation (7.4GB)

Or pull from the terminal:

docker exec ollama ollama pull llama3.2:3b

3. Start Chatting

Click New Chat, select your model from the dropdown, and start typing. The interface supports:

  • Markdown rendering — Tables, code blocks, lists all render properly
  • Code highlighting — Syntax-highlighted code with copy buttons
  • LaTeX — Mathematical equations render correctly
  • Streaming — Token-by-token response streaming
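Behind that streaming toggle, Ollama's `/api/generate` endpoint emits newline-delimited JSON: each line carries a partial `response` string, and the final line sets `done` to true. A minimal stdlib-only sketch of how a client reassembles the stream (the sample lines mimic that wire format; in practice they arrive over HTTP):

```python
import json

def collect_stream(lines):
    """Accumulate a streamed Ollama /api/generate response.

    Each line is a JSON object with a partial "response" string;
    the final object has "done": true.
    """
    text = []
    for line in lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Simulated stream (real lines arrive incrementally over HTTP)
stream = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": false}',
    '{"response": "!", "done": true}',
]
print(collect_stream(stream))  # Hello, world!
```

Open WebUI renders each partial `response` as it lands, which is what produces the token-by-token effect in the chat window.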

Key Features Deep Dive

RAG: Chat With Your Documents

One of Open WebUI’s standout features is document-based chat. Upload PDFs, text files, or entire folders and ask questions about their contents.

  1. Click the + icon in the chat input
  2. Upload a document (PDF, TXT, DOCX, CSV)
  3. The document gets chunked and embedded automatically
  4. Ask questions — the model references your uploaded content
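The chunk-and-embed step can be sketched in miniature. This toy version uses bag-of-words overlap as a stand-in "embedding" so it stays self-contained; a real deployment uses a transformer embedding model and cosine similarity over dense vectors, but the retrieval shape is the same:

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split text into fixed-size word chunks (real RAG pipelines
    add overlap and respect sentence boundaries)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy bag-of-words 'embedding'; production uses a model such as
    sentence-transformers/all-MiniLM-L6-v2."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks):
    """Return the chunk most similar to the question."""
    q = embed(question)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

doc = ("Backups run nightly at 2am to the NAS. "
       "The VPN uses WireGuard on port 51820. "
       "Monitoring is handled by Grafana and Prometheus.")
chunks = chunk(doc, size=8)
print(retrieve("What port does the VPN use?", chunks))
```

The retrieved chunk is then prepended to your question as context, so the model answers from your document rather than from its training data.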

This is incredibly useful for:

  • Querying technical documentation
  • Analyzing research papers
  • Searching through meeting notes
  • Reviewing contracts or legal documents

Custom Model Personas

Create specialized assistants with custom system prompts:

  1. Go to Workspace > Models
  2. Click Create a Model
  3. Set a name, description, and system prompt
  4. Choose the base model

Example personas:

  • Code Reviewer — “You are a senior software engineer. Review code for bugs, security issues, and best practices.”
  • Writing Editor — “You are a professional editor. Improve clarity, grammar, and flow while preserving the author’s voice.”
  • Homelab Advisor — “You are a self-hosting expert. Help users choose and configure services for their home server.”

Multi-Model Conversations

Open WebUI lets you compare models side by side. In a conversation, you can:

  • Switch models between messages
  • Regenerate a response with a different model
  • See which model generated each response

This is great for evaluating which model works best for your use case.

Web Search Integration

Enable web search to let your models access current information:

  1. Go to Admin Panel > Settings > Web Search
  2. Configure a search provider (SearXNG recommended for self-hosters)
  3. Models can now search the web when they need current information

For a fully self-hosted setup, run SearXNG alongside Open WebUI:

  searxng:
    image: searxng/searxng:latest
    container_name: searxng
    restart: unless-stopped
    ports:
      - "8888:8080"
    volumes:
      - ./searxng:/etc/searxng

Then point Open WebUI’s web search at SearXNG — in the admin panel, the SearXNG query URL is typically http://searxng:8080/search?q=&lt;query&gt;.
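One caveat: Open WebUI reads SearXNG results as JSON, and SearXNG does not serve JSON by default. A minimal addition to ./searxng/settings.yml (key names follow SearXNG's standard settings schema) enables it:

```yaml
search:
  formats:
    - html
    - json
```

Restart the searxng container after editing the file so the setting takes effect.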

Connecting to OpenAI (Optional)

Open WebUI can also connect to OpenAI’s API, giving you GPT-4 and other cloud models alongside your local ones:

  1. Go to Admin Panel > Settings > Connections
  2. Add your OpenAI API key
  3. Both local (Ollama) and cloud (OpenAI) models appear in the model selector

This gives you the best of both worlds — private local models for sensitive work, cloud models when you need maximum capability.
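Mixing backends works because both speak the same chat-completions wire format. A sketch of the request any OpenAI-compatible client builds — the model names are examples, and Ollama exposes its own OpenAI-compatible endpoint under /v1:

```python
import json

def chat_request(base_url, model, user_message, api_key=None):
    """Build an OpenAI-style chat-completions request.

    The same payload shape works against api.openai.com and against
    Ollama's OpenAI-compatible endpoint (http://localhost:11434/v1).
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": True,  # token-by-token streaming, as in the chat UI
    }
    return f"{base_url}/chat/completions", headers, json.dumps(payload)

# Local model via Ollama's OpenAI-compatible API (no key needed):
url, headers, body = chat_request("http://localhost:11434/v1",
                                  "llama3.2:3b", "Hello!")
```

Swap the base URL for https://api.openai.com/v1 and add an API key, and the identical payload targets a cloud model — which is all Open WebUI's model selector is really doing.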

Running Behind a Reverse Proxy

Caddy

chat.yourdomain.com {
    reverse_proxy open-webui:8080
}

Nginx

server {
    listen 80;
    server_name chat.yourdomain.com;

    client_max_body_size 50M;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for streaming
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Important: The WebSocket headers are required for response streaming. Without them, you’ll get responses all at once instead of token-by-token.

Environment Variables Reference

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama API endpoint |
| OPENAI_API_KEY | (unset) | OpenAI API key (optional) |
| WEBUI_AUTH | true | Enable authentication |
| WEBUI_NAME | Open WebUI | Custom instance name |
| ENABLE_SIGNUP | true | Allow new user registration |
| DEFAULT_MODELS | (unset) | Default model for new chats |
| ENABLE_RAG_WEB_SEARCH | false | Enable web search in RAG |
| RAG_EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Embedding model for RAG |

Performance Tips

Memory Management

Open WebUI itself uses about 500MB-1GB RAM. The real memory consumer is Ollama and the models:

  • 3B models — 2-4GB RAM
  • 7-8B models — 4-8GB RAM
  • 13B models — 8-16GB RAM
  • 70B models — 40GB+ RAM (GPU recommended)
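Those ranges follow a rough rule of thumb: weight memory ≈ parameter count × bytes per weight, plus overhead for the KV cache and runtime buffers. A quick estimator — the 1.2 overhead factor is a ballpark assumption, not a spec:

```python
def model_ram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough RAM estimate: params x bytes-per-weight x overhead.

    Q4 quantization stores ~4 bits per weight; FP16 stores 16.
    The overhead factor covers KV cache and runtime buffers (ballpark).
    """
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return bytes_total * overhead / 1e9

print(round(model_ram_gb(8), 1))    # ~8B model at Q4 quantization
print(round(model_ram_gb(70), 1))   # ~70B model at Q4 quantization
```

An 8B model at Q4 lands around 5GB and a 70B model around 42GB, which matches the ranges above; at FP16 (`bits_per_weight=16`) the same models need roughly four times as much.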

If memory is tight, configure Ollama to unload models after a timeout:

OLLAMA_KEEP_ALIVE=5m  # Unload model after 5 minutes of inactivity
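In the compose setups above, this belongs in the Ollama service's environment block (5m is just an example value):

```yaml
  ollama:
    image: ollama/ollama:latest
    environment:
      - OLLAMA_KEEP_ALIVE=5m
```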

Faster Responses

  1. Use quantized models (Q4_K_M is a good balance of speed and quality)
  2. Enable GPU acceleration if available
  3. Use smaller models for simple tasks (3B for quick questions, 8B+ for complex reasoning)
  4. Set OLLAMA_NUM_PARALLEL=2 for concurrent users

Updating

cd ~/docker/open-webui
docker compose pull
docker compose up -d

Open WebUI updates frequently with new features. Your conversations and settings persist in the volume.

Troubleshooting

“Could not connect to Ollama”

  • Verify Ollama is running: curl http://localhost:11434/api/tags
  • Check the OLLAMA_BASE_URL environment variable
  • If using Docker, make sure the extra_hosts mapping is correct

Slow responses

  • Check available RAM — if the system is swapping, models will crawl
  • Try a smaller model
  • Monitor with docker stats to see resource usage

Upload failures

  • Increase client_max_body_size in your reverse proxy
  • Check disk space in the Docker volume

Models not appearing

  • Pull models through the admin panel or CLI
  • Restart Open WebUI after pulling new models: docker compose restart open-webui

Conclusion

Open WebUI transforms your local LLM setup from a developer tool into something anyone can use. The interface is polished, the features are extensive (RAG, web search, personas, multi-model), and everything stays on your hardware.

Pair it with Ollama and a decent GPU, and you’ve got a private, capable AI assistant that rivals the cloud services — without the subscription fees or privacy concerns.

The AI category is the fastest-growing area in self-hosting, and Open WebUI is at the center of it. If you’re only going to self-host one AI tool, make it this one.

Useful links: