Every household drowns in paper. Tax documents, medical records, receipts, warranties, letters — they pile up in drawers and filing cabinets until you need one and can’t find it.

Paperless-ngx fixes this. Scan or photograph a document, drop it in a folder, and Paperless runs OCR on it, tags it, and makes the text searchable. Finding a document then takes seconds instead of minutes.

Why Paperless-ngx?

  • Full-text search across every document you’ve ever scanned
  • Automatic OCR: extracts text from scanned images and PDFs
  • Smart tagging: learns your patterns and auto-categorizes
  • Correspondent detection: knows who sent what
  • Multiple file formats: PDF, PNG, JPEG, TIFF, and (with the optional Tika and Gotenberg services) Office documents
  • Mobile-friendly web UI for access anywhere on your network

Prerequisites

  • Docker and Docker Compose installed
  • At least 2GB RAM (OCR is memory-hungry)
  • Storage space for your documents (plan ~5MB per page average)

Docker Compose Setup

Create a directory and compose file:

mkdir -p ~/paperless-ngx && cd ~/paperless-ngx

# docker-compose.yml
version: "3.8"

services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: changeme_db_password

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPASS: changeme_db_password
      PAPERLESS_SECRET_KEY: changeme_long_random_string
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_TIME_ZONE: America/New_York
      PAPERLESS_URL: https://paperless.yourdomain.com
      USERMAP_UID: 1000
      USERMAP_GID: 1000

volumes:
  data:
  media:
  pgdata:
  redis_data:

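Important: Replace changeme_db_password and changeme_long_random_string with actual random strings. The database password appears twice (POSTGRES_PASSWORD and PAPERLESS_DBPASS), and both values must match. Generate the strings with: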

openssl rand -hex 32
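
If openssl is not available on the machine where you edit the compose file, Python's standard secrets module can produce comparable values. This is a minimal sketch; the function name make_secret is only an illustration:

```python
import secrets


def make_secret(num_bytes: int = 32) -> str:
    """Return a random hex string, comparable to `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print one value for the database password and one for the secret key.
    print("DB password:", make_secret())
    print("Secret key: ", make_secret())
```

Each call returns 64 hexadecimal characters, the same length openssl prints for 32 bytes.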

Start It Up

docker compose up -d

Wait about 30 seconds for everything to initialize, then create your admin account:

docker compose exec webserver python3 manage.py createsuperuser

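Access the web UI at http://your-server:8000. Note that the compose file sets PAPERLESS_URL to https://paperless.yourdomain.com. Until the reverse proxy described later is in place, set PAPERLESS_URL to the address you actually use (for example http://your-server:8000), otherwise Paperless may reject the request as coming from an unknown host.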

How the Consume Folder Works

Paperless watches the ./consume directory. Any file you drop in there gets:

  1. Imported into the document store
  2. OCR’d to extract all text
  3. Auto-tagged based on your rules
  4. Removed from the consume folder (original stored in media)

This is the magic. Set up your scanner to save directly to this folder, and documents process themselves.
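
If you already have a folder full of scans, a short script can drop them into the consume folder for you. The sketch below is one way to do it under a few assumptions: the function name queue_for_consumption is made up for this post, the extension list mirrors the image and PDF formats mentioned earlier, and files are renamed rather than overwritten when a name is already waiting in the folder.

```python
import shutil
from pathlib import Path

# Extensions taken from the format list earlier in this post (assumption:
# Office formats are left out because they need extra services).
SCAN_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def queue_for_consumption(source: Path, consume_dir: Path) -> Path:
    """Copy `source` into `consume_dir` without overwriting an existing file."""
    if source.suffix.lower() not in SCAN_EXTENSIONS:
        raise ValueError(f"unsupported file type: {source.suffix}")

    target = consume_dir / source.name
    counter = 1
    # If a file with the same name is still waiting, pick a numbered name.
    while target.exists():
        target = consume_dir / f"{source.stem}-{counter}{source.suffix}"
        counter += 1

    shutil.copy2(source, target)
    return target
```

For example, queue_for_consumption(Path("scan.pdf"), Path.home() / "paperless-ngx" / "consume") copies scan.pdf into the folder Paperless is watching and returns the path it was written to.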

Setting Up a Scanner Workflow

Option 1: Network Scanner

If your scanner supports scan-to-folder (most modern ones do), point it at the consume directory via a network share:

# Install Samba for network sharing
sudo apt install samba

# Add consume folder to Samba config
sudo tee -a /etc/samba/smb.conf << 'EOF'
[paperless]
  path = /home/youruser/paperless-ngx/consume
  browseable = yes
  writable = yes
  valid users = youruser
EOF

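# Give the user a Samba password (stored separately from the login password)
sudo smbpasswd -a youruser

sudo systemctl restart smbd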

Option 2: Mobile Scanning

Use any mobile scanning app (Adobe Scan, Microsoft Lens, or Genius Scan) and save to a cloud folder that syncs to the consume directory on your server. Alternatively, a community-built mobile client for Paperless-ngx can upload scans straight to the server.

Option 3: Email Import

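Paperless can fetch attachments from an IMAP mailbox. Mail import is set up in the web UI rather than through environment variables (the PAPERLESS_EMAIL_* variables configure outgoing SMTP mail, not fetching). Open the Mail section, add a mail account, then add a mail rule that tells Paperless which folder to check and what to do with processed messages. For a Gmail mailbox, the account settings look like this:

  IMAP server: imap.gmail.com
  IMAP port: 993
  IMAP security: SSL
  Username: the full address of the dedicated mailbox
  Password: an app password for that mailbox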

Forward receipts and documents to a dedicated email address and they’ll appear in Paperless automatically.

Organizing with Tags and Correspondents

Tags

Create tags for document categories:

  • tax — anything tax-related
  • medical — health records
  • receipt — purchase receipts
  • warranty — product warranties
  • insurance — policies and claims
  • vehicle — car-related documents

Correspondents

Paperless tracks who sent each document. With a correspondent's matching algorithm set to Auto, it learns from your manual assignments once it has seen enough examples:

  • Amazon → receipts
  • IRS → tax documents
  • Your doctor’s name → medical

Document Types

Categorize by type:

  • Invoice
  • Receipt
  • Letter
  • Contract
  • Statement

Automatic Matching Rules

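The real power is in matching rules. There is no separate rules page: every tag, correspondent, and document type has its own matching settings. Open one for editing, pick a matching algorithm (Any word, All words, Exact match, Regular expression, Fuzzy match, or Auto), and enter the pattern. For example: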

  • Any document containing “Invoice” → tag: invoice
  • Any document containing “Blue Cross” → tag: medical, correspondent: Blue Cross
  • Any document containing “W-2” → tag: tax

After setting up 10-15 rules, most documents tag themselves correctly.
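
To make the matching idea concrete, here is a small Python sketch of how "any word", "all words", and "exact" style rules could decide which tags a document gets. It is an illustration only, not Paperless-ngx's actual implementation; the Rule class, the mode names, and the apply_rules function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    tag: str
    pattern: str
    mode: str = "any"  # "any", "all", or "exact" (illustrative names)


def matches(rule: Rule, text: str) -> bool:
    """Case-insensitive check of one rule against a document's text."""
    lowered = text.lower()
    if rule.mode == "exact":
        # The whole pattern must appear as one phrase.
        return rule.pattern.lower() in lowered
    words = rule.pattern.lower().split()
    found = [
        re.search(rf"\b{re.escape(word)}\b", lowered) is not None
        for word in words
    ]
    return all(found) if rule.mode == "all" else any(found)


def apply_rules(rules: list[Rule], text: str) -> list[str]:
    """Return the tags of every rule that matches, in rule order."""
    return [rule.tag for rule in rules if matches(rule, text)]


rules = [
    Rule(tag="invoice", pattern="Invoice"),
    Rule(tag="medical", pattern="Blue Cross", mode="exact"),
    Rule(tag="tax", pattern="W-2", mode="exact"),
]
print(apply_rules(rules, "Blue Cross statement for invoice 1042"))
# prints ['invoice', 'medical']
```

The point of the sketch is that each rule is evaluated independently against the OCR text, so one document can pick up several tags at once.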

Searching Your Documents

The search is incredibly powerful:

  • Full text: search for any word in any document
  • Filters: combine tags, correspondents, date ranges
  • ASN (Archive Serial Number): physical filing reference

Type “water bill 2025” and within seconds you have every document that mentions a water bill and 2025. This alone is worth the setup.
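
The same full-text search is available to scripts through the REST API, using the query parameter on the documents endpoint and a token in the Authorization header. The sketch below only builds the request so you can see its shape; the host, the token, and the helper name build_search_request are placeholders rather than anything from a real setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, token: str, query: str) -> Request:
    """Build (but do not send) a full-text search request."""
    params = urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return Request(url, headers={"Authorization": f"Token {token}"})


# Placeholder host and token; replace with your own before sending.
request = build_search_request(
    "http://your-server:8000", "YOUR_API_TOKEN", "water bill 2025"
)
print(request.full_url)
# prints http://your-server:8000/api/documents/?query=water+bill+2025
```

Passing the request to urllib.request.urlopen should return a JSON page of matching documents; check the API documentation of your installed version for the exact response fields.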

Backup Strategy

Your documents are precious. Back them up:

# Export all documents with metadata
docker compose exec webserver document_exporter ../export

# The export folder now contains everything needed to rebuild

Add this to a cron job:

0 2 * * * cd ~/paperless-ngx && docker compose exec -T webserver document_exporter ../export

Then back up the ./export directory with your normal backup tool (restic, rclone, etc.).
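
A backup is only useful if it is complete, so it can be worth checking the export before shipping it off-site. The sketch below reads the manifest.json that the exporter writes and reports how many documents it lists and whether any referenced files are missing. It assumes the default (unsplit, unzipped) export layout and that document entries carry a "documents.document" model name and an "__exported_file_name__" key; verify both against your own export, since the layout may differ between versions. The function name summarize_export is hypothetical.

```python
import json
from pathlib import Path


def summarize_export(export_dir: Path) -> dict:
    """Count exported documents and list files named in the manifest but missing on disk."""
    manifest_path = export_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))

    # Keep only the entries that describe documents (assumed model name).
    documents = [e for e in manifest if e.get("model") == "documents.document"]

    missing = [
        entry["__exported_file_name__"]
        for entry in documents
        if "__exported_file_name__" in entry
        and not (export_dir / entry["__exported_file_name__"]).exists()
    ]
    return {"documents": len(documents), "missing_files": missing}
```

Calling summarize_export(Path.home() / "paperless-ngx" / "export") after the nightly export gives you a quick sanity check; a non-empty missing_files list means the export should not be trusted yet.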

Performance Tuning

OCR Speed

OCR is the bottleneck. Speed it up:

environment:
  PAPERLESS_TASK_WORKERS: 2        # Parallel processing (default: 1)
  PAPERLESS_OCR_PAGES: 0           # 0 = OCR all pages; set to N to OCR only the first N pages
  PAPERLESS_OCR_SKIP_ARCHIVE_FILE: with_text  # Don't create an archive copy for files that already contain text

Storage

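Documents add up. The dashboard's Statistics widget shows document counts; for disk usage, check the size of the media and data directories inside the container:

docker compose exec webserver du -sh /usr/src/paperless/media /usr/src/paperless/data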

Plan for roughly 5MB per page for scanned documents, much less for native PDFs.
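
To turn that rule of thumb into a disk size, multiply it out. The sketch below uses the 5MB-per-page figure from this post; the backlog and monthly page counts are made-up example numbers, so substitute your own.

```python
def estimate_storage_gb(pages: int, mb_per_page: float = 5.0) -> float:
    """Estimate storage in GB (1 GB = 1024 MB) for a number of scanned pages."""
    return pages * mb_per_page / 1024


# Hypothetical household: 2,000-page backlog plus 50 new pages a month for 5 years.
backlog = 2000
monthly = 50
total_pages = backlog + monthly * 12 * 5
print(f"{total_pages} pages -> about {estimate_storage_gb(total_pages):.1f} GB")
# prints 5000 pages -> about 24.4 GB
```

Leave extra headroom on top of the estimate for the database, the search index, and the export folder used for backups.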

Putting It Behind a Reverse Proxy

For remote access, put Paperless behind your reverse proxy. Nginx example:

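server {
    server_name paperless.yourdomain.com;
    # Add your listen and TLS certificate directives here if you serve HTTPS

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support, used by the UI for live task status updates
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        client_max_body_size 100M;  # Allow large document uploads
    }
}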

The client_max_body_size setting matters: nginx defaults to a 1 MB request body limit, so without it larger scans fail to upload.

The Payoff

Once you’ve been running Paperless for a month, the benefits compound:

  • Tax season: search “W-2 2025” and it’s right there
  • Warranty claim: search the product name, find the receipt instantly
  • Insurance: every policy, claim, and EOB searchable in seconds
  • Moving/legal: all your important documents in one searchable place

The initial scanning effort is the hardest part. Set aside a weekend to scan your paper backlog, then maintain it by scanning new documents as they arrive. Within a month, you’ll wonder how you ever lived without it.

Wrapping Up

Paperless-ngx is one of those self-hosted tools that genuinely improves your daily life. It replaces filing cabinets, makes tax prep painless, and ensures you never lose an important document again.

The Docker setup takes five minutes. The scanning habit takes a weekend to build. The time saved lasts forever.