Every household drowns in paper. Tax documents, medical records, receipts, warranties, letters — they pile up in drawers and filing cabinets until you need one and can’t find it.
Paperless-ngx fixes this permanently. Scan or photograph a document, drop it in a folder, and Paperless automatically OCRs it, extracts the text, tags it, and makes it searchable. Finding any document takes seconds instead of minutes.
Why Paperless-ngx?
- Full-text search across every document you’ve ever scanned
- Automatic OCR — extracts text from scanned images and PDFs
- Smart tagging — learns your patterns and auto-categorizes
- Correspondent detection — knows who sent what
- Multiple file formats — PDF, PNG, JPEG, TIFF, even Office documents
- Mobile-friendly web UI for access anywhere on your network
Prerequisites
- Docker and Docker Compose installed
- At least 2GB RAM (OCR is memory-hungry)
- Storage space for your documents (plan ~5MB per page average)
Docker Compose Setup
Create a directory and compose file:
mkdir -p ~/paperless-ngx && cd ~/paperless-ngx
# docker-compose.yml
version: "3.8"
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data
  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: changeme_db_password
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPASS: changeme_db_password
      PAPERLESS_SECRET_KEY: changeme_long_random_string
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_TIME_ZONE: America/New_York
      PAPERLESS_URL: https://paperless.yourdomain.com
      USERMAP_UID: 1000
      USERMAP_GID: 1000
volumes:
  data:
  media:
  pgdata:
  redis_data:
Important: Change changeme_db_password and changeme_long_random_string to actual random strings. Generate them with:
openssl rand -hex 32
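If openssl is not at hand, the same kind of value can be produced with Python's standard library. This is a minimal sketch: the function name generate_secrets is made up for illustration, and the output is meant to be pasted into the compose file by hand. Whatever you generate for the database password has to be used for both POSTGRES_PASSWORD and PAPERLESS_DBPASS.

```python
import secrets
from typing import Dict


def generate_secrets() -> Dict[str, str]:
    """Return fresh random values for the two placeholders in the compose file."""
    return {
        # 32 random bytes rendered as 64 hex characters,
        # the same shape as the output of `openssl rand -hex 32`.
        "POSTGRES_PASSWORD": secrets.token_hex(32),
        "PAPERLESS_SECRET_KEY": secrets.token_hex(32),
    }


if __name__ == "__main__":
    for name, value in generate_secrets().items():
        print(f"{name}={value}")
```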
Start It Up
docker compose up -d
Wait about 30 seconds for everything to initialize, then create your admin account:
docker compose exec webserver python3 manage.py createsuperuser
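Access the web UI at http://your-server:8000. If the login page fails with a Bad Request or CSRF error, the cause may be that PAPERLESS_URL names a different hostname than the one in your browser's address bar; leave that variable out until the reverse proxy is in place.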
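Rather than guessing at 30 seconds, a small script can poll the web UI until it answers. The following is a sketch under assumptions: the function names is_up and wait_until_up are invented here, and "answers with any status below 500" is treated as "ready", which is a rough heuristic rather than an official health check.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server at `url` responds with a non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as error:
        # The server answered, just not with a 2xx (for example a redirect to login).
        return error.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: not ready yet.
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0, probe=is_up) -> bool:
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Calling wait_until_up("http://your-server:8000") before running createsuperuser would cover slow first starts, when database migrations can take longer than half a minute.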
How the Consume Folder Works
Paperless watches the ./consume directory. Any file you drop in there gets:
- Imported into the document store
- OCR’d to extract all text
- Auto-tagged based on your rules
- Removed from the consume folder (original stored in media)
This is the magic. Set up your scanner to save directly to this folder, and documents process themselves.
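If your scans land somewhere else first (a sync folder, a download directory), a short script can move them into the consume folder. The sketch below is one possible approach, not something Paperless requires: it copies the file into a staging directory and then renames it into place, so the file shows up in the consume folder in a single step. The function name and the staging directory are assumptions, and the rename is only atomic when both directories are on the same filesystem.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(source: Path, consume_dir: Path, staging_dir: Path) -> Path:
    """Copy `source` into `consume_dir` so that it appears there fully written.

    `staging_dir` is assumed to live on the same filesystem as `consume_dir`
    but outside of it, so the watcher never sees the partial copy.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / source.name
    shutil.copy2(source, staged)          # slow part happens outside consume
    target = consume_dir / source.name
    os.replace(staged, target)            # rename into place in one step
    return target
```

For the layout in this guide, consume_dir would be ~/paperless-ngx/consume and staging_dir something like ~/paperless-ngx/staging.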
Setting Up a Scanner Workflow
Option 1: Network Scanner
If your scanner supports scan-to-folder (most modern ones do), point it at the consume directory via a network share:
# Install Samba for network sharing
sudo apt install samba
# Add consume folder to Samba config
sudo tee -a /etc/samba/smb.conf << 'EOF'
[paperless]
   path = /home/youruser/paperless-ngx/consume
   browseable = yes
   writable = yes
   valid users = youruser
EOF
# Give youruser a Samba password (Samba keeps its own password database)
sudo smbpasswd -a youruser
sudo systemctl restart smbd
Option 2: Mobile Scanning
Use any mobile scanning app (Adobe Scan, Microsoft Lens, or Genius Scan) and save to a cloud folder that syncs to your server. Alternatively, use one of the community-built mobile apps for Paperless-ngx, which upload straight to your instance.
Option 3: Email Import
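Paperless can fetch documents from an email account over IMAP. In current releases this is configured in the Mail section of the web UI rather than in the compose file: add a mail account, then a mail rule that says which messages to consume. (The PAPERLESS_EMAIL_* environment variables configure outgoing SMTP mail, not fetching.) For a Gmail account, the mail account settings look like this:
IMAP server: imap.gmail.com
IMAP port: 993
IMAP security: SSL
Username: your Gmail address
Password: an app password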
Forward receipts and documents to a dedicated email address and they’ll appear in Paperless automatically.
Organizing with Tags and Correspondents
Tags
Create tags for document categories:
- tax: anything tax-related
- medical: health records
- receipt: purchase receipts
- warranty: product warranties
- insurance: policies and claims
- vehicle: car-related documents
Correspondents
Paperless tracks who sent documents. After a few manual assignments, it learns patterns:
- Amazon → receipts
- IRS → tax documents
- Your doctor’s name → medical
Document Types
Categorize by type:
- Invoice
- Receipt
- Letter
- Contract
- Statement
Automatic Matching Rules
The real power is in matching rules. Open a tag, correspondent, or document type in the web UI and set its matching algorithm and pattern:
- Any document containing “Invoice” → tag: invoice
- Any document from “Blue Cross” → tag: medical, correspondent: Blue Cross
- Any document containing “W-2” → tag: tax
After setting up 10-15 rules, most documents tag themselves correctly.
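To make the idea of matching concrete, here is a toy model of the three example rules. It is an illustration only, not Paperless-ngx's actual implementation: the Rule class, the mode names "any", "all", and "exact", and the tokenizing regex are all assumptions made for this sketch.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    tag: str
    pattern: str
    mode: str = "any"  # "any", "all", or "exact" (names invented for this sketch)


def matches(rule: Rule, text: str) -> bool:
    """Case-insensitive check of one rule against the OCR text of a document."""
    haystack = text.lower()
    if rule.mode == "exact":
        # The whole pattern must appear as one contiguous phrase.
        return rule.pattern.lower() in haystack
    words = set(re.findall(r"[\w-]+", haystack))
    terms = rule.pattern.lower().split()
    if rule.mode == "all":
        return all(term in words for term in terms)
    return any(term in words for term in terms)


def tags_for(text: str, rules: List[Rule]) -> List[str]:
    """Return the tags of every rule that matches, in rule order."""
    return [rule.tag for rule in rules if matches(rule, text)]


RULES = [
    Rule("invoice", "invoice"),
    Rule("medical", "Blue Cross", mode="exact"),
    Rule("tax", "W-2"),
]

print(tags_for("Form W-2 Wage and Tax Statement", RULES))      # ['tax']
print(tags_for("Blue Cross Blue Shield Invoice #123", RULES))  # ['invoice', 'medical']
```

The second example shows why a handful of rules goes a long way: one document can pick up several tags from independent rules.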
Searching Your Documents
The search is incredibly powerful:
- Full text: search for any word in any document
- Filters: combine tags, correspondents, date ranges
- ASN (Archive Serial Number): physical filing reference
Type “water bill 2025” and instantly find every water bill from last year. This alone is worth the setup.
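The same search is available to scripts through the REST API. The sketch below assumes you already have an API token for your user and that the instance is reachable at the base URL you pass in; the helper names build_search_url and search_titles are invented for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def build_search_url(base_url: str, query: str) -> str:
    """Build the full-text search URL for the documents endpoint."""
    params = urllib.parse.urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/documents/?{params}"


def search_titles(base_url: str, token: str, query: str) -> List[str]:
    """Run a full-text search and return the titles from the first page of results."""
    request = urllib.request.Request(
        build_search_url(base_url, query),
        headers={
            "Authorization": f"Token {token}",  # token auth, assumed to be enabled
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [document["title"] for document in payload["results"]]
```

For example, search_titles("http://your-server:8000", "your-token", "water bill 2025") would return the titles of the matching documents.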
Backup Strategy
Your documents are precious. Back them up:
# Export all documents with metadata
docker compose exec webserver document_exporter ../export
# The export folder now contains everything needed to rebuild
Add this to a cron job:
0 2 * * * cd ~/paperless-ngx && docker compose exec -T webserver document_exporter ../export
Then back up the ./export directory with your normal backup tool (restic, rclone, etc.).
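If you do not have a backup tool set up yet, a dated archive of the export directory is a workable stopgap. This is a sketch, not a replacement for restic or rclone: the function name, the archive naming scheme, and the default of keeping seven archives are all assumptions.

```python
import tarfile
from datetime import date
from pathlib import Path
from typing import Optional


def archive_export(
    export_dir: Path,
    backup_dir: Path,
    keep: int = 7,
    today: Optional[date] = None,
) -> Path:
    """Pack `export_dir` into a dated .tar.gz and keep only the newest `keep` archives.

    `keep` is assumed to be at least 1.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).isoformat()
    archive = backup_dir / f"paperless-export-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(export_dir, arcname="export")
    # ISO dates sort chronologically, so the oldest archives come first.
    archives = sorted(backup_dir.glob("paperless-export-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Run it after the nightly export finishes, and copy the resulting archive to another machine or to cloud storage; an archive on the same disk does not protect against disk failure.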
Performance Tuning
OCR Speed
OCR is the bottleneck. Speed it up:
environment:
  PAPERLESS_TASK_WORKERS: 2 # Parallel processing (default: 1)
  PAPERLESS_OCR_PAGES: 0 # 0 = OCR all pages; set N to OCR only the first N pages
  PAPERLESS_OCR_SKIP_ARCHIVE_FILE: with_text # No archive copy for files that already contain text
Storage
Documents add up. Monitor usage by checking the size of the media directory inside the container:
docker compose exec webserver du -sh /usr/src/paperless/media
Plan for roughly 5MB per page for scanned documents, much less for native PDFs.
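The arithmetic is simple enough to script when sizing a disk. The sketch below uses the 5MB-per-page figure from this guide as its default and treats 1 GB as 1000 MB; the function name is made up, and real sizes depend heavily on scan resolution and color depth.

```python
def estimate_storage_gb(pages: int, mb_per_page: float = 5.0) -> float:
    """Rough storage estimate in GB for a number of scanned pages."""
    return pages * mb_per_page / 1000


print(estimate_storage_gb(2000))  # 10.0
```

So a backlog of 2,000 scanned pages needs about 10 GB, before counting the export copy and backups.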
Putting It Behind a Reverse Proxy
For remote access, put Paperless behind your reverse proxy. Nginx example:
server {
    server_name paperless.yourdomain.com;
    location / {
        proxy_pass http://localhost:8000;
        # Paperless-ngx uses WebSockets for live status updates
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        client_max_body_size 100M; # Allow large document uploads
    }
}
The client_max_body_size setting is important: nginx's default limit is 1 MB, so without it any scan larger than that will fail to upload.
The Payoff
Once you’ve been running Paperless for a month, the benefits compound:
- Tax season: search “W-2 2025” and it’s right there
- Warranty claim: search the product name, find the receipt instantly
- Insurance: every policy, claim, and EOB searchable in seconds
- Moving/legal: all your important documents in one searchable place
The initial scanning effort is the hardest part. Set aside a weekend to scan your paper backlog, then maintain it by scanning new documents as they arrive. Within a month, you’ll wonder how you ever lived without it.
Wrapping Up
Paperless-ngx is one of those self-hosted tools that genuinely improves your daily life. It replaces filing cabinets, makes tax prep painless, and ensures you never lose an important document again.
The Docker setup takes five minutes. The scanning habit takes a weekend to build. The time saved lasts forever.