Self-Hosting LanguageTool: Grammar Checker API with Docker

Every writing tool wants to phone home. Grammarly reads everything you type. Google Docs analyzes your documents on their servers. Even browser extensions quietly ship your text to cloud APIs for grammar checking. If you write anything sensitive — legal documents, medical notes, proprietary code comments, personal journals — that’s a problem.

LanguageTool is an open-source grammar, style, and spell checker that supports over 30 languages. It powers grammar checking in LibreOffice, and its commercial cloud service competes directly with Grammarly. But unlike Grammarly, you can run your own instance. Your text never leaves your network, you get unlimited checks with no word caps, and you can plug it into browser extensions, text editors, and custom applications via a clean REST API.

The self-hosted version doesn’t include LanguageTool’s newer AI-based rules (those are cloud-only), but the rule-based engine catches the vast majority of grammar, spelling, punctuation, and style issues. Add n-gram datasets and you get context-sensitive spell checking that catches commonly confused words like “their” vs “there” — something basic spell checkers miss entirely.

LanguageTool vs Other Writing Tools

| Feature | LanguageTool (Self-Hosted) | Grammarly | ProWritingAid | Vale | Hunspell |
|---|---|---|---|---|---|
| Open source | ✅ LGPL 2.1 | ❌ Proprietary | ❌ Proprietary | ✅ MIT | ✅ Various |
| Self-hostable | ✅ Docker/Java | ❌ | ❌ | ✅ CLI only | ✅ CLI only |
| Languages | ✅ 30+ | ⚠️ ~12 | ⚠️ English only | ⚠️ English-focused | ✅ Many |
| Grammar checking | ✅ Rule-based | ✅ AI + rules | ✅ AI + rules | ⚠️ Style only | ❌ Spell only |
| Style suggestions | ✅ Built-in | ✅ Premium | ✅ | ✅ Configurable | ❌ |
| Context-aware spelling | ✅ With n-grams | ✅ | ✅ | ❌ | ❌ |
| REST API | ✅ | ⚠️ Paid | ⚠️ Paid | ❌ | ❌ |
| Browser extension | ✅ Custom server | ✅ | ✅ | ❌ | ❌ |
| Privacy | ✅ 100% local | ❌ Cloud-only | ❌ Cloud-only | ✅ Local | ✅ Local |
| Pricing | Free (self-hosted) | From $12/mo | From $10/mo | Free | Free |

LanguageTool hits the sweet spot: real grammar checking (not just spell check) with full privacy and a proper API. If you write in multiple languages, it’s basically the only self-hosted option that handles them all.

Prerequisites

  • Docker and Docker Compose installed (Get Docker)
  • At least 2 GB of RAM (4+ GB recommended with n-gram datasets)
  • Optional: a domain name for remote access (e.g., grammar.example.com)
  • Optional: 8-10 GB of disk space per language for n-gram datasets

Quick Start with Docker Compose

Create a project directory and configuration:

mkdir languagetool && cd languagetool

Create docker-compose.yml:

services:
  languagetool:
    image: erikvl87/languagetool:latest
    container_name: languagetool
    ports:
      - "8010:8010"
    environment:
      - Java_Xms=512m
      - Java_Xmx=2g
      - langtool_pipelinePrewarming=true
      - langtool_maxTextLength=50000
    volumes:
      - languagetool_data:/LanguageTool/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8010/v2/languages"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  languagetool_data:

Start the server:

docker compose up -d

The first startup takes 30-60 seconds as LanguageTool loads its rule database and optionally prewarms the processing pipeline. Once ready, test it:

curl -d "language=en-US" -d "text=Their going to the store yesterday." \
  http://localhost:8010/v2/check | python3 -m json.tool

You should see matches flagging “Their” (should be “They’re”) and possibly “going” with “yesterday” (tense inconsistency). That confirms your grammar checker is live.

Understanding the Configuration

The key environment variables control how LanguageTool behaves:

| Variable | Default | Description |
|---|---|---|
| Java_Xms | 256m | Minimum Java heap size |
| Java_Xmx | 512m | Maximum Java heap size — increase for production use |
| langtool_pipelinePrewarming | false | Prewarm language pipelines at startup for faster first checks |
| langtool_maxTextLength | 40000 | Maximum characters per request (increase for long documents) |
| langtool_maxCheckThreads | 10 | Concurrent check threads |
| langtool_cacheSize | 0 | Number of cached results (set to 1000+ for repeated checks) |
| langtool_requestLimit | 0 | Max requests per requestLimitPeriodInSeconds (0 = unlimited) |
| langtool_languageModel | (unset) | Path to n-gram data directory inside the container |

For a production setup serving a small team, Java_Xmx=2g with pipeline prewarming handles most workloads comfortably.

Adding N-Gram Datasets for Smarter Checking

The base LanguageTool install catches grammar and spelling errors using rules. N-gram datasets add statistical analysis — LanguageTool compares word sequences against billions of real-world text samples to catch errors that rules miss.

The difference is significant. Without n-grams, “I went to there house” might get only a vague grammar suggestion, if it’s flagged at all. With n-grams, LanguageTool confidently identifies “there” → “their” because “their house” appears orders of magnitude more frequently in real text than “there house.”
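
As a toy illustration of the statistical idea (this is not LanguageTool's actual implementation, and the counts below are invented), a bigram check simply compares how often each confusable candidate appears next to its neighboring word in a reference corpus:

```python
# Toy bigram counts standing in for billions of real corpus samples.
BIGRAM_COUNTS = {
    ("their", "house"): 120_000,
    ("there", "house"): 90,
}

def rank_confusables(candidates, next_word):
    """Return the candidate that most often precedes next_word in the corpus."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((w, next_word), 0))

print(rank_confusables(["there", "their"], "house"))  # → their
```

The real n-gram datasets apply the same comparison at scale, over sequences of up to three words.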

Download n-gram data (English shown — repeat for other languages):

mkdir -p ./ngrams
cd ./ngrams

# English (~8 GB unzipped)
wget https://languagetool.org/download/ngram-data/ngrams-en-20150817.zip
unzip ngrams-en-20150817.zip

# Optional: German (~8 GB), French (~3 GB), Spanish (~3 GB)
# wget https://languagetool.org/download/ngram-data/ngrams-de-20150819.zip
# wget https://languagetool.org/download/ngram-data/ngrams-fr-20150913.zip
# wget https://languagetool.org/download/ngram-data/ngrams-es-20150915.zip

Update your docker-compose.yml to mount the n-gram data:

services:
  languagetool:
    image: erikvl87/languagetool:latest
    container_name: languagetool
    ports:
      - "8010:8010"
    environment:
      - Java_Xms=512m
      - Java_Xmx=2g
      - langtool_pipelinePrewarming=true
      - langtool_maxTextLength=50000
      - langtool_languageModel=/ngrams
    volumes:
      - languagetool_data:/LanguageTool/data
      - ./ngrams:/ngrams:ro
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8010/v2/languages"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  languagetool_data:

Restart the container:

docker compose down && docker compose up -d

Startup will be slower with n-gram data loaded (up to 2-3 minutes). Once ready, test the improved checking:

curl -d "language=en-US" \
  -d "text=I went to there house and than we went too the store." \
  http://localhost:8010/v2/check | python3 -m json.tool

With n-grams, you should see “there” → “their”, “than” → “then”, and “too” → “to” all flagged — confused word pairs that basic spell checkers miss completely.

Connecting Browser Extensions

The LanguageTool browser extension for Chrome and Firefox supports custom servers. This gives you grammar checking across every text field on the web — Gmail, Google Docs, social media, CMS editors — all hitting your private instance.

  1. Install the LanguageTool extension for your browser
  2. Click the extension icon → gear icon → Settings
  3. Scroll to Advanced or Experimental settings
  4. Select Local server (localhost) or Other server
  5. Enter your server URL: http://localhost:8010/v2
  6. For remote access: https://grammar.example.com/v2

If you’re accessing from other machines, you’ll need a reverse proxy with HTTPS — browsers increasingly block mixed HTTP content from extensions.

Reverse Proxy Setup

Caddy

Add to your Caddyfile:

grammar.example.com {
    reverse_proxy languagetool:8010
}

Caddy handles HTTPS automatically. If running Caddy in Docker, ensure it’s on the same network as LanguageTool.

Nginx

server {
    listen 443 ssl http2;
    server_name grammar.example.com;

    ssl_certificate /etc/letsencrypt/live/grammar.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/grammar.example.com/privkey.pem;

    location / {
        proxy_pass http://languagetool:8010;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # LanguageTool can return large responses for long documents
        proxy_read_timeout 120s;
        proxy_buffer_size 16k;
        proxy_buffers 4 32k;
    }
}

Integrating with Text Editors

VS Code

Install the LTeX extension and add to your settings.json:

{
    "ltex.languageToolHttpServerUri": "http://localhost:8010",
    "ltex.language": "en-US",
    "ltex.enabled": ["markdown", "latex", "plaintext", "html"]
}

This gives you real-time grammar checking in Markdown, LaTeX, and plain text files — perfect for documentation and technical writing.

Neovim

With nvim-lspconfig, configure LTeX:

require('lspconfig').ltex.setup{
    settings = {
        ltex = {
            language = "en-US",
            languageToolHttpServerUri = "http://localhost:8010",
        },
    },
}

Obsidian

The Obsidian LanguageTool Plugin supports custom servers. In settings, set the server URL to http://localhost:8010 and enable auto-checking.

API Usage for Custom Applications

LanguageTool’s REST API is straightforward. Here are common patterns:

Basic Check

curl -X POST http://localhost:8010/v2/check \
  -d "language=en-US" \
  -d "text=This are a test of the grammar checker."

Auto-Detect Language

curl -X POST http://localhost:8010/v2/check \
  -d "language=auto" \
  -d "text=Dies ist ein Test."

Check with Specific Rules Disabled

curl -X POST http://localhost:8010/v2/check \
  -d "language=en-US" \
  -d "text=This is a test." \
  -d "disabledRules=UPPERCASE_SENTENCE_START,COMMA_PARENTHESIS_WHITESPACE"

Python Integration

import requests

def check_grammar(text, language="en-US"):
    response = requests.post(
        "http://localhost:8010/v2/check",
        data={"text": text, "language": language}
    )
    result = response.json()
    for match in result.get("matches", []):
        print(f"Issue: {match['message']}")
        print(f"  Context: {match['context']['text']}")
        if match['replacements']:
            print(f"  Suggestion: {match['replacements'][0]['value']}")
        print()

check_grammar("Their going to there house and than leaving.")
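
A common follow-up is applying the top suggestion automatically. The helper below is our own sketch (apply_suggestions is not a library function); the key detail is that matches must be applied from the end of the text backwards, because each edit can change the text length and would otherwise invalidate the offset of every later match. The offset, length, and replacements fields come straight from the /v2/check response:

```python
def apply_suggestions(text, matches):
    """Apply the first suggested replacement of each match to text."""
    # Sort by offset descending so earlier edits don't shift later offsets.
    for match in sorted(matches, key=lambda m: m["offset"], reverse=True):
        if not match["replacements"]:
            continue  # some matches carry no suggestion; leave text as-is
        start = match["offset"]
        end = start + match["length"]
        text = text[:start] + match["replacements"][0]["value"] + text[end:]
    return text

# Hand-built matches for illustration (normally taken from response.json()):
matches = [
    {"offset": 0, "length": 5, "replacements": [{"value": "They're"}]},
    {"offset": 15, "length": 5, "replacements": [{"value": "their"}]},
]
print(apply_suggestions("Their going to there house.", matches))
# → They're going to their house.
```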

List Supported Languages

curl http://localhost:8010/v2/languages | python3 -m json.tool

Adding Custom Words and Rules

You’ll inevitably have words LanguageTool doesn’t recognize — company names, product names, technical jargon. Rather than ignoring the warnings, add them to custom dictionaries.

Create a custom spelling.txt file:

mkdir -p ./config
cat > ./config/spelling.txt << 'EOF'
# Company and product names
Kubernetes
PostgreSQL
Redis
Nginx
selfhostsetup
Cloudflare

# Technical terms
homelab
proxmox
truenas
EOF

Mount the custom dictionary into the container by adding to your volumes:

    volumes:
      - languagetool_data:/LanguageTool/data
      - ./ngrams:/ngrams:ro
      - ./config/spelling.txt:/LanguageTool/org/languagetool/resource/en/hunspell/spelling.txt:ro

For rule customization, you can disable specific rules globally by creating a server.properties file:

cat > ./config/server.properties << 'EOF'
# Disable rules that don't apply to technical writing
disabledRuleIds=WHITESPACE_RULE,UPPERCASE_SENTENCE_START
EOF

Mount it:

    volumes:
      - ./config/server.properties:/LanguageTool/server.properties:ro

Backup and Restore

LanguageTool’s state is minimal — it’s primarily a stateless API server. Your important data is:

  1. docker-compose.yml — your configuration
  2. Custom dictionaries — spelling.txt and any rule overrides
  3. N-gram datasets — large but downloadable again

Back up the configuration:

#!/bin/bash
BACKUP_DIR="./backups/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
cp docker-compose.yml "$BACKUP_DIR/"
cp -r config/ "$BACKUP_DIR/" 2>/dev/null
echo "Backup saved to $BACKUP_DIR"

N-gram datasets don’t need backing up — they’re static downloads. Just keep a note of which languages you installed.

Troubleshooting

Server takes a long time to start: Pipeline prewarming loads language models into memory at startup. With n-grams, expect 2-3 minutes. Check progress with docker logs -f languagetool. If it hangs beyond 5 minutes, increase Java_Xmx.

Out of memory errors: With n-gram datasets, 2 GB of Java heap is the practical minimum. For multiple languages with n-grams, allocate 4+ GB. Monitor with docker stats languagetool.

Browser extension shows “Cannot connect to server”: Verify the server is running: curl http://localhost:8010/v2/languages. If accessing remotely, ensure HTTPS is configured — browser extensions often reject HTTP connections to non-localhost addresses.

Slow response times: Enable pipeline prewarming (langtool_pipelinePrewarming=true) and increase cache size (langtool_cacheSize=1000). First request after startup is always slower. For large documents, split into smaller chunks.
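
Chunking is straightforward if you split on paragraph boundaries and remember each chunk's starting offset, so match positions can be mapped back into the full document (global offset = chunk offset + match offset). A minimal sketch (chunk_text is our own helper, not part of LanguageTool):

```python
def chunk_text(text, max_len=40000):
    """Split text into (offset, chunk) pairs on paragraph boundaries.

    A single paragraph longer than max_len is kept whole, so the server's
    langtool_maxTextLength must still accommodate it.
    """
    chunks = []
    offset = 0
    current = ""
    for para in text.split("\n\n"):
        candidate = current + "\n\n" + para if current else para
        if len(candidate) > max_len and current:
            chunks.append((offset, current))
            offset += len(current) + 2  # skip past the "\n\n" separator
            current = para
        else:
            current = candidate
    if current:
        chunks.append((offset, current))
    return chunks
```

A match at offset o inside a chunk then sits at chunk_offset + o in the original document.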

Language not detected correctly: Explicitly pass the language parameter instead of using auto. Automatic detection struggles with short text snippets. For better auto-detection, add fastText (requires building from source or using a custom Docker image).

Custom words not recognized: Ensure the spelling file is mounted to the correct path for your language. English uses /LanguageTool/org/languagetool/resource/en/hunspell/spelling.txt. Check that the file uses UTF-8 encoding with one word per line.

Power User Tips

  • Rate limiting for shared instances: Set langtool_requestLimit=20 and langtool_requestLimitPeriodInSeconds=60 to prevent abuse on shared servers
  • Multiple languages: Load n-gram data for each language you need — LanguageTool detects the language and uses the appropriate dataset
  • CI/CD integration: Use the API in your build pipeline to check documentation PRs. Fail the build on grammar errors above a threshold
  • Monitoring: Hit /v2/languages as a health endpoint. If it responds, the server is healthy
  • Resource tuning: Start with Java_Xmx=1g without n-grams or Java_Xmx=2g with them. Monitor actual usage with docker stats and adjust
  • Docker networking: Put LanguageTool on an internal Docker network with your reverse proxy. No need to expose port 8010 to the host if you’re only accessing through the proxy
  • Multi-user setup: LanguageTool is inherently multi-tenant — concurrent requests are handled by the thread pool. No user accounts needed for basic use
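
The CI/CD tip above can be sketched as a small gate script. Everything here is an assumption to adapt: the server URL, the zero-issue threshold, and the choice of the standard library's urllib so the script runs in CI without extra dependencies:

```python
import json
import sys
import urllib.parse
import urllib.request

LT_URL = "http://localhost:8010/v2/check"  # point at your instance
MAX_ISSUES = 0  # fail the build on any reported issue

def count_issues(result):
    """Tally the matches in a /v2/check response dict."""
    return len(result.get("matches", []))

def check_file(path, language="en-US"):
    """POST one file's contents to the server, return its issue count."""
    with open(path, encoding="utf-8") as f:
        data = urllib.parse.urlencode(
            {"language": language, "text": f.read()}
        ).encode()
    with urllib.request.urlopen(LT_URL, data=data) as resp:
        return count_issues(json.load(resp))

if __name__ == "__main__":
    paths = sys.argv[1:]
    if paths:
        total = sum(check_file(p) for p in paths)
        print(f"{total} issue(s) found")
        sys.exit(1 if total > MAX_ISSUES else 0)
```

Run it as, say, python check_docs.py docs/*.md in the pipeline; a nonzero exit fails the build.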

Wrapping Up

Self-hosting LanguageTool gives you a private grammar checking API that works with browser extensions, text editors, and custom applications. It’s one of those services where self-hosting makes obvious sense — your writing is some of the most personal data you have, and there’s no reason to send it to a third party for basic grammar checking.

The setup is straightforward: a single container, optional n-gram datasets for smarter checking, and a reverse proxy if you want remote access. Add it to your browser extension config and you’ve got Grammarly-like checking everywhere you type, without the subscription or the privacy tradeoff.

Related guides: