How to use Open WebUI (backed by Ollama) with Charm's crush


Updated 8/28/25 to include Open WebUI settings to allow the model to have context when using crush.


This post will only cover the crush configuration. You are required to configure Open WebUI and Ollama - these are extremely well documented applications, I can't write anything better than what is already available.

crush is well documented too however it took me a little while to get my configuration working with Ollama (technically Ollama via Open WebUI).

Open WebUI provides an OpenAI compatible endpoint that can be used in the crush.json configuration to call whatever models you have running in Open WebUI. I imagine you could use other model providers as well, however I'm only running Ollama as the model provider.

Here's my config, this will give you a great starting point.

{
  "$schema": "https://charm.land/crush.json",
  "providers": {
    "openwebui": {
      "type": "openai",
      "base_url": "https://your-openwebui-ip/api",
      "api_key": "sk-youropenwebuiapikey_generatedinyourusersettings",
      "name": "Open WebUI",
      "models": [
        {
          "id": "Qwen3:32b",
          "name": "Qwen3 32B",
          "context_window": 256000,
          "default_max_tokens": 20000,
          "supports_tools": true
        },
        {
          "id": "nomic-embed-text:latest",
          "name": "Nomic Embed Text",
          "context_window": 8192,
          "default_max_tokens": 512,
          "supports_tools": false
        },
        {
          "id": "deepseek-r1:32b",
          "name": "DeepSeek R1 32B",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        },
        {
          "id": "llama3.1:8b",
          "name": "Llama 3.1 8B",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        },
        {
          "id": "bjoernb/qwen3-coder-30b:latest",
          "name": "BjoernB Qwen3 Coder 30B",
          "context_window": 32768,
          "default_max_tokens": 20000,
          "supports_tools": true
        },
        {
          "id": "Qwen3:latest",
          "name": "Qwen3 Latest",
          "context_window": 256000,
          "default_max_tokens": 20000,
          "supports_tools": true
        },
        {
          "id": "mistral-nemo:latest",
          "name": "Mistral Nemo",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        },
        {
          "id": "gpt-oss:20b",
          "name": "GPT-OSS 20B",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        }
      ]
    }
  },
  "mcp": {
    "nixos": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcp-nixos"]
    },
    "brew": {
      "type": "stdio",
      "command": "brew",
      "args": ["mcp-server"]
    },
    "kagi": {
      "type": "stdio",
      "command": "uvx",
      "args": ["kagimcp"],
      "env": {
        "KAGI_API_KEY": "my_kagi_api_key"
      }
    },
    "gitea": {
      "type": "stdio",
      "command": "gitea-mcp",
      "args": [
        "-t",
        "stdio",
        "--host",
        "https://my.gitea.address.com"
      ],
      "env": {
        "GITEA_ACCESS_TOKEN": "mygiteaaccesstoken_createdingiteasettings"
      }
    }
  }
}

You'll probably use different models and MCP servers than me, so add/remove/adjust as needed.

I'm still having issues with crush using filesystem tools, but I haven't figured that out yet. Please reach out if you have.

Remember, LLM ≠ AI.


💾
Open WebUI Sample Settings

Here my sample settings for having context carried over from the last response (and then some).

Everyone has different hardware capabilities. I'm running a GPU with 24GB VRAM, 16 core processor, and 192GB of RAM.

Navigate to Admin Panel > Settings > Models > $your_model > Toggle Advanced Params

Parameter Value
Stream Chat Response Default
Stream Delta Chunk Size Default
Function Calling Native
Reasoning Tags Default
Seed Default
Stop Sequence Default
Temperature Custom - 0.7
Reasoning Effort Default
logit_bias Default
max_tokens Default
top_k Custom - 40
top_p Custom - 0.95
min_p Default
frequency_penalty Default
presence_penalty Default
mirostat Default
mirostat_eta Default
mirostat_tau Default
repeat_last_n Default
tfs_z Default
repeat_penalty Custom - 1.1
use_mmap Custom - Enabled
use_mlock Custom - Enabled
think (Ollama) Default
format (Ollama) Default
num_keep (Ollama) Custom - 1024
num_ctx (Ollama) Custom - 32768
num_batch (Ollama) Custom - 4096
num_thread (Ollama) Custom - 8
num_gpu (Ollama) Default
keep_alive (Ollama) Custom - 1h