How to use Open WebUI (backed by Ollama) with Charm's crush

Updated 8/28/25 to include Open WebUI settings to allow the model to have context when using crush.

This post will only cover the crush configuration. You are required to configure Open WebUI and Ollama - these are extremely well documented applications, I can't write anything better than what is already available.

crush is well documented too however it took me a little while to get my configuration working with Ollama (technically Ollama via Open WebUI).

Open WebUI provides an OpenAI compatible endpoint that can be used in the crush.json configuration to call whatever models you have running in Open WebUI. I imagine you could use other model providers as well, however I'm only running Ollama as the model provider.

Here's my config, this will give you a great starting point.

{
  "$schema": "https://charm.land/crush.json",
  "providers": {
    "openwebui": {
      "type": "openai",
      "base_url": "https://your-openwebui-ip/api",
      "api_key": "sk-youropenwebuiapikey_generatedinyourusersettings",
      "name": "Open WebUI",
      "models": [
        {
          "id": "Qwen3:32b",
          "name": "Qwen3 32B",
          "context_window": 256000,
          "default_max_tokens": 20000,
          "supports_tools": true
        },
        {
          "id": "nomic-embed-text:latest",
          "name": "Nomic Embed Text",
          "context_window": 8192,
          "default_max_tokens": 512,
          "supports_tools": false
        },
        {
          "id": "deepseek-r1:32b",
          "name": "DeepSeek R1 32B",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        },
        {
          "id": "llama3.1:8b",
          "name": "Llama 3.1 8B",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        },
        {
          "id": "bjoernb/qwen3-coder-30b:latest",
          "name": "BjoernB Qwen3 Coder 30B",
          "context_window": 32768,
          "default_max_tokens": 20000,
          "supports_tools": true
        },
        {
          "id": "Qwen3:latest",
          "name": "Qwen3 Latest",
          "context_window": 256000,
          "default_max_tokens": 20000,
          "supports_tools": true
        },
        {
          "id": "mistral-nemo:latest",
          "name": "Mistral Nemo",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        },
        {
          "id": "gpt-oss:20b",
          "name": "GPT-OSS 20B",
          "context_window": 32768,
          "default_max_tokens": 8192,
          "supports_tools": true
        }
      ]
    }
  },
  "mcp": {
    "nixos": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcp-nixos"]
    },
    "brew": {
      "type": "stdio",
      "command": "brew",
      "args": ["mcp-server"]
    },
    "kagi": {
      "type": "stdio",
      "command": "uvx",
      "args": ["kagimcp"],
      "env": {
        "KAGI_API_KEY": "my_kagi_api_key"
      }
    },
    "gitea": {
      "type": "stdio",
      "command": "gitea-mcp",
      "args": [
        "-t",
        "stdio",
        "--host",
        "https://my.gitea.address.com"
      ],
      "env": {
        "GITEA_ACCESS_TOKEN": "mygiteaaccesstoken_createdingiteasettings"
      }
    }
  }
}

You'll probably use different models and MCP servers than me, so add/remove/adjust as needed.

I'm still having issues with crush using filesystem tools, but I haven't figured that out yet. Please reach out if you have.

Remember, LLM ≠ AI.

💾

Open WebUI Sample Settings

Here my sample settings for having context carried over from the last response (and then some).

Everyone has different hardware capabilities. I'm running a GPU with 24GB VRAM, 16 core processor, and 192GB of RAM.

Navigate to Admin Panel > Settings > Models > $your_model > Toggle Advanced Params

Parameter	Value
Stream Chat Response	Default
Stream Delta Chunk Size	Default
Function Calling	Native
Reasoning Tags	Default
Seed	Default
Stop Sequence	Default
Temperature	Custom - 0.7
Reasoning Effort	Default
logit_bias	Default
max_tokens	Default
top_k	Custom - 40
top_p	Custom - 0.95
min_p	Default
frequency_penalty	Default
presence_penalty	Default
mirostat	Default
mirostat_eta	Default
mirostat_tau	Default
repeat_last_n	Default
tfs_z	Default
repeat_penalty	Custom - 1.1
use_mmap	Custom - Enabled
use_mlock	Custom - Enabled
think (Ollama)	Default
format (Ollama)	Default
num_keep (Ollama)	Custom - 1024
num_ctx (Ollama)	Custom - 32768
num_batch (Ollama)	Custom - 4096
num_thread (Ollama)	Custom - 8
num_gpu (Ollama)	Default
keep_alive (Ollama)	Custom - 1h