Updated 8/28/25 to include Open WebUI settings to allow the model to have context when using crush.
This post will only cover the crush configuration. You are required to configure Open WebUI and Ollama - these are extremely well documented applications, I can't write anything better than what is already available.crush is well documented too however it took me a little while to get my configuration working with Ollama (technically Ollama via Open WebUI).
Open WebUI provides an OpenAI compatible endpoint that can be used in the crush.json configuration to call whatever models you have running in Open WebUI. I imagine you could use other model providers as well, however I'm only running Ollama as the model provider.
Here's my config, this will give you a great starting point.
{
"$schema": "https://charm.land/crush.json",
"providers": {
"openwebui": {
"type": "openai",
"base_url": "https://your-openwebui-ip/api",
"api_key": "sk-youropenwebuiapikey_generatedinyourusersettings",
"name": "Open WebUI",
"models": [
{
"id": "Qwen3:32b",
"name": "Qwen3 32B",
"context_window": 256000,
"default_max_tokens": 20000,
"supports_tools": true
},
{
"id": "nomic-embed-text:latest",
"name": "Nomic Embed Text",
"context_window": 8192,
"default_max_tokens": 512,
"supports_tools": false
},
{
"id": "deepseek-r1:32b",
"name": "DeepSeek R1 32B",
"context_window": 32768,
"default_max_tokens": 8192,
"supports_tools": true
},
{
"id": "llama3.1:8b",
"name": "Llama 3.1 8B",
"context_window": 32768,
"default_max_tokens": 8192,
"supports_tools": true
},
{
"id": "bjoernb/qwen3-coder-30b:latest",
"name": "BjoernB Qwen3 Coder 30B",
"context_window": 32768,
"default_max_tokens": 20000,
"supports_tools": true
},
{
"id": "Qwen3:latest",
"name": "Qwen3 Latest",
"context_window": 256000,
"default_max_tokens": 20000,
"supports_tools": true
},
{
"id": "mistral-nemo:latest",
"name": "Mistral Nemo",
"context_window": 32768,
"default_max_tokens": 8192,
"supports_tools": true
},
{
"id": "gpt-oss:20b",
"name": "GPT-OSS 20B",
"context_window": 32768,
"default_max_tokens": 8192,
"supports_tools": true
}
]
}
},
"mcp": {
"nixos": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-nixos"]
},
"brew": {
"type": "stdio",
"command": "brew",
"args": ["mcp-server"]
},
"kagi": {
"type": "stdio",
"command": "uvx",
"args": ["kagimcp"],
"env": {
"KAGI_API_KEY": "my_kagi_api_key"
}
},
"gitea": {
"type": "stdio",
"command": "gitea-mcp",
"args": [
"-t",
"stdio",
"--host",
"https://my.gitea.address.com"
],
"env": {
"GITEA_ACCESS_TOKEN": "mygiteaaccesstoken_createdingiteasettings"
}
}
}
}
You'll probably use different models and MCP servers than me, so add/remove/adjust as needed.
I'm still having issues with crush using filesystem tools, but I haven't figured that out yet. Please reach out if you have.
Remember, LLM ≠ AI.
Here my sample settings for having context carried over from the last response (and then some).
Everyone has different hardware capabilities. I'm running a GPU with 24GB VRAM, 16 core processor, and 192GB of RAM.
Navigate to Admin Panel > Settings > Models > $your_model > Toggle Advanced Params
| Parameter | Value |
|---|---|
| Stream Chat Response | Default |
| Stream Delta Chunk Size | Default |
| Function Calling | Native |
| Reasoning Tags | Default |
| Seed | Default |
| Stop Sequence | Default |
| Temperature | Custom - 0.7 |
| Reasoning Effort | Default |
| logit_bias | Default |
| max_tokens | Default |
| top_k | Custom - 40 |
| top_p | Custom - 0.95 |
| min_p | Default |
| frequency_penalty | Default |
| presence_penalty | Default |
| mirostat | Default |
| mirostat_eta | Default |
| mirostat_tau | Default |
| repeat_last_n | Default |
| tfs_z | Default |
| repeat_penalty | Custom - 1.1 |
| use_mmap | Custom - Enabled |
| use_mlock | Custom - Enabled |
| think (Ollama) | Default |
| format (Ollama) | Default |
| num_keep (Ollama) | Custom - 1024 |
| num_ctx (Ollama) | Custom - 32768 |
| num_batch (Ollama) | Custom - 4096 |
| num_thread (Ollama) | Custom - 8 |
| num_gpu (Ollama) | Default |
| keep_alive (Ollama) | Custom - 1h |