Motivation
I have been studying PINNs and the related OpenFOAM for a while, but yesterday there was big news in the LLM field, so I decided to try DeepSeek-R1, which made an impact not only on the LLM world but also on stock prices. Since I could not run it as-is in my environment, I used a compact, quantized version of the model.
This post summarizes how I used Ollama and Open WebUI to run the quantized model from a browser.
Sources
- Impact of DeepSeek-R1, the Chinese AI that surpasses ChatGPT - an article on DeepSeek-R1 from ascii.jp, cited here as an example of the coverage.
- Announcement from CyberAgent - CyberAgent's announcement that they have released LLMs based on DeepSeek-R1-Distill-Qwen-14B/32B with additional training on Japanese data.
- mmnga/cyberagent-DeepSeek-R1-Distill-Qwen-14B-japanese-gguf - Quantized version of the CyberAgent model above, published on Hugging Face.
- Conversing with ollama’s LLM using Open WebUI as a front end - A post I wrote two months ago; I had forgotten most of it and had a hard time recalling the details. Open WebUI provides a convenient bridge between the Ollama server and a web browser.
- Using ollama to run an LLM in a local environment - An article from the day before the post above, about running Ollama with Docker.
Procedure
Downloading the model
The 4-bit quantized model “Q4_K_M”, published in Source 3, was downloaded with the following command.
$ wget https://huggingface.co/mmnga/cyberagent-DeepSeek-R1-Distill-Qwen-14B-Japanese-gguf/resolve/main/cyberagent-DeepSeek-R1-Distill-Qwen-14B-Japanese-Q4_K_M.gguf
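The file is large (roughly 9 GB for Q4_K_M), so it is worth confirming that the download completed before moving it into place (a quick check, not part of the original steps):
$ ls -lh cyberagent-DeepSeek-R1-Distill-Qwen-14B-Japanese-Q4_K_M.gguf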
Incorporating the model into the Ollama server
First, start Ollama as described in “Creating docker-compose.yml” in Source 5.
$ sudo docker compose up -d
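For reference, a minimal docker-compose.yml for the Ollama server might look like the sketch below, based on the official ollama/ollama Docker image; the actual file is described in Source 5, and the host-side mount path here is an assumption.
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ./ollama:/root/.ollama
    ports:
      - 11434:11434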
Next, enter the Ollama container started above, as described in “Downloading and Executing the LLM Model”.
$ sudo docker exec -it ollama /bin/bash
The following operations are performed inside the container.
# cd /root/.ollama
# ls -l
total 8777476
-rw-rw-r-- 1 1000 1000 68 Jan 28 05:20 Modelfile
-rw-rw-r-- 1 1000 1000 8988110464 Jan 28 05:19 cyberagent-DeepSeek-R1-Distill-Qwen-14B-Japanese-Q4_K_M.gguf
# cat Modelfile
FROM ./cyberagent-DeepSeek-R1-Distill-Qwen-14B-Japanese-Q4_K_M.gguf
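Only the FROM line is required here, but Ollama's Modelfile format also accepts optional PARAMETER lines for generation settings; the example below is just an illustration and was not used in this setup.
FROM ./cyberagent-DeepSeek-R1-Distill-Qwen-14B-Japanese-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_ctx 4096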
Place the downloaded model (the .gguf file) and the Modelfile as shown above so that they can be referenced from inside the container. Once everything is in place, use the following command to incorporate the model into the Ollama server.
# ollama create cyberagent-DS-14b-japanese -f Modelfile
For the incorporation part, see also “Incorporating a gguf file as a model” in Source 5.
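To confirm that the model was incorporated, you can list the registered models and give the new one a quick try from the CLI inside the container (optional checks, not part of the original steps):
# ollama list
# ollama run cyberagent-DS-14b-japanese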
Launching Open WebUI
$ ls -l
drwxrwxr-x 5 kenji kenji 4096 Nov 23 11:19 data
-rw-rw-r-- 1 kenji kenji 243 Nov 26 08:33 docker-compose.yml
$ cat docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    environment:
      - OLLAMA_BASE_URL=http://192.168.11.4:11434
    volumes:
      - ./data:/app/backend/data
    ports:
      - 3000:8080
$ sudo docker compose up -d
For the part about starting Open WebUI, please also refer to “Starting Open WebUI” in Source 4.
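If Open WebUI does not show the model, it can help to confirm that the Ollama API is reachable from the host running Open WebUI (a quick connectivity check; the address is the OLLAMA_BASE_URL from the docker-compose.yml above):
$ curl http://192.168.11.4:11434/api/tags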
Connecting from a Web browser
Since the Ollama server and Open WebUI are now both running, enter “http://192.168.11.8:3000/” in a browser and the Open WebUI start screen will appear. After entering a name, e-mail address, and password, you are presented with a screen that accepts questions.
Once connected, switch the LLM model in the upper-left part of the screen and ask a question. The following is an excerpt of the answer.
Summary
I was surprised by how long the answer to my first question was, but when I asked other questions, the answers were relatively compact.
I tried the quantized Japanese version of the much-discussed DeepSeek-R1 (the CyberAgent model) on the Ollama and Open WebUI setup I had built about two months ago. I had forgotten many things, such as how to incorporate the model and the startup procedure, so having summarized them back then helped me a lot.