Run Japanese LLMs on an on-premise environment -

Motivation

Looking back on the year 2023, it was a year in which many Japanese LLMs (Large Language Models) were released. I have tried to run some of them in my home environment, so I will summarize them here.

Sources

The following is the page where I referred to the code when trying out my Japanese LLM. The last number indicates the number of parameters, 7b has 7 billion parameters and 13b has 13 billion parameters.

1.Calm2 7b Japanese LLM of the server agent 2.ELYZA-japanese-Llama-2-7b Japanese LLM developed by ELYZA based on Llama-2-7b-chat-hf. 3.ELYZA-japanese-Llama-2-13b Japanese LLM developed by ELYZA based on Llama2. 4.Japanese StableLM Instruct Alpha 7B v2 Stability AI’s Japanese LLM 5.Japanese Stable LM Beta 7B Stability AI’s Japanese LLM 6.Youri-7B Japanese LLM developed by Rinna based on Llama2

Operating Environment

Server Environment

Server Name	Jupiter	Ganymede
CPU	Xeon E5-1620 3.60GHz	Xeopn E5-2620 2.00GHz
memory	64GB	64GB
GPU	NVIDIA TITAN V	RTX A4000
OS	Ubuntu 22.04.3 LTS	Ubuntu 22.04.3 LTS

JupyterLab

それぞれの日本語LLMは、次のDockerfileで作製したJupyterLab上で動作させた。

# Docker image with JupyterLab available.
# Installed the necessary packages for the NoteBooks I have created so far.

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

# Set bash as the default shell
ENV SHELL=/bin/bash

# Build with some basic utilities
RUN apt-get update && apt-get install -y \
    python3-pip apt-utils vim \
    git git-lfs \
    curl unzip wget

# alias python='python3'
RUN ln -s /usr/bin/python3 /usr/bin/python

RUN pip install --upgrade pip setuptools \
        && pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
        --index-url https://download.pytorch.org/whl/cu118 \
        && pip install jupyterlab matplotlib pandas scikit-learn ipywidgets \
        && pip install transformers accelerate sentencepiece einops \
        && pip install langchain bitsandbytes auto-gptq protobuf

# Create a working directory
WORKDIR /workdir

# Port number in container side
EXPOSE 8888

ENTRYPOINT ["jupyter-lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root", "--NotebookApp.token=''"]

CMD ["--notebook-dir=/workdir"]

Code

In a JupyterLab cell started as a docker container, I copied and pasted the code from the page shown in the information source and executed it.

Execution Result

	Jupiter	Ganymede
Calm2 7b	○	○
ELYZA-japanese-Llama-2-7b	×	○
ELYZA-japanese-Llama-2-13b	−	△
Japanese StableLM Instruct Alpha 7B v2	×	○
Japanese Stable LM Beta 7B	△	○
Youri-7B	○	○

The legend “○/△/×/−” has the following meanings

○: Returns the answer to a question. (In some cases, the output is partially cut off.)

△: Displays the following message and offloads it to the CPU side for execution. 　　“WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.”

×: CUDA out of memory

−：Not running

The future

It is a bit surprising that 13 billion parameters worked on a 16GB RTX A4000, even offloading. LLMs are a bit of a burden on the 12GB TITAN X. There is a direction to investigate quantization, but I would like to use them in DeepSpeed to investigate model parallelization.

I am going to address DeepSpeed as an issue for 2024.