Motivation
Looking back, 2023 was a year in which many Japanese LLMs (Large Language Models) were released. I tried running some of them in my home environment, and I summarize the results here.
Sources
The following are the pages I took the code from when trying out each Japanese LLM. The trailing number indicates the parameter count: 7b means 7 billion parameters and 13b means 13 billion.
1. Calm2 7b: CyberAgent's Japanese LLM
2. ELYZA-japanese-Llama-2-7b: Japanese LLM developed by ELYZA, based on Llama-2-7b-chat-hf
3. ELYZA-japanese-Llama-2-13b: Japanese LLM developed by ELYZA, based on Llama 2
4. Japanese StableLM Instruct Alpha 7B v2: Stability AI's Japanese LLM
5. Japanese Stable LM Beta 7B: Stability AI's Japanese LLM
6. Youri-7B: Japanese LLM developed by Rinna, based on Llama 2
Operating Environment
Server Environment
Server Name | Jupiter | Ganymede |
---|---|---|
CPU | Xeon E5-1620 3.60GHz | Xeon E5-2620 2.00GHz |
Memory | 64GB | 64GB |
GPU | NVIDIA TITAN V | RTX A4000 |
OS | Ubuntu 22.04.3 LTS | Ubuntu 22.04.3 LTS |
JupyterLab
Each Japanese LLM was run on a JupyterLab instance built from the following Dockerfile.
```dockerfile
# Docker image with JupyterLab available.
# Installs the packages needed for the notebooks I have created so far.
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
# Set bash as the default shell
ENV SHELL=/bin/bash
# Build with some basic utilities
RUN apt-get update && apt-get install -y \
python3-pip apt-utils vim \
git git-lfs \
curl unzip wget
# alias python='python3'
RUN ln -s /usr/bin/python3 /usr/bin/python
RUN pip install --upgrade pip setuptools \
&& pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
--index-url https://download.pytorch.org/whl/cu118 \
&& pip install jupyterlab matplotlib pandas scikit-learn ipywidgets \
&& pip install transformers accelerate sentencepiece einops \
&& pip install langchain bitsandbytes auto-gptq protobuf
# Create a working directory
WORKDIR /workdir
# Port number in container side
EXPOSE 8888
ENTRYPOINT ["jupyter-lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root", "--NotebookApp.token=''"]
CMD ["--notebook-dir=/workdir"]
Code
In a JupyterLab cell inside the running Docker container, I copied the code from the pages listed in Sources and executed it as-is.
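As an illustration, the code on those pages generally follows the standard Hugging Face transformers pattern. The sketch below is my own minimal version for Calm2 7b, not the exact code from the sources; the use of the chat variant `cyberagent/calm2-7b-chat`, the prompt format (taken from its model card), and the question text are all assumptions for the example.

```python
# Minimal sketch of the kind of code run in a JupyterLab cell.
# Assumptions: chat variant of Calm2 7b, prompt format per its model card,
# arbitrary example question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cyberagent/calm2-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 to fit the 12-16GB GPUs above
    device_map="auto",          # place weights on GPU, offload to CPU if needed
)

prompt = "USER: 日本の首都はどこですか?\nASSISTANT: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```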
Execution Results
Model | Jupiter | Ganymede |
---|---|---|
Calm2 7b | ○ | ○ |
ELYZA-japanese-Llama-2-7b | × | ○ |
ELYZA-japanese-Llama-2-13b | − | △ |
Japanese StableLM Instruct Alpha 7B v2 | × | ○ |
Japanese Stable LM Beta 7B | △ | ○ |
Youri-7B | ○ | ○ |
The symbols ○/△/×/− have the following meanings:
○: Returns an answer to the question. (In some cases the output is partially cut off.)
△: Runs, but some weights are offloaded to the CPU, with the warning “WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.”
×: Fails with CUDA out of memory.
−: Not run.
Future Work
It is somewhat surprising that a 13-billion-parameter model ran on the 16GB RTX A4000, albeit with offloading. These LLMs are a heavy load for the 12GB TITAN V. Quantization is one direction to investigate (a sketch follows below), but I would also like to try model parallelism with DeepSpeed.
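For the quantization direction, a minimal sketch using bitsandbytes (already installed in the Dockerfile above) might look like the following. The model ID is one of the 13b models from the list; whether 8-bit loading actually fits the A4000 is an assumption I have not yet verified.

```python
# Hedged sketch: load a 13b model in 8-bit via bitsandbytes to reduce VRAM use.
# Untested here; fitting within the 16GB RTX A4000 still needs to be verified.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "elyza/ELYZA-japanese-Llama-2-13b-instruct"
quant_config = BitsAndBytesConfig(load_in_8bit=True)  # roughly halves fp16 memory

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spill to CPU if the quantized weights still do not fit
)
```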
I plan to tackle DeepSpeed in 2024.