llama-cpp-python - impact of the NumPy version upgrade

Introduction

NumPy 2.0.0 was released on June 16. I first noticed it the other day when I tried RAG using langchain and got an error while building the Docker container. Later, I encountered another error in CMake when trying to incorporate llama-cpp-python.

This article summarizes how I dealt with these two errors.

Background

I recently decided to learn RAG properly, so I purchased a Japanese book called [LLM fine-tuning and RAG](https://www.amazon.co.jp/dp/427423195X). The book uses langchain, so I decided to create a Docker container for JupyterLab that incorporates the langchain library.

Phenomenon

I tried to run the book's code in JupyterLab after adding “pip install langchain-community faiss-gpu” to the Dockerfile. Then the following error message appeared.

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

I thought a simple solution would be to install the previous version of numpy, 1.26.4, as the above message suggests. However, my Dockerfile did not explicitly “pip install numpy”; it is pulled in as a dependency of the other libraries. I considered where to put “RUN pip install -U numpy==1.26.4” and decided to place it at the final stage of the installation procedure.

Dockerfile

FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

# Set bash as the default shell
ENV SHELL=/bin/bash

# Build with some basic utilities
RUN apt update \
        && apt install -y \
        wget \
        bzip2 \
        git \
        git-lfs \
        curl \
        unzip \
        file \
        xz-utils \
        sudo \
        python3 \
        python3-pip && \
        apt-get autoremove -y && \
        apt-get clean && \
        rm -rf /usr/local/src/*

# Make "python" refer to python3
RUN ln -s /usr/bin/python3 /usr/bin/python

RUN pip install --upgrade pip setuptools \
        && pip install torch torchvision torchaudio \
        && pip install jupyterlab matplotlib pandas scikit-learn ipywidgets \
        && pip install transformers accelerate sentencepiece einops \
        && pip install langchain bitsandbytes protobuf \
        && pip install auto-gptq optimum \
        && pip install pypdf tiktoken sentence_transformers faiss-gpu trafilatura \
        && pip install langchain-community

# Install llama-cpp-python[server] with cuBLAS on
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
        pip install llama-cpp-python[server] --force-reinstall --no-cache-dir

RUN pip install -U numpy==1.26.4

# Create a working directory
WORKDIR /workdir

# Port number in container side
EXPOSE 8888

ENTRYPOINT ["jupyter-lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root", "--NotebookApp.token=''"]

CMD ["--notebook-dir=/workdir"]

I later found out that numpy 2.0.0 had already been installed at the point when llama-cpp-python was built with CMake; the final “RUN pip install -U numpy==1.26.4” then downgrades it to 1.26.4.
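
To confirm what actually ended up in the image, a quick check like the following can be run inside the container (a minimal sketch; both numpy and llama_cpp expose a __version__ attribute):

import numpy
import llama_cpp

# After the final downgrade step, numpy should report 1.26.4, even though
# the llama-cpp-python wheel was compiled while numpy 2.0.0 was installed.
print(numpy.__version__)      # expected: 1.26.4
print(llama_cpp.__version__)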

This seemed to solve the problem, but later (four days later, to be exact), when I ran a docker build on another machine, I hit a different error.

CMake Error at vendor/llama.cpp/CMakeLists.txt:95

Phenomenon

As the section title says, during the docker build (at the “RUN CMAKE_ARGS=…” step in the above Dockerfile), I got the following error.

...
19.76       CMake Error at vendor/llama.cpp/CMakeLists.txt:95 (message):
19.76         LLAMA_CUBLAS is deprecated and will be removed in the future.
19.76
19.76         Use GGML_CUDA instead
...
19.76   note: This error originates from a subprocess, and is likely not a problem with pip.
19.76   ERROR: Failed building wheel for llama-cpp-python
19.76 Failed to build llama-cpp-python
19.76 ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)

Solution #1

I ran the container that had been working a few days earlier and checked the version of llama_cpp_python.

# pip list | grep llama
llama_cpp_python          0.2.75

Therefore, the above Dockerfile builds again if the version is pinned as follows. Version 0.2.75 still accepts the LLAMA_CUBLAS flag, so the old CMAKE_ARGS work.

# Install llama-cpp-python[server] with cuBLAS on
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
        pip install llama-cpp-python[server]==0.2.75 --force-reinstall --no-cache-dir

Solution #2

Reading the error message again: LLAMA_CUBLAS is deprecated and will be removed in the future; use GGML_CUDA instead.

So, I modified the Dockerfile as follows. Other parts are unchanged.

RUN CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 \
        pip install llama-cpp-python[server] --force-reinstall --no-cache-dir

I decided to adopt this approach, since I believe it addresses the root cause rather than just pinning an older version.
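
As a sanity check that the GGML_CUDA build actually uses the GPU, a short script like the following can be run in the container. This is a minimal sketch: the model path is a placeholder, and any local GGUF model will do.

from llama_cpp import Llama

# Load a local GGUF model (placeholder path) and offload all layers to
# the GPU; with a CUDA-enabled build the startup log shows the layers
# being placed on the device.
llm = Llama(
    model_path="/workdir/models/model.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload every layer; has no effect on CPU-only builds
)

output = llm("Q: What is RAG? A:", max_tokens=32)
print(output["choices"][0]["text"])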

Conclusion

The two errors described above do not seem to be related to each other.

Setting up the environment took time because I happened to be building the container right when those library changes landed, but I learned a lot from it.

I will post a summary of RAG soon.