llama-cpp-python - impact of numpy version upgrade

Introduction

NumPy 2.0.0 was released on June 16. I first noticed it the other day when I tried RAG using LangChain and got an error while building the Docker container. Later, I encountered another error in CMake when trying to incorporate llama-cpp-python. This article summarizes how I dealt with the two errors I recently experienced.

Dealing with errors related to NumPy 2.0.0

Background

I recently decided to learn RAG properly, so I purchased a Japanese book called [LLM fine tuning and RAG](https://www.

[Read More]

Try RAG with LlamaIndex

Motivation

In the post where I tested Chatbot UI, I mentioned that one of my future challenges was to work with RAG (Retrieval-Augmented Generation). In this post, I summarize how to implement RAG using LlamaIndex.

Actually, I tried RAG using LangChain late last year. Since then, I have heard LlamaIndex mentioned a lot, so I decided to implement RAG with LlamaIndex this time.

[Read More]

Try the Chatbot UI

Introduction

In a recent post, I ran the ELYZA 7B model in a local environment using llama-cpp-python. In that post, under "About the future", I mentioned that I would like to try building a system that can chat like ChatGPT.

This time, I built a system that can chat like ChatGPT in a Docker container, and I summarize it here.

[Read More]

Running Elyza models on GPU using llama-cpp-python

Motivation

Quantization is essential for running an LLM on a local workstation (12-16 GB of GPU memory). In this post, I summarize my attempt to make the most of GPU resources using llama-cpp-python.

The content includes some of my mistakes, as I went down a few wrong paths due to my lack of understanding.

[Read More]

Measuring OpenMPI performance again using the HIMENO benchmark

Introduction

As I described in a previous article, I changed the hostfile that determines the order of OpenMPI execution nodes and re-measured OpenMPI performance with the HIMENO benchmark. After posting, I thought about it again and decided to use objective figures instead of my own judgment based on CPU and clock specifications.

So this time, I measured the performance of each individual workstation (node), decided the order of the hostfile according to the results, and measured again.

[Read More]

Re-measure OpenMPI performance using the HIMENO benchmark

Introduction

A month ago, I measured the performance of OpenMPI with the HIMENO benchmark. A friend who saw that post pointed out some improvements regarding the order of the hostfile. In this post, I summarize the results of measuring performance again after modifying the hostfile.

[Read More]

Measuring OpenMPI performance using the HIMENO benchmark

Motivation

As I wrote in yesterday's post, I was able to run a program using OpenMPI in Docker containers running on multiple nodes. I wanted to find out how much performance I could gain by using OpenMPI, so I decided to benchmark it. I ran into some difficulties this time as well, and I would be happy if that part is helpful to others.

[Read More]