Build a RAG system

Introduction

In yesterday's post, I extracted astronomy-related entries from Wikipedia and built a vector database and a keyword index for RAG. Here, I use those databases to build the RAG system itself.

The LLMs used are ChatGPT (gpt-4o) and Llama-3-ELYZA-JP-8B.
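Roughly, the setup looks like the following sketch. FAISS for the vector side and BM25 for the keyword side are illustrative choices on my part, and the paths and weights are placeholders; the actual implementation is in the post.

```python
# A minimal sketch, not the exact code from the post: FAISS as the vector
# side and BM25 as the keyword side, merged by an EnsembleRetriever.
from langchain.chains import RetrievalQA
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # needs rank_bm25
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Placeholder corpus; in practice these are the extracted Wikipedia entries.
docs = [Document(page_content="A neutron star is the collapsed core of a massive star.")]

bm25 = BM25Retriever.from_documents(docs)  # keyword search
bm25.k = 4

vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
vector = vectorstore.as_retriever(search_kwargs={"k": 4})  # semantic search

# Hybrid retrieval: merge both result lists with equal weight.
retriever = EnsembleRetriever(retrievers=[bm25, vector], weights=[0.5, 0.5])

qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(model="gpt-4o"), retriever=retriever)
print(qa.invoke({"query": "What is a neutron star?"})["result"])
```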

[Read More]

Creating text data for RAG from Wikipedia dump data

Motivation

I am experimenting with RAG using LangChain and needed data to test it with, so I decided to use Wikipedia dump data. Since the full dump is very large, I restricted it to the astronomy-related categories that interest me.

This post summarizes the series of steps for extracting only specific categories from the Wikipedia dump data.
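The core filtering step can be sketched as follows; the dump file name, export namespace version, and category string are placeholders, and the full pipeline is in the post.

```python
# A sketch of the core filtering step: stream the compressed dump and keep
# pages whose wikitext carries the target category tag.
import bz2
import xml.etree.ElementTree as ET

DUMP = "jawiki-latest-pages-articles.xml.bz2"       # placeholder dump file
CATEGORY_TAG = "[[Category:Astronomy"               # category to keep
NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # version depends on the dump

with bz2.open(DUMP, "rb") as f:
    for _, elem in ET.iterparse(f):  # stream the dump page by page
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            if CATEGORY_TAG in text:
                print(title)  # or write the text out as RAG source data
            elem.clear()      # keep memory bounded on a multi-GB dump
```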

[Read More]

llama-cpp-python - impact of the NumPy version upgrade

Introduction

NumPy 2.0.0 was released on June 16. I first noticed it the other day, when I tried RAG using LangChain and hit an error while building the Docker container. Later, I ran into another error in CMake when trying to incorporate llama-cpp-python. This article summarizes how I dealt with these two errors.

Dealing with errors related to NumPy 2.0.0

Background

Having recently decided to learn RAG properly, I purchased a Japanese book called [LLM fine tuning and RAG](https://www. [Read More]
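As a sketch of the direction of the workaround (the pin shown here is illustrative, not necessarily the exact fix described in the full post): constrain NumPy below 2.0 at install time and fail fast at runtime.

```python
# Illustrative pin, applied at image build time, e.g. in the Dockerfile:
#   RUN pip install "numpy<2.0" && pip install -r requirements.txt
# Then fail fast at runtime if NumPy 2.x slipped into the image anyway:
import numpy

if int(numpy.__version__.split(".")[0]) >= 2:
    raise RuntimeError(
        f"NumPy {numpy.__version__} found; packages built against NumPy 1.x "
        "may fail to import -- rebuild the image with numpy<2.0"
    )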

Try RAG with LlamaIndex

Motivation

In the post where I tested Chatbot UI, I mentioned that one of my future challenges was to work on RAG (Retrieval-Augmented Generation). This post summarizes how to achieve RAG using LlamaIndex.

I actually tried RAG with LangChain late last year. Since then, LlamaIndex has kept coming up, so this time I decided to build RAG with it.
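As a taste of how little code a basic LlamaIndex pipeline needs (a minimal sketch with a recent llama-index, where the imports live under llama_index.core; the "data" directory is a placeholder, not the corpus from the post):

```python
# A minimal sketch of a LlamaIndex RAG pipeline over a directory of text files.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # ingest
index = VectorStoreIndex.from_documents(documents)     # embed + index

query_engine = index.as_query_engine()                 # retrieval + generation
print(query_engine.query("What is a pulsar?"))
```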

[Read More]

Try the Chatbot UI

Introduction

In a recent post, I ran the ELYZA 7B model in a local environment using llama-cpp-python. There, I mentioned that, as future work, I would like to build a system that can chat like ChatGPT.

This time, I built such a ChatGPT-style chat system in a Docker container, and I summarize the details here.
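The rough idea: llama-cpp-python can expose an OpenAI-compatible endpoint (python -m llama_cpp.server), which a ChatGPT-style UI can then point at. A minimal smoke test of such an endpoint; the host, port, and model name are placeholders, not the exact wiring from the post.

```python
# A minimal smoke test against an OpenAI-compatible local endpoint such as
# the one llama-cpp-python can serve (python -m llama_cpp.server --model ...).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy-key")
resp = client.chat.completions.create(
    model="local-model",  # placeholder name
    messages=[{"role": "user", "content": "Hello! Please introduce yourself."}],
)
print(resp.choices[0].message.content)
```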

[Read More]

Running ELYZA models on GPU using llama-cpp-python

Motivation

Quantization is essential for running an LLM on a local workstation (12-16 GB of GPU memory). In this post, I summarize my attempt to make the most of the GPU using llama-cpp-python.

The post also covers some of the mistakes I made along the way, since I stumbled in a few areas due to gaps in my understanding.
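The key knob is n_gpu_layers, which controls how many layers are offloaded to the GPU. A minimal sketch (the GGUF file name is a placeholder for a quantized ELYZA model):

```python
# A minimal sketch of GPU offload with llama-cpp-python (requires a build
# with CUDA support).
from llama_cpp import Llama

llm = Llama(
    model_path="elyza-7b-instruct.Q4_K_M.gguf",  # placeholder quantized model
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU
    n_ctx=2048,       # context window
)
out = llm("Q: What is a quasar? A:", max_tokens=128)
print(out["choices"][0]["text"])
```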

[Read More]

Measuring OpenMPI performance again using the HIMENO benchmark

Introduction

In a previous article, I changed the hostfile that determines the order of OpenMPI execution nodes and re-measured OpenMPI performance with the Himeno benchmark. After posting it, I reconsidered and decided to base the ordering on objective figures rather than my own judgment from CPU specs and clock speeds.

So this time, I first measured the performance of each individual workstation (node), ordered the hostfile according to those results, and then ran the measurements again.
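As a sketch of that step, turning per-node results into a hostfile ordered fastest-first; the node names, slot counts, and MFLOPS figures below are made-up placeholders, not my actual measurements.

```python
# A sketch of generating an OpenMPI hostfile from single-node benchmark scores.
single_node_mflops = {"ws1": 5200.0, "ws2": 4100.0, "ws3": 2800.0}  # placeholders
slots = {"ws1": 8, "ws2": 8, "ws3": 4}

with open("hostfile", "w") as f:
    for node in sorted(single_node_mflops, key=single_node_mflops.get, reverse=True):
        f.write(f"{node} slots={slots[node]}\n")  # fastest node listed first
```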

[Read More]

Re-measure OpenMPI performance using the HIMENO benchmark

Introduction

A month ago, in this post, I measured the performance of OpenMPI with the HIMENO benchmark. A friend who saw that post suggested improvements to the ordering of the hostfile. This post summarizes the results of measuring performance again after modifying the hostfile.

[Read More]