Motivation
I am experimenting with RAG using LangChain and needed some data to test it with, so I decided to use the Wikipedia dump. Because the full dump is very large, I limited myself to the astronomy-related categories, which are the ones I am interested in.
Here, I summarize the steps for extracting only specific categories of data from the Wikipedia dump.