Run Docker containers implementing OpenMPI on multiple nodes

Motivation

As previously mentioned in this post, I am moving forward with the goal of running Athena++ on multiple nodes. As a preliminary step, I have attempted to run a Docker container with OpenMPI configured on multiple nodes. I will post a summary of what I have done, as I had some difficulties and it may be helpful to others.

[Read More]

Run the Athena++ tutorial

Introduction

In a previous post, I summarized the first tutorial, “1D Hydrodynamics and MHD”, which I ran after installing Athena++. This post continues from there and summarizes the tutorial I ran next, which covers visualization and other tasks.

[Read More]

Try Athena++, a magnetohydrodynamic simulation code for astrophysics

Introduction

I have been interested in trying out astrophysics-related simulations for some time and had been looking at ENZO, GADGET, GIZMO, and others. I happened to come across Athena++, and when I looked into it, I found that Associate Professor Kengo Tomida of Tohoku University maintains a page with information in Japanese, so I decided to give it a try.

Here I summarize the process of installing it and running the tutorial.

[Read More]

Run Japanese LLMs on an on-premise environment

Motivation

Looking back, 2023 was a year in which many Japanese LLMs (Large Language Models) were released. I tried running some of them in my home environment, so I will summarize the results here.

[Read More]

Modify CNN training code to work with Horovod

Introduction

After Try Horovod in Docker, I can now use Horovod in my own on-premises environment under Docker. The next step is to modify training code that runs on a single server so that it can be used for distributed training with Horovod. For starters, I modified a relatively simple CNN training script to support distributed training with Horovod, and I summarize the changes in this post.

[Read More]

Try Horovod in Docker

Motivation

I have been interested in distributed training for about a year, and I have been experimenting with a distributed training framework called Horovod on multiple machines equipped with TITAN V GPUs. I finally got a distributed training sample working, so I am posting it here.

[Read More]

Uninstall Rootless Docker

Introduction

In May of this year, I posted the article Building Rootless Docker, but for various reasons I have decided to uninstall Rootless Docker. The following is a summary of the uninstallation procedure. I will post the details of what led to the uninstallation later.

[Read More]

Trying NVIDIA Modulus - Introduction to PINNs

Introduction

Over a month ago, I attended the seminar How to Speed Up Simulation with AI Surrogate Models? and became interested in NVIDIA Modulus, so I bought the book and started studying it. As a prerequisite for future study, I installed Modulus in my environment and summarized the installation process as “Introduction to PINNs”.

[Read More]

Running rinna 3.6b in a Docker container

Motivation

I wanted to try out a large language model (LLM) for Japanese, so I used rinna, which was released in May. To save installation time, I ran rinna in a Docker container.

I ran into some problems along the way, which are summarized below.

[Read More]