Measuring OpenMPI performance using the HIMENO benchmark

Motivation

As I described in this post yesterday, I was able to run a program using OpenMPI in Docker containers running on multiple nodes. I wanted to find out how much of a performance improvement OpenMPI provides, so I decided to benchmark it. I ran into some difficulties this time as well, and I hope my notes on them are helpful to others.

[Read More]

Run Docker containers implementing OpenMPI on multiple nodes

Motivation

As previously mentioned in this post, I am working toward the goal of running Athena++ on multiple nodes. As a preliminary step, I ran Docker containers with OpenMPI configured on multiple nodes. I ran into some difficulties along the way, so I am posting a summary of what I did in case it is helpful to others.

[Read More]

Run the Athena++ tutorial

Introduction

In a previous post, I summarized the contents of the first tutorial, “1D Hydrodynamics and MHD”, after installing Athena++. This post continues from there and summarizes the next part of the tutorial, which covers visualization and other tasks.

[Read More]

Try Athena++, a magnetohydrodynamic simulation code for astrophysics

Introduction

I have been interested in trying out astrophysics-related simulations for some time, and had been looking at ENZO, GADGET, GIZMO, and others. I happened to come across Athena++, and when I looked into it, I found that Associate Professor Kengo Tomida of Tohoku University maintains a page with information in Japanese, so I decided to give it a try.

Here I summarize the process of installing Athena++ and running the tutorial.

[Read More]

Run Japanese LLMs in an on-premises environment

Motivation

Looking back, 2023 was a year in which many Japanese LLMs (Large Language Models) were released. I tried running some of them in my home environment, so I summarize the results here.

[Read More]

Modify CNN training code to work with Horovod

Introduction

In Try Horovod in Docker, I got Horovod working in a Docker environment on my own (on-premises) machines. The next step is to modify training code that runs on a single server so that it can be used for distributed training with Horovod. As a start, I modified a relatively simple CNN training code to support distributed training with Horovod, and I summarize the changes in this article.

[Read More]

Try Horovod in Docker

Motivation

I have been interested in distributed training for about a year, and I have been experimenting with a distributed training framework called Horovod on multiple machines equipped with TITAN V GPUs. I finally got a distributed training sample working, so I am posting it here.

[Read More]