Re-measuring OpenMPI performance using the HIMENO benchmark

Introduction

A month ago, in an earlier post, I measured the performance of OpenMPI with the HIMENO benchmark. A friend who read that post suggested an improvement to the order of the entries in the hostfile. In this post, I summarize the results of measuring the performance again after modifying the hostfile.

Changes

Processes are allocated in the order in which hosts are listed in the hostfile, so my friend pointed out that it would be better to list the nodes in order of performance. I modified the hostfile as follows.

# cat myhosts
saisei slots=4
jupiter slots=4
mokusei slots=8
ganymede slots=6
europe slots=4
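
To make the effect of the ordering concrete, here is a minimal Python sketch that fills the slots host by host in the order listed. It is only a simplified model for illustration, not OpenMPI's actual mapper: the function names parse_hostfile and fill_in_order are hypothetical, a host without slots= is assumed to have 1 slot, and what happens when np exceeds the total number of slots is not modeled.

```python
# Simplified model: fill each host's slots in the order the hostfile lists them.
HOSTFILE = """\
saisei slots=4
jupiter slots=4
mokusei slots=8
ganymede slots=6
europe slots=4
"""


def parse_hostfile(text):
    """Return [(hostname, slots), ...] in file order."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *options = line.split()
        slots = 1  # assumption for this sketch when slots= is absent
        for opt in options:
            if opt.startswith("slots="):
                slots = int(opt.split("=", 1)[1])
        hosts.append((name, slots))
    return hosts


def fill_in_order(hosts, np):
    """Place np processes host by host; return (placement, unplaced count)."""
    placement = {}
    remaining = np
    for name, slots in hosts:
        if remaining == 0:
            break
        take = min(slots, remaining)
        placement[name] = take
        remaining -= take
    return placement, remaining


hosts = parse_hostfile(HOSTFILE)
for np in (2, 4, 8, 16):
    placement, unplaced = fill_in_order(hosts, np)
    print(np, placement, "unplaced:", unplaced)
```

Under this model, 16 processes land on saisei, jupiter and mokusei only, so the hosts listed first decide which machines do the work at the peak. The hostfile defines 26 slots in total, so np=32 and np=64 exceed it; the sketch simply reports the remainder as unplaced.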

Measurement Results

The number of launched processes (np) and the MFLOPS values without and with the slots specification were as follows.

np    MFLOPS (without slots)    MFLOPS (with slots)
 2     9,141                     9,098
 4    11,258                    11,262
 8    20,468                    20,354
16    34,813                    34,751
32    20,994                    13,336
64     9,006                     9,002
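
As a quick way to read the table, the following Python sketch computes the relative difference between the two columns for each np and picks out the peak. The numbers are copied from the table; the helper name percent_change is only illustrative.

```python
# np -> (MFLOPS without slots, MFLOPS with slots), copied from the table above
results = {
    2: (9141, 9098),
    4: (11258, 11262),
    8: (20468, 20354),
    16: (34813, 34751),
    32: (20994, 13336),
    64: (9006, 9002),
}


def percent_change(without, with_slots):
    """Relative change of the 'with slots' value against 'without slots', in %."""
    return (with_slots - without) / without * 100.0


for np, (without, with_slots) in results.items():
    print(f"np={np:2d}  {percent_change(without, with_slots):+7.2f}%")

# Process count with the highest MFLOPS in the 'with slots' column
peak_np = max(results, key=lambda n: results[n][1])
print("peak at np =", peak_np)
```

Run on these figures, the two columns stay within about 1% of each other except at np=32, where the value with slots is roughly 36% lower, and the peak is at np=16 in both columns.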

The measurement results above are plotted in the following graph.

[Graph: sjmge]

The graph below compares these results with the previous measurement (with slots specified). The previous measurement is labeled ejgsm, formed from the first letters of the hostnames (europe/jupiter/ganymede/saisei/mokusei) in hostfile order, and the current measurement is labeled sjmge in the same way.

[Graph: ejgsm_sjmge]

Trouble

When I tried to run the measurement this time, the following message appeared.

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:lrDSjoBAl2Eu4nm3LqaR/tdFVYuYh/v16Q+OwWN3Icg.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:3
  remove with:
  ssh-keygen -f "/root/.ssh/known_hosts" -R "[jupiter]:12345"
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
UpdateHostkeys is disabled because the host key is not trusted.
・・・

I searched the Internet, expecting that I would need to regenerate the public and private keys with ssh-keygen, and found that it is enough to delete the entry for the affected host from known_hosts. I simply deleted the entire known_hosts file, which solved the problem.
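
A narrower alternative to deleting the whole file is the command that the warning itself suggests, which removes only the offending entry. The Python sketch below just builds that ssh-keygen command and runs it only on request. The wrapper name remove_host_key and its dry_run parameter are hypothetical; the path, host and port are taken from the message above.

```python
import subprocess


def remove_host_key(host, port=22, known_hosts="/root/.ssh/known_hosts",
                    dry_run=True):
    """Build (and optionally run) 'ssh-keygen -R' for a single host entry."""
    # Entries for a non-default port are written as [host]:port
    target = host if port == 22 else f"[{host}]:{port}"
    cmd = ["ssh-keygen", "-f", known_hosts, "-R", target]
    if not dry_run:
        # Actually modifies known_hosts; fails loudly if ssh-keygen errors
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: only print the command for the host named in the warning
print(" ".join(remove_host_key("jupiter", port=12345)))
```

With dry_run left at True nothing is changed. Passing dry_run=False would run the command and leave the entries for the other hosts in place.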

Summary

The graph shows an improvement in peak performance (at 16 processes). Because the difference is less than 20%, it could be measurement error (run-to-run variation). However, at every process count other than np=16 the difference was only a few percent, so I judged that the new ordering did have an effect at the peak.