In this post I'm going to write about benchmarking the MPICH software using the framework we designed in the Software Portability and Optimization class. Here is the link to the framework: https://github.com/pk400/SPO600-Build-Framework. MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard, and it is used exclusively on nine of the top 10 supercomputers.
I spent a lot of time getting this test framework to run with my MPICH package, and I ran into quite a few problems along the way. First, when I edited the build plugin file to change CFLAGS to the compiler options selected by the test framework, the build failed, and I didn't know why until the professor (Chris Tyler) looked at it and told me that exporting the CFLAGS variable was breaking the build. I looked into it but still couldn't figure out exactly why, even though I checked that both the configure script and the makefiles use the CFLAGS variable. I found a workaround: set the compiler options directly on the variable used for the build, but first unset CFLAGS after running configure, because configure adds its own compiler options to CFLAGS, and make then copies CFLAGS into the variable that actually holds the build flags, MPICH_MAKE_CFLAGS. While figuring this out I also found that the MPICH libraries were being built from CFLAGS, with other compiler options appended by the configure script as well. By unsetting CFLAGS after calling configure, I was able to build the MPICH package and its libraries with only the compiler flags chosen by the permutation, which is what I wanted.
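A rough sketch of what that workaround boils down to (the FLAGS variable and the exact make invocation here are my own placeholders, not the framework's real names; the build plugin does this programmatically):

    FLAGS="-O2 -fomit-frame-pointer"      # whatever flags the current permutation selected
    ./configure                           # configure appends its own options to CFLAGS
    unset CFLAGS                          # drop CFLAGS so make can't copy it back in
    make MPICH_MAKE_CFLAGS="$FLAGS"       # hand the permutation's flags straight to the
                                          # variable MPICH's makefiles actually build with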
As a trial run, I built my package with -O2 and -O3. The results on the x86_64 system are:
Permutation columns: (uniqueId, givenId, flags, buildExit, buildWallTime, buildUserSystemTime, buildSize, testExit)
Benchmark columns: (uniqueId, permutationUniqueId, givenId, speedScore, memoryScore)
(1, 1, '-O2', 0, 274.37, 703.28, 0, 0)
(1, 1, 1, 702, 17016)
(2, 1, 2, 702, 18736)
(3, 1, 3, 704, 18796)
(4, 1, 4, 701, 18796)
(2, 2, '-O3', 0, 224.73, 598.86, 0, 0)
(5, 2, 1, 819, 18796)
(6, 2, 2, 825, 18776)
(7, 2, 3, 820, 18764)
(8, 2, 4, 820, 18784)
The results with the test framework's compiler option groups are shown below, after a note about a problem I hit with the first group.
There was an issue with the first group: it contains all the options that stay on at every O-level together with the options that are kept off at every level (-O1, -O2, -O3), so its flag list has no delimiter, which caused the permutation to fail after the first group. To get results from all the groups I had to run the framework with group one alone, then remove group one from the config file so the other groups would be used to build the package with different sets of compiler options. In the benchmark results below I'll only give the group name, because each group uses a lot of flags; to see exactly which compiler options are involved, go to https://github.com/pk400/SPO600-Build-Framework and look at the monkeys10k.config file.
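To illustrate why that breaks things, here is a tiny hypothetical sketch (the '/' delimiter and the flag names are assumptions made purely for illustration, not the framework's real parsing code): a group whose flag list contains no delimiter splits into a single field, so there is nothing left to permute.

    group1="-fauto-inc-dec -fbranch-count-reg -fdce"   # always-on/always-off flags only, no delimiter
    IFS='/' read -ra fields <<< "$group1"              # split the group on the (assumed) '/' delimiter
    echo "${#fields[@]} field(s) to permute"           # prints "1 field(s) to permute" -- nothing to vary

With group one run on its own and then removed from the config file, the remaining groups permuted normally; their results follow.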
(1, 1, 'group1', 0, 256.45, 697.5, 0, 0)
(1, 1, 1, 702, 18760)
(2, 1, 2, 703, 18888)
(3, 1, 3, 701, 18796)
(4, 1, 4, 704, 18720)
(1, 1, 'group2 permute: 1', 0, 487.48, 729.73, 0, 0)
(1, 1, 1, 1373, 18804)
(2, 1, 2, 1424, 18800)
(3, 1, 3, 1405, 18780)
(4, 1, 4, 1361, 7356)
(2, 2, 'group2 permute: 2', 0, 470.55, 630.61, 0, 0)
(5, 2, 1, 1248, 14856)
(6, 2, 2, 1267, 16396)
(7, 2, 3, 1268, 18768)
(8, 2, 4, 1338, 17056)
(1, 1, 'group3 permute: 1', 0, 550.99, 734.2, 0, 0)
(1, 1, 1, 1555, 16392)
(2, 1, 2, 1660, 14876)
(3, 1, 3, 1659, 18700)
(4, 1, 4, 1653, 18764)
(2, 2, 'group3 permute: 2', 0, 435.94, 621.67, 0, 0)
(5, 2, 1, 703, 18804)
(6, 2, 2, 704, 18820)
(7, 2, 3, 699, 18836)
(8, 2, 4, 707, 18772)
(1, 1, 'group4 permute: 1', 0, 368.79, 714.64, 0, 0)
(1, 1, 1, 1292, 18776)
(2, 1, 2, 1303, 18744)
(3, 1, 3, 1266, 18816)
(4, 1, 4, 767, 18776)
(2, 2, 'group4 permute: 2', 0, 321.62, 616.49, 0, 0)
(5, 2, 1, 1409, 18800)
(6, 2, 2, 1328, 18768)
(7, 2, 3, 1337, 17272)
(8, 2, 4, 1302, 18812)
These measurements combine the time it takes to configure, make, run make check, and run the benchmark on the package. In the results above, group 1, which contains the flags that are always on or always off at every O-level, had the fastest result in wall time, while group 4's "off" options were fastest in user/system time. But we don't care about user/system time, because we want to measure the time a person with a stopwatch would see, not the CPU cycles the machine spends. This suggests that MPICH compiles faster on the defaults, without any additional compiler flags, at least on this x86_64 machine. To benchmark it more deeply, I would like to add the compiler options from the other groups to group one, one by one, and see how they affect MPICH's performance.
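For reference, this is the distinction I mean between the two times; the numbers below are made up just to show the shape of the output:

    $ time make -j4
    real    4m34.412s    # wall-clock (stopwatch) time -- the number that matters here
    user    10m12.038s   # CPU time spent in user space, summed across all cores
    sys     0m41.207s    # CPU time spent in the kernel
    # On a parallel build, user + sys can easily exceed real, which is why a lower
    # user/system time does not mean the build actually finished sooner.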