LINPACK is a software library for performing numerical linear algebra on digital computers, and the benchmark derived from it is the most widely used measure of floating-point performance for high-performance computer systems.
HPL is a software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers. It can thus be regarded as a portable, freely available implementation of the High Performance Computing Linpack Benchmark. HPL requires the availability on your system of an implementation of the Message Passing Interface MPI (1.1 compliant), as well as an implementation of either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL).
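A quick way to check whether an MPI toolchain is already available on the system (nothing here is HPL-specific):

```
which mpicc && mpicc --version   # fails if no MPI compiler wrapper is on the PATH
```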
Choose which math library to install according to the MPI implementation and the type of operation to be performed:
| Math Library | OpenMPI | Mpich | IntelMPI |
| --- | --- | --- | --- |
| CPU operation (hpl-2.3.tar.gz) | BLAS | BLAS | Intel MKL |
| CUDA operation (hpl-2.0_FERMI_v15.tgz) | Intel MKL | Intel MKL | - |
Before installing BLAS, you need to install the GCC/GFortran toolchain.
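For example, on an RPM-based distribution (an assumption; use your system's package manager) the toolchain can be installed with:

```
sudo dnf install -y gcc gcc-gfortran make
```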
Install BLAS
```
export LINPACK_PATH=~/linpack
mkdir -p $LINPACK_PATH && cd $LINPACK_PATH
wget http://www.netlib.org/blas/blas-3.8.0.tgz
tar xf blas-3.8.0.tgz
cd BLAS-3.8.0/
make
ar rv libblas.a *.o
```
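As a quick sanity check, list the members of the archive just created:

```
ar t libblas.a | grep -i daxpy   # the archive should contain daxpy.o
```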
Install CBLAS
```
cd $LINPACK_PATH
wget http://www.netlib.org/blas/blast-forum/cblas.tgz
tar xf cblas.tgz
cd CBLAS/
cp $LINPACK_PATH/BLAS-3.8.0/blas_LINUX.a $LINPACK_PATH/CBLAS/lib/
sed -i '/^BLLIB/cBLLIB = ../lib/blas_LINUX.a' Makefile.in
make
```
Run the complex level-1 tester to verify the build:

```
$LINPACK_PATH/CBLAS/testing/xccblat1
```

Expected output:

```
Complex CBLAS Test Program Results
Test of subprogram number 1 CBLAS_CDOTC
----- PASS -----
Test of subprogram number 2 CBLAS_CDOTU
----- PASS -----
Test of subprogram number 3 CBLAS_CAXPY
----- PASS -----
Test of subprogram number 4 CBLAS_CCOPY
----- PASS -----
Test of subprogram number 5 CBLAS_CSWAP
----- PASS -----
Test of subprogram number 6 CBLAS_SCNRM2
----- PASS -----
Test of subprogram number 7 CBLAS_SCASUM
----- PASS -----
Test of subprogram number 8 CBLAS_CSCAL
----- PASS -----
Test of subprogram number 9 CBLAS_CSSCAL
----- PASS -----
Test of subprogram number 10 CBLAS_ICAMAX
----- PASS -----
```
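The testing directory also contains testers for the other precisions and BLAS levels. For example (tester and input-file names here follow the stock CBLAS Makefile and may differ slightly by release):

```
cd $LINPACK_PATH/CBLAS/testing
./xdcblat1          # double-precision level-1 tester, no input file needed
./xdcblat2 < din2   # level-2 tester reads its parameters from din2
```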
Refer to the oneAPI installation instructions in the LiCO installation documentation, and configure it as an environment module.
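Once the oneAPI modulefiles are registered, you can confirm they are visible (module avail prints to stderr, hence the redirect):

```
module avail 2>&1 | grep -i mkl
```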
Download and unpack HPL 2.3:

```
export LINPACK_PATH=~/linpack
mkdir -p $LINPACK_PATH
cd $LINPACK_PATH
wget http://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
tar xf hpl-2.3.tar.gz
cd hpl-2.3
```
Select and modify the makefile according to the MPI implementation you want to use.
OpenMPI
Load OpenMPI
```
module load openmpi4/4.1.1   # adjust to the version available in your environment
```
Copy the template makefile from setup:

```
cp ./setup/Make.Linux_PII_CBLAS ./
```
Modify the makefile. Note: replace <LINPACK_PATH> with its actual value.

```
ARCH     = Linux_PII_CBLAS
TOPdir   = <LINPACK_PATH>/hpl-2.3
MPdir    = /opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1   # MPI installation path; adjust to your system
MPlib    = $(MPdir)/lib/libmpi.so /usr/lib64/libpthread-2.28.so /usr/lib64/libc-2.28.so
LAdir    = <LINPACK_PATH>/CBLAS/lib
LAlib    = $(LAdir)/blas_LINUX.a $(LAdir)/cblas_LINUX.a
HPL_OPTS =
CC       = mpicc
LINKER   = mpif77
```
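If you are unsure of the OpenMPI installation paths, the compiler wrapper can print the full compile line it would use, including the -I and -L directories:

```
mpicc --showme
```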
Compile
```
make arch=Linux_PII_CBLAS
```
Test
```
cd ./bin/Linux_PII_CBLAS
./xhpl
```
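xhpl reads its configuration from HPL.dat in the same directory (the same format applies to every build below). The shipped file is only a smoke test; for real measurements tune N, NB, and the P x Q process grid. A minimal illustrative HPL.dat follows (single values per parameter; the numbers are examples, not tuned recommendations):

```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
```

As a rule of thumb, choose N so the matrix fills roughly 80% of total memory (the matrix needs 8*N*N bytes, so N ≈ sqrt(0.8 × mem_bytes / 8)), and set P ≤ Q with P×Q equal to the number of MPI ranks.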
Mpich
Load Mpich
```
module load mpich/3.3.2-ofi   # adjust to the version available in your environment
```
Copy the template makefile from setup:

```
cp ./setup/Make.Linux_PII_CBLAS ./
```
Modify the makefile. Note: replace <LINPACK_PATH> with its actual value.

```
ARCH     = Linux_PII_CBLAS
TOPdir   = <LINPACK_PATH>/hpl-2.3
MPdir    = /opt/ohpc/pub/mpi/mpich-ofi-gnu9-ohpc/3.3.2   # MPI installation path; adjust to your system
MPlib    = $(MPdir)/lib/libmpich.so /usr/lib64/libpthread-2.28.so /usr/lib64/libc-2.28.so
LAdir    = <LINPACK_PATH>/CBLAS/lib
LAlib    = $(LAdir)/blas_LINUX.a $(LAdir)/cblas_LINUX.a
HPL_OPTS =
CC       = mpicc
LINKER   = mpif77
```
Compile
```
make arch=Linux_PII_CBLAS
```
Test
```
cd ./bin/Linux_PII_CBLAS
./xhpl
```
Intel MPI
Load Intel MPI and MKL
```
module load mpi
module load icc
module load mkl
```
Copy the template makefile from setup:

```
cp ./setup/Make.Linux_Intel64 ./
```
Modify the makefile. Note: replace <LINPACK_PATH> with its actual value.

```
ARCH     = Linux_Intel64
TOPdir   = <LINPACK_PATH>/hpl-2.3
MPdir    = /opt/intel/oneapi/mpi/latest   # MPI installation path; adjust to your system
MPinc    = -I$(MPdir)/include
MPlib    = $(MPdir)/lib/release/libmpi.a
LAdir    = /opt/intel/oneapi/mkl/latest
LAinc    = $(LAdir)/include
LAlib    = -L$(LAdir)/lib/intel64 \
           -Wl,--start-group \
           $(LAdir)/lib/intel64/libmkl_intel_lp64.a \
           $(LAdir)/lib/intel64/libmkl_intel_thread.a \
           $(LAdir)/lib/intel64/libmkl_core.a \
           -Wl,--end-group -lpthread -ldl
CC       = mpiicc
OMP_DEFS = -qopenmp
```
Compile
```
make arch=Linux_Intel64
```
Test
```
cd ./bin/Linux_Intel64
./xhpl
```
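This build links the threaded MKL (libmkl_intel_thread with -qopenmp), so each MPI rank spawns OpenMP threads. A typical hybrid launch sets the thread count explicitly; the values below are assumptions to adapt to your core count and HPL.dat grid:

```
export OMP_NUM_THREADS=8   # threads per MPI rank (example value)
mpirun -np 2 ./xhpl        # with P*Q = 2 in HPL.dat
```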
Building the CUDA version requires a working CUDA environment.
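Before building, you can confirm that the CUDA toolkit and driver are usable (nvcc may require /usr/local/cuda/bin on your PATH):

```
nvcc --version   # CUDA toolkit compiler
nvidia-smi -L    # lists the GPUs visible to the driver
```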
Unpack the NVIDIA package:

```
cd $LINPACK_PATH
tar xf hpl-2.0_FERMI_v15.tgz
cd ./hpl-2.0_FERMI_v15
```
Modify the Make.CUDA file according to the selected MPI implementation.
OpenMPI
The installation package provided by NVIDIA was built against the older MPI-1 API, so OpenMPI must be recompiled with MPI-1 compatibility enabled.
Install OpenMPI
```
cd $LINPACK_PATH
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.4.tar.gz
tar xzvf openmpi-4.1.4.tar.gz
cd openmpi-4.1.4/
./configure --enable-mpi1-compatibility --with-cuda=/usr/local/cuda-11.7/include --prefix=/opt/ohpc/pub/mpi/openmpi-4.1.4
make
make install
```
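Optionally verify that the new build reports CUDA support (ompi_info prints the build configuration):

```
/opt/ohpc/pub/mpi/openmpi-4.1.4/bin/ompi_info | grep -i cuda
```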
Modify the Make.CUDA file. Note: replace <LINPACK_PATH> with its actual value.

```
TOPdir = <LINPACK_PATH>/hpl-2.0_FERMI_v15
MPdir  = /opt/ohpc/pub/mpi/openmpi-4.1.4   # MPI installation path; adjust to your system
MPinc  = -I$(MPdir)/include
MPlib  = $(MPdir)/lib/libmpi.so
LAdir  = /opt/intel/oneapi/mkl/latest/lib/intel64
CC     = /opt/ohpc/pub/mpi/openmpi-4.1.4/bin/mpicc
```
Mpich
Load Mpich
```
module load mpich/3.3.2-ofi   # adjust to the version available in your environment
```
Modify the Make.CUDA file. Note: replace <LINPACK_PATH> with its actual value.

```
TOPdir = <LINPACK_PATH>/hpl-2.0_FERMI_v15
MPdir  = /opt/ohpc/pub/mpi/mpich-ofi-gnu9-ohpc/3.3.2   # MPI installation path; adjust to your system
MPinc  = -I$(MPdir)/include
MPlib  = $(MPdir)/lib/libmpi.so
LAdir  = /opt/intel/oneapi/mkl/latest/lib/intel64
CC     = mpicc
```
Compile
```
make arch=CUDA
```
Test
```
export LD_LIBRARY_PATH=$LINPACK_PATH/hpl-2.0_FERMI_v15/src/cuda:$LD_LIBRARY_PATH
cd ./bin/CUDA
./xhpl
```
To launch multiple processes on a single node:

```
mpirun -np 4 ./xhpl   # np must be >= P*Q in HPL.dat
```

With OpenMPI, to run across nodes, create and edit a nodes file:

```
c1 slots=2
c2 slots=2
```

```
mpirun -hostfile ./nodes -np 4 ./xhpl   # np must be >= P*Q in HPL.dat
```

With Mpich, to run across nodes, create and edit a nodes file:

```
c1:2
c2:2
```

```
mpirun -f ./nodes -np 4 ./xhpl   # np must be >= P*Q in HPL.dat
```
The run proceeds according to the configured HPL.dat file, and the final performance is reported in Gflops:
```
================================================================================
HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 2900
NB : 3
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Left
NBMIN : 2 4
NDIV : 2
RFACT : Left
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2L2 2900 3 1 1 3.66 4.4401e+00
HPL_pdgesv() start time Mon Oct 10 03:11:29 2022
HPL_pdgesv() end time Mon Oct 10 03:11:32 2022
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 5.34873410e-03 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2L4 2900 3 1 1 3.20 5.0918e+00
HPL_pdgesv() start time Mon Oct 10 03:11:33 2022
HPL_pdgesv() end time Mon Oct 10 03:11:36 2022
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 5.34873410e-03 ...... PASSED
================================================================================
Finished 2 tests with the following results:
2 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
```