How to run LINPACK Benchmark on LiCO

LINPACK is a software library for performing numerical linear algebra on digital computers. The benchmark based on it is the most widely used test of floating-point performance for high-performance computer systems.

HPL is a software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers. It can thus be regarded as a portable and freely available implementation of the High Performance Computing Linpack Benchmark. HPL requires an implementation of the Message Passing Interface (MPI, 1.1 compliant) on your system, as well as an implementation of either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL).

Choose which math library to install according to the MPI type and the type of operation (CPU or CUDA) to be performed.

| Math Library | OpenMPI | Mpich | IntelMPI |
| --- | --- | --- | --- |
| CPU operation (hpl-2.3.tar.gz) | BLAS | BLAS | IntelMKL |
| CUDA operation (hpl-2.0_FERMI_v15.tgz) | IntelMKL | IntelMKL | - |

Install Math Library



Start Installation

Before installing BLAS, you need to install the GCC/GFortran environment
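A quick way to confirm the prerequisite is to check that the build tools are on the PATH. The following is a minimal sketch; the function name `require_tools` is hypothetical, and the tool list (`gcc`, `gfortran`, `make`, `ar`) is an assumption based on the build steps in this guide.

```shell
# Report every tool that is missing from PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool list for building BLAS and CBLAS from source.
require_tools gcc gfortran make ar || echo "Install the missing tools before building BLAS." >&2
```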

Install BLAS

export LINPACK_PATH=~/linpack
tar xf blas-3.8.0.tgz
cd BLAS-3.8.0/
make                    # compile the Fortran sources into object files
ar rv libblas.a *.o     # archive the object files into libblas.a

Install CBLAS

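tar xf cblas.tgz
cd CBLAS
# Point BLLIB at the BLAS archive; Makefile.in is assumed to be the CBLAS build configuration file
sed -i '/^BLLIB/cBLLIB = ../lib/blas_LINUX.a' Makefile.in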


Test Command
Test Results
 Complex CBLAS Test Program Results

 Test of subprogram number  1         CBLAS_CDOTC
                                    ----- PASS -----

 Test of subprogram number  2         CBLAS_CDOTU
                                    ----- PASS -----

 Test of subprogram number  3         CBLAS_CAXPY
                                    ----- PASS -----

 Test of subprogram number  4         CBLAS_CCOPY
                                    ----- PASS -----

 Test of subprogram number  5         CBLAS_CSWAP
                                    ----- PASS -----

 Test of subprogram number  6         CBLAS_SCNRM2
                                    ----- PASS -----

 Test of subprogram number  7         CBLAS_SCASUM
                                    ----- PASS -----

 Test of subprogram number  8         CBLAS_CSCAL
                                    ----- PASS -----

 Test of subprogram number  9         CBLAS_CSSCAL
                                    ----- PASS -----

 Test of subprogram number 10         CBLAS_ICAMAX
                                    ----- PASS -----



To use Intel MKL, refer to the oneAPI installation instructions in the LiCO installation document and configure oneAPI as an environment module.

Install HPL

export LINPACK_PATH=~/linpack
mkdir -p $LINPACK_PATH

HPL CPU Operation Program Installation

Start Installation

tar xf hpl-2.3.tar.gz
cd hpl-2.3

Select a makefile and modify it according to the MPI type you want to use.
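As a rough sketch of this step, the helper below rewrites a single `VAR = value` assignment in an HPL makefile. The function name `set_make_var`, the file name `Make.mycluster`, and the example paths are hypothetical. `TOPdir`, `MPdir`, and `LAdir` are variable names found in the makefile templates shipped with HPL, but check your own file before relying on them. The sketch assumes GNU sed (for `-i`) and values that contain no `|` or `&` characters.

```shell
# Replace the value of one "VAR = value" line in a makefile, keeping the
# original spacing before the "=" sign.
#   $1 = makefile path, $2 = variable name, $3 = new value
set_make_var() {
  sed -i "s|^\($2[[:space:]]*=\).*|\1 $3|" "$1"
}

# Example usage (hypothetical file name and paths):
#   set_make_var Make.mycluster TOPdir "$LINPACK_PATH/hpl-2.3"
#   set_make_var Make.mycluster MPdir  /opt/openmpi
#   set_make_var Make.mycluster LAdir  "$LINPACK_PATH/BLAS-3.8.0"
```

Editing the file by hand works just as well; the helper is only convenient when the same changes must be repeated on several systems.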

HPL CUDA Operation Program Installation

Start Installation

Installation requires a CUDA environment.

tar xf hpl-2.0_FERMI_v15.tgz
cd ./hpl-2.0_FERMI_v15

Modify the Make.CUDA file according to the selected MPI type.


make arch=CUDA


cd ./bin/CUDA

HPL Run With MPI

Single Node

mpirun -np 4 ./xhpl  # np should be greater than or equal to P*Q in HPL.dat
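To avoid launching with too few processes, the required minimum can be read from HPL.dat itself. The sketch below assumes the standard HPL.dat layout, in which line 10 holds the number of process grids, line 11 the P values, and line 12 the Q values. The function name `hpl_min_np` is hypothetical.

```shell
# Print the largest P*Q among the process grids listed in an HPL.dat file.
# Assumes the standard layout: line 10 = number of grids,
# line 11 = P values, line 12 = Q values.
hpl_min_np() {
  awk 'BEGIN { max = 0 }
       NR == 10 { grids = $1 }
       NR == 11 { for (i = 1; i <= grids; i++) p[i] = $i }
       NR == 12 {
         for (i = 1; i <= grids; i++)
           if (p[i] * $i > max) max = p[i] * $i
         print max
         exit
       }' "$1"
}

# Example usage:
#   mpirun -np "$(hpl_min_np HPL.dat)" ./xhpl
```

Using the largest product covers every grid in the file, since HPL runs each listed P x Q combination in turn.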

Multi Node

OpenMPI/Intel MPI

Create and edit the nodes file

c1 slots=2
c2 slots=2
mpirun -hostfile ./nodes -np 4 ./xhpl  # np should be greater than or equal to P*Q in HPL.dat
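The process count passed to `-np` can also be derived from the nodes file rather than typed by hand. This is a minimal sketch that assumes the `host slots=N` format shown above; the function name `total_slots` is hypothetical.

```shell
# Sum the slots=N entries of a hostfile in "host slots=N" format.
total_slots() {
  awk '{
         for (i = 2; i <= NF; i++)
           if ($i ~ /^slots=/) {
             sub(/^slots=/, "", $i)
             sum += $i
           }
       }
       END { print sum + 0 }' "$1"
}

# Example usage:
#   mpirun -hostfile ./nodes -np "$(total_slots ./nodes)" ./xhpl
```

The result still has to be greater than or equal to P*Q in HPL.dat.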

Mpich

Create and edit the nodes file

mpirun -f ./nodes -np 4 ./xhpl  # np should be greater than or equal to P*Q in HPL.dat

#### Example of running results

HPL runs the computation according to the configured HPL.dat file and reports the resulting performance in Gflops.

HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :    2900
NB     :       3
PMAP   : Row-major process mapping
P      :       1
Q      :       1
PFACT  :    Left
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words


- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

T/V                N    NB     P     Q               Time                 Gflops
WR00L2L2        2900     3     1     1               3.66             4.4401e+00
HPL_pdgesv() start time Mon Oct 10 03:11:29 2022

HPL_pdgesv() end time   Mon Oct 10 03:11:32 2022

||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   5.34873410e-03 ...... PASSED
T/V                N    NB     P     Q               Time                 Gflops
WR00L2L4        2900     3     1     1               3.20             5.0918e+00
HPL_pdgesv() start time Mon Oct 10 03:11:33 2022

HPL_pdgesv() end time   Mon Oct 10 03:11:36 2022

||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   5.34873410e-03 ...... PASSED

Finished      2 tests with the following results:
              2 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.

End of Tests.
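When the output is saved to a file, the result lines (those whose first field starts with `WR`) can be summarized with a short script. The sketch below also recomputes Gflops from N and Time using the operation count HPL uses, (2/3)N^3 + (3/2)N^2. The function name `hpl_summary` and the file name `hpl.log` are hypothetical. Because the printed Time is rounded to two decimals, the recomputed value is close to the reported one but not identical.

```shell
# Print one summary line per HPL result line in a saved output file.
# A result line has 7 fields: T/V, N, NB, P, Q, Time, Gflops.
hpl_summary() {
  awk '$1 ~ /^WR/ && NF == 7 {
         n = $2
         t = $6
         # HPL operation count divided by the solve time, in Gflops
         est = ((2.0 / 3.0) * n * n * n + 1.5 * n * n) / t / 1e9
         printf "%s N=%d reported=%.4f recomputed=%.4f\n", $1, n, $7, est
       }' "$1"
}

# Example usage:
#   mpirun -np 4 ./xhpl | tee hpl.log
#   hpl_summary hpl.log
```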