LINPACK is a software library for performing numerical linear algebra on digital computers, and the benchmark derived from it is the most widely used measure of floating-point performance for high-performance computer systems.
HPL is a software package that solves a (random) dense linear system in double-precision (64-bit) arithmetic on distributed-memory computers. It can thus be regarded as a portable, freely available implementation of the High Performance Computing Linpack Benchmark. HPL requires the availability on your system of an implementation of the Message Passing Interface MPI (1.1 compliant), as well as an implementation of either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL).
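A quick way to check whether an MPI toolchain is already available on the system (nothing here is HPL-specific):

```
which mpicc && mpicc --version   # fails if no MPI compiler wrapper is on the PATH
```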
Choose which math library to install according to the MPI implementation and the type of operation to be performed:
| Math Library | OpenMPI | Mpich | IntelMPI |
| --- | --- | --- | --- |
| CPU operation (hpl-2.3.tar.gz) | BLAS | BLAS | Intel MKL |
| CUDA operation (hpl-2.0_FERMI_v15.tgz) | Intel MKL | Intel MKL | - |
Before installing BLAS, you need to install the GCC/GFortran toolchain.
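For example, on an RPM-based distribution (an assumption; use your system's package manager) the toolchain can be installed with:

```
sudo dnf install -y gcc gcc-gfortran make
```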
Install BLAS
```
export LINPACK_PATH=~/linpack
mkdir -p $LINPACK_PATH && cd $LINPACK_PATH
wget http://www.netlib.org/blas/blas-3.8.0.tgz
tar xf blas-3.8.0.tgz
cd BLAS-3.8.0/
make
ar rv libblas.a *.o
```
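As a quick sanity check, list the members of the archive just created:

```
ar t libblas.a | grep -i daxpy   # the archive should contain daxpy.o
```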
Install CBLAS
```
cd $LINPACK_PATH
wget http://www.netlib.org/blas/blast-forum/cblas.tgz
tar xf cblas.tgz
cd CBLAS/
cp $LINPACK_PATH/BLAS-3.8.0/blas_LINUX.a $LINPACK_PATH/CBLAS/lib/
sed -i '/^BLLIB/cBLLIB = ../lib/blas_LINUX.a' Makefile.in
make
```
Run the complex level-1 tester to verify the build:

```
$LINPACK_PATH/CBLAS/testing/xccblat1
```

Expected output:

```
Complex CBLAS Test Program Results
Test of subprogram number 1 CBLAS_CDOTC
----- PASS -----
Test of subprogram number 2 CBLAS_CDOTU
----- PASS -----
Test of subprogram number 3 CBLAS_CAXPY
----- PASS -----
Test of subprogram number 4 CBLAS_CCOPY
----- PASS -----
Test of subprogram number 5 CBLAS_CSWAP
----- PASS -----
Test of subprogram number 6 CBLAS_SCNRM2
----- PASS -----
Test of subprogram number 7 CBLAS_SCASUM
----- PASS -----
Test of subprogram number 8 CBLAS_CSCAL
----- PASS -----
Test of subprogram number 9 CBLAS_CSSCAL
----- PASS -----
Test of subprogram number 10 CBLAS_ICAMAX
----- PASS -----
```
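The testing directory also contains testers for the other precisions and BLAS levels. For example (tester and input-file names here follow the stock CBLAS Makefile and may differ slightly by release):

```
cd $LINPACK_PATH/CBLAS/testing
./xdcblat1          # double-precision level-1 tester, no input file needed
./xdcblat2 < din2   # level-2 tester reads its parameters from din2
```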
Refer to the oneAPI installation instructions in the LiCO installation documentation, and configure it as an environment module.
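Once the oneAPI modulefiles are registered, you can confirm they are visible (module avail prints to stderr, hence the redirect):

```
module avail 2>&1 | grep -i mkl
```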
Download and unpack HPL 2.3:

```
export LINPACK_PATH=~/linpack
mkdir -p $LINPACK_PATH
cd $LINPACK_PATH
wget http://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
tar xf hpl-2.3.tar.gz
cd hpl-2.3
```
Select and modify the makefile according to the MPI implementation you want to use.
OpenMPI
Load OpenMPI
```
module load openmpi4/4.1.1   # adjust to the version available in your environment
```
Copy the template makefile from setup:

```
cp ./setup/Make.Linux_PII_CBLAS ./
```
Modify the makefile. Note: replace <LINPACK_PATH> with its actual value.

```
ARCH     = Linux_PII_CBLAS
TOPdir   = <LINPACK_PATH>/hpl-2.3
MPdir    = /opt/ohpc/pub/mpi/openmpi4-gnu9/4.1.1   # MPI installation path; adjust to your system
MPlib    = $(MPdir)/lib/libmpi.so /usr/lib64/libpthread-2.28.so /usr/lib64/libc-2.28.so
LAdir    = <LINPACK_PATH>/CBLAS/lib
LAlib    = $(LAdir)/blas_LINUX.a $(LAdir)/cblas_LINUX.a
HPL_OPTS =
CC       = mpicc
LINKER   = mpif77
```
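If you are unsure of the OpenMPI installation paths, the compiler wrapper can print the full compile line it would use, including the -I and -L directories:

```
mpicc --showme
```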
Compile
```
make arch=Linux_PII_CBLAS
```
Test
```
cd ./bin/Linux_PII_CBLAS
./xhpl
```
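xhpl reads its configuration from HPL.dat in the same directory (the same format applies to every build below). The shipped file is only a smoke test; for real measurements tune N, NB, and the P x Q process grid. A minimal illustrative HPL.dat follows (single values per parameter; the numbers are examples, not tuned recommendations):

```
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
192          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
```

As a rule of thumb, choose N so the matrix fills roughly 80% of total memory (the matrix needs 8*N*N bytes, so N ≈ sqrt(0.8 × mem_bytes / 8)), and set P ≤ Q with P×Q equal to the number of MPI ranks.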
Mpich
Load Mpich
```
module load mpich/3.3.2-ofi   # adjust to the version available in your environment
```
Copy the template makefile from setup:

```
cp ./setup/Make.Linux_PII_CBLAS ./
```
Modify the makefile. Note: replace <LINPACK_PATH> with its actual value.

```
ARCH     = Linux_PII_CBLAS
TOPdir   = <LINPACK_PATH>/hpl-2.3
MPdir    = /opt/ohpc/pub/mpi/mpich-ofi-gnu9-ohpc/3.3.2   # MPI installation path; adjust to your system
MPlib    = $(MPdir)/lib/libmpich.so /usr/lib64/libpthread-2.28.so /usr/lib64/libc-2.28.so
LAdir    = <LINPACK_PATH>/CBLAS/lib
LAlib    = $(LAdir)/blas_LINUX.a $(LAdir)/cblas_LINUX.a
HPL_OPTS =
CC       = mpicc
LINKER   = mpif77
```
Compile
```
make arch=Linux_PII_CBLAS
```
Test
```
cd ./bin/Linux_PII_CBLAS
./xhpl
```
Intel MPI
Load Intel MPI and MKL
```
module load mpi
module load icc
module load mkl
```
Copy the template makefile from setup:

```
cp ./setup/Make.Linux_Intel64 ./
```
Modify the makefile. Note: replace <LINPACK_PATH> with its actual value.

```
ARCH     = Linux_Intel64
TOPdir   = <LINPACK_PATH>/hpl-2.3
MPdir    = /opt/intel/oneapi/mpi/latest   # MPI installation path; adjust to your system
MPinc    = -I$(MPdir)/include
MPlib    = $(MPdir)/lib/release/libmpi.a
LAdir    = /opt/intel/oneapi/mkl/latest
LAinc    = $(LAdir)/include
LAlib    = -L$(LAdir)/lib/intel64 \
           -Wl,--start-group \
           $(LAdir)/lib/intel64/libmkl_intel_lp64.a \
           $(LAdir)/lib/intel64/libmkl_intel_thread.a \
           $(LAdir)/lib/intel64/libmkl_core.a \
           -Wl,--end-group -lpthread -ldl
CC       = mpiicc
OMP_DEFS = -qopenmp
```
Compile
```
make arch=Linux_Intel64
```
Test
```
cd ./bin/Linux_Intel64
./xhpl
```
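This build links the threaded MKL (libmkl_intel_thread with -qopenmp), so each MPI rank spawns OpenMP threads. A typical hybrid launch sets the thread count explicitly; the values below are assumptions to adapt to your core count and HPL.dat grid:

```
export OMP_NUM_THREADS=8   # threads per MPI rank (example value)
mpirun -np 2 ./xhpl        # with P*Q = 2 in HPL.dat
```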
Building the CUDA version requires a working CUDA environment.
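Before building, you can confirm that the CUDA toolkit and driver are usable (nvcc may require /usr/local/cuda/bin on your PATH):

```
nvcc --version   # CUDA toolkit compiler
nvidia-smi -L    # lists the GPUs visible to the driver
```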
Unpack the NVIDIA package:

```
cd $LINPACK_PATH
tar xf hpl-2.0_FERMI_v15.tgz
cd ./hpl-2.0_FERMI_v15
```
Modify the Make.CUDA file according to the selected MPI implementation.
OpenMPI
The installation package provided by NVIDIA was built against the older MPI-1 API, so OpenMPI must be recompiled with MPI-1 compatibility enabled.
Install OpenMPI
```
cd $LINPACK_PATH
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.4.tar.gz
tar xzvf openmpi-4.1.4.tar.gz
cd openmpi-4.1.4/
./configure --enable-mpi1-compatibility --with-cuda=/usr/local/cuda-11.7/include --prefix=/opt/ohpc/pub/mpi/openmpi-4.1.4
make
make install
```
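Optionally verify that the new build reports CUDA support (ompi_info prints the build configuration):

```
/opt/ohpc/pub/mpi/openmpi-4.1.4/bin/ompi_info | grep -i cuda
```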
Modify the Make.CUDA file. Note: replace <LINPACK_PATH> with its actual value.

```
TOPdir = <LINPACK_PATH>/hpl-2.0_FERMI_v15
MPdir  = /opt/ohpc/pub/mpi/openmpi-4.1.4   # MPI installation path; adjust to your system
MPinc  = -I$(MPdir)/include
MPlib  = $(MPdir)/lib/libmpi.so
LAdir  = /opt/intel/oneapi/mkl/latest/lib/intel64
CC     = /opt/ohpc/pub/mpi/openmpi-4.1.4/bin/mpicc
```
Mpich
Load Mpich
```
module load mpich/3.3.2-ofi   # adjust to the version available in your environment
```
Modify the Make.CUDA file. Note: replace <LINPACK_PATH> with its actual value.

```
TOPdir = <LINPACK_PATH>/hpl-2.0_FERMI_v15
MPdir  = /opt/ohpc/pub/mpi/mpich-ofi-gnu9-ohpc/3.3.2   # MPI installation path; adjust to your system
MPinc  = -I$(MPdir)/include
MPlib  = $(MPdir)/lib/libmpi.so
LAdir  = /opt/intel/oneapi/mkl/latest/lib/intel64
CC     = mpicc
```
Compile
```
make arch=CUDA
```
Test
```
export LD_LIBRARY_PATH=$LINPACK_PATH/hpl-2.0_FERMI_v15/src/cuda:$LD_LIBRARY_PATH
cd ./bin/CUDA
./xhpl
```
To launch multiple processes on a single node:

```
mpirun -np 4 ./xhpl   # np must be >= P*Q in HPL.dat
```

With OpenMPI, to run across nodes, create and edit a nodes file:

```
c1 slots=2
c2 slots=2
```

```
mpirun -hostfile ./nodes -np 4 ./xhpl   # np must be >= P*Q in HPL.dat
```

With Mpich, to run across nodes, create and edit a nodes file:

```
c1:2
c2:2
```

```
mpirun -f ./nodes -np 4 ./xhpl   # np must be >= P*Q in HPL.dat
```
The run proceeds according to the configured HPL.dat file, and the final performance is reported in Gflops:
```
================================================================================
HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 2900
NB : 3
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Left
NBMIN : 2 4
NDIV : 2
RFACT : Left
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2L2 2900 3 1 1 3.66 4.4401e+00
HPL_pdgesv() start time Mon Oct 10 03:11:29 2022
HPL_pdgesv() end time Mon Oct 10 03:11:32 2022
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 5.34873410e-03 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR00L2L4 2900 3 1 1 3.20 5.0918e+00
HPL_pdgesv() start time Mon Oct 10 03:11:33 2022
HPL_pdgesv() end time Mon Oct 10 03:11:36 2022
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 5.34873410e-03 ...... PASSED
================================================================================
Finished 2 tests with the following results:
2 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
```